Closed mjang2000 closed 3 years ago
I'm commenting here because I think I'm getting the same error.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-95-c05ffdb49fed> in <module>
----> 1 tissue_with_markers.setup_reconstruction(markers_to_use=markers_in_sc)
/Volumes/Mac-External/10x/bin/novosparc/novosparc/common/_tissue.py in setup_reconstruction(self, markers_to_use, num_neighbors_s, num_neighbors_t)
50 else:
51 cost_marker_genes = cdist(self.dge[:, markers_to_use] / np.amax(self.dge[:, markers_to_use]),
---> 52 self.atlas_matrix / np.amax(self.atlas_matrix))
53 dge = self.dge[:, np.setdiff1d(np.arange(self.dge.shape[1]), markers_to_use)]
54 self.num_markers = len(markers_to_use)
~/opt/anaconda3/envs/scvi-env/lib/python3.7/site-packages/scipy/spatial/distance.py in cdist(XA, XB, metric, *args, **kwargs)
2710 raise ValueError('XB must be a 2-dimensional array.')
2711 if s[1] != sB[1]:
-> 2712 raise ValueError('XA and XB must have the same number of columns '
2713 '(i.e. feature dimension.)')
2714
ValueError: XA and XB must have the same number of columns (i.e. feature dimension.)
Okay, maybe not. Sorry, I found my issue right after posting: the data, as I've filtered it, does not contain all of the provided reference genes, so the marker columns no longer match the atlas columns.
I fixed it by taking code from one of the older tutorials:
with open('../bdtnp/dge.txt') as file:
    header = file.readline()
dge_cols = header.split()
gene_cols = []
for i, dge_col in enumerate(dge_cols):
    if dge_col in gene_set:
        gene_cols.append(i)
And then setting the atlas_matrix to this:
atlas_matrix = np.loadtxt(bdtnp_path, usecols=gene_cols, skiprows=1)
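For anyone hitting the same mismatch, here is a minimal sketch of the idea behind that fix: keep only the atlas genes that survived filtering, and record matching column indices on both sides so the two matrices end up with the same columns in the same order. The function name align_markers and the toy gene lists are hypothetical and not part of novoSpaRc.

```python
def align_markers(atlas_header, sc_gene_names):
    """Match atlas genes to scRNA-seq genes by name.

    Hypothetical helper, not part of novoSpaRc. Returns the atlas column
    indices, the corresponding scRNA-seq column indices (same order), and
    the atlas genes missing from the scRNA-seq data.
    """
    atlas_genes = atlas_header.split()
    sc_index = {gene: i for i, gene in enumerate(sc_gene_names)}

    atlas_cols, sc_cols, missing = [], [], []
    for col, gene in enumerate(atlas_genes):
        if gene in sc_index:
            atlas_cols.append(col)
            sc_cols.append(sc_index[gene])
        else:
            missing.append(gene)
    return atlas_cols, sc_cols, missing


# Toy data (assumed for illustration): one atlas gene was filtered out.
atlas_header = "eve ftz hb kr"
sc_gene_names = ["kr", "actin", "eve", "hb"]

atlas_cols, sc_cols, missing = align_markers(atlas_header, sc_gene_names)
print("atlas columns to load:", atlas_cols)
print("scRNA-seq marker columns:", sc_cols)
print("atlas genes missing from the data:", missing)
```

In this sketch, atlas_cols plays the role of gene_cols above (the usecols argument to np.loadtxt), and sc_cols would be what gets passed as markers_to_use, assuming the gene names are spelled identically in both files.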
The code worked for the template example that was provided but it didn't work for my lab data. The file loaded in for the atlas_matrix was hardcoded to be 3039 rows by 84 marker genes. I was wondering why the atlas matrix had to be 3039 rows, which corresponds to the size of the location matrix. I may be interpreting the code incorrectly, but shouldn't the atlas matrix be (# of cells x # of marker genes) and not (# of locations x # of marker genes)?
From what I understand, you're right that the atlas matrix is hardcoded from the bdtnp/dge.txt file. If you're using their provided 3039 cell embryo shape with this:
target_space_path = '../bdtnp/geometry.txt'
locations = novosparc.io.load_target_space(target_space_path, is_2D=True)
locations = locations[np.random.choice(locations.shape[0], 3039), :]
Then the atlas matrix has to match the locations there in order for the program to map your cells according to the reference marker genes.
Just to check if you had the same issue I had, what happens if you enter:
len(markers_in_sc)
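The same check can be written as a small guard that runs before setup_reconstruction. This is only a sketch: check_marker_columns is a hypothetical helper, and the only assumption is that the atlas matrix has one column per marker gene, which is what the cdist error above is complaining about.

```python
def check_marker_columns(markers_to_use, atlas_shape):
    """Raise a readable error if markers and atlas columns disagree.

    Hypothetical helper, not part of novoSpaRc. atlas_shape is the
    (# locations, # marker genes) shape of the atlas matrix.
    """
    n_markers = len(markers_to_use)
    n_atlas_cols = atlas_shape[1]
    if n_markers != n_atlas_cols:
        raise ValueError(
            f"{n_markers} markers found in the data, but the atlas matrix "
            f"has {n_atlas_cols} columns; subset the atlas to the same genes."
        )
    return n_markers


# Example with the BDTNP dimensions mentioned in this thread.
print(check_marker_columns(list(range(84)), (3039, 84)))
```

If the call raises, some reference genes were probably lost during filtering, which is the situation described above.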
So, since the atlas matrix describes the spatial expression pattern of your (physical) target space, i.e. your locations, it should have the same number of rows as your locations file. The number of cells in your expression data is irrelevant for the reference atlas, isn't it?
Hi all,
Indeed, the dimensions of the atlas matrix must be (# locations, # marker genes). The atlas matrix contains the spatial expression of the marker genes in the target space that you're mapping the scRNA-seq data onto, and therefore has to have the same spatial dimension. It's your guide for the spatial reconstruction. The # of cells in the scRNA-seq data is irrelevant: novoSpaRc can map many cells to fewer locations of the target space and vice versa.
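To make the shapes concrete, here is a rough sketch with random data. It computes pairwise Euclidean distances in plain NumPy as a stand-in for the scipy cdist call shown in the traceback; the number of cells (200) is an arbitrary assumption, while 3039 locations and 84 markers are the BDTNP numbers from this thread.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cells = 200        # arbitrary, chosen only for illustration
n_locations = 3039   # BDTNP target space size from this thread
n_markers = 84       # BDTNP marker genes from this thread

dge_markers = rng.random((n_cells, n_markers))       # cells x markers
atlas_matrix = rng.random((n_locations, n_markers))  # locations x markers

# Pairwise squared Euclidean distances: |a|^2 + |b|^2 - 2 a.b
sq_dist = (
    (dge_markers ** 2).sum(axis=1)[:, None]
    + (atlas_matrix ** 2).sum(axis=1)[None, :]
    - 2.0 * dge_markers @ atlas_matrix.T
)
# Clip tiny negative values caused by floating-point rounding.
cost_marker_genes = np.sqrt(np.maximum(sq_dist, 0.0))

print(cost_marker_genes.shape)  # (200, 3039)
```

Only the second dimension (the markers) has to agree between the two inputs. The rows are free to differ, and the result has one row per cell and one column per location.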
Does this clarify your questions?
Thanks for the clarification. In the reconstruct_bdtnp_with_markers.py example, the atlas matrix is obtained with atlas_matrix = dataset.X[:, markers_to_use]. If my dataset has more cells than locations, would atlas_matrix = dataset.X[np.random.choice(dataset.shape[0], locations.shape[0], replace=False)][:, markers_in_sc] be the right approach?
In the BDTNP example, the reference atlas coincides with the "scRNAseq" data.
If you have another dataset, you have to construct the reference atlas first. The atlas then defines your target space, and you can map your scRNA-seq cells onto it regardless of how many cells you have.
Hello! I'm working on reconstructing the tissue with marker genes, and as far as I know, the atlas matrix should be the expression matrix of cells x marker genes. It seems that the number of rows of the atlas matrix has to equal the number of locations in the location matrix, or else I get a dimensionality error. Why does the atlas matrix size depend on the number of locations? Shouldn't it just be the total number of cells x marker genes?