rajewsky-lab / novosparc

BSD 3-Clause "New" or "Revised" License

Atlas Matrix Bug #43

Closed mjang2000 closed 3 years ago

mjang2000 commented 3 years ago

Hello! I'm working on reconstructing the tissue with marker genes, and to my knowledge the atlas matrix should be the expression matrix of cells x marker genes. It seems that the number of rows of the atlas matrix has to equal the number of locations in the location matrix, or else I get a dimensionality error. Why does the size of the atlas matrix depend on the number of locations? Shouldn't it just be the total number of cells x marker genes?

aralbright commented 3 years ago

I'm commenting here because I think I'm getting the same error.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-95-c05ffdb49fed> in <module>
----> 1 tissue_with_markers.setup_reconstruction(markers_to_use=markers_in_sc)

/Volumes/Mac-External/10x/bin/novosparc/novosparc/common/_tissue.py in setup_reconstruction(self, markers_to_use, num_neighbors_s, num_neighbors_t)
     50                 else:
     51             cost_marker_genes = cdist(self.dge[:, markers_to_use] / np.amax(self.dge[:, markers_to_use]),
---> 52                                       self.atlas_matrix / np.amax(self.atlas_matrix))
     53                         dge = self.dge[:, np.setdiff1d(np.arange(self.dge.shape[1]), markers_to_use)]
     54                         self.num_markers = len(markers_to_use)

~/opt/anaconda3/envs/scvi-env/lib/python3.7/site-packages/scipy/spatial/distance.py in cdist(XA, XB, metric, *args, **kwargs)
   2710         raise ValueError('XB must be a 2-dimensional array.')
   2711     if s[1] != sB[1]:
-> 2712         raise ValueError('XA and XB must have the same number of columns '
   2713                          '(i.e. feature dimension.)')
   2714 

ValueError: XA and XB must have the same number of columns (i.e. feature dimension.)

aralbright commented 3 years ago

Okay, maybe not. Sorry, of course I found my issue right after posting. The error occurs because the data, as I've filtered it, do not contain all of the provided reference genes, so the two matrices end up with different numbers of marker columns.

I fixed it by taking code from one of the older tutorials:


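# gene_set is presumably the set of gene names kept in the filtered data (defined earlier)
with open('../bdtnp/dge.txt') as file:
    header = file.readline()

dge_cols = header.split()

# Collect the column index of every atlas gene that is also in gene_set
gene_cols = []
for i, dge_col in enumerate(dge_cols):
    if dge_col in gene_set:
        gene_cols.append(i)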

And then setting the atlas_matrix to this:

atlas_matrix = np.loadtxt(bdtnp_path, usecols=gene_cols, skiprows=1)
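
The same column alignment can be sketched on in-memory arrays. This is only an illustration: the gene names, array sizes, and the variables `atlas_genes`, `sc_genes`, and `atlas_full` are made up, and nothing here is novoSpaRc API. The point is that the atlas and the marker slice of the expression matrix must end up with the same genes in the same column order.

```python
import numpy as np

# Hypothetical gene names, for illustration only.
atlas_genes = ["eve", "ftz", "hb", "kr", "sna"]    # columns of the reference atlas
sc_genes = ["hb", "actb", "sna", "eve", "gapdh"]   # columns of the filtered scRNA-seq matrix

rng = np.random.default_rng(0)
atlas_full = rng.random((6, len(atlas_genes)))     # (locations x atlas genes)
dge = rng.random((10, len(sc_genes)))              # (cells x genes)

# Keep only the markers present in both, in atlas column order.
sc_gene_set = set(sc_genes)
shared = [g for g in atlas_genes if g in sc_gene_set]

atlas_cols = [atlas_genes.index(g) for g in shared]   # columns to keep in the atlas
markers_in_sc = [sc_genes.index(g) for g in shared]   # matching columns in the data

atlas_matrix = atlas_full[:, atlas_cols]
dge_markers = dge[:, markers_in_sc]

print(shared)
print(atlas_matrix.shape, dge_markers.shape)
```

Both matrices now have one column per shared marker, which is the condition the `cdist` call in the traceback checks.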

mjang2000 commented 3 years ago

The code worked for the template example that was provided but it didn't work for my lab data. The file loaded in for the atlas_matrix was hardcoded to be 3039 rows by 84 marker genes. I was wondering why the atlas matrix had to be 3039 rows, which corresponds to the size of the location matrix. I may be interpreting the code incorrectly, but shouldn't the atlas matrix be (# of cells x # of marker genes) and not (# of locations x # of marker genes)?

aralbright commented 3 years ago

From what I understand, you're right that the atlas matrix is hardcoded from the bdtnp/dge.txt file. If you're using their provided 3039 cell embryo shape with this:

target_space_path = '../bdtnp/geometry.txt'
locations = novosparc.io.load_target_space(target_space_path, is_2D=True)
locations = locations[np.random.choice(locations.shape[0], 3039), :]

Then the atlas matrix has to match the locations there in order for the program to map your cells according to the reference marker genes.
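
One caveat about the snippet above: `np.random.choice` samples with replacement by default, so it can reorder and duplicate rows. If the atlas rows are in the same order as the rows of the geometry file (an assumption here), the same indices have to be applied to both arrays. A minimal sketch with random stand-in data, where `locations_all` and `atlas_all` are hypothetical names:

```python
import numpy as np

rng = np.random.default_rng(0)

num_locations, num_markers = 3039, 84                 # sizes quoted in this thread
locations_all = rng.random((num_locations, 2))        # stand-in for the loaded geometry
atlas_all = rng.random((num_locations, num_markers))  # stand-in for the atlas

# Draw the indices once, without replacement, and reuse them for both arrays
# so that row i of the atlas still describes row i of the locations.
idx = rng.choice(num_locations, size=1000, replace=False)
locations = locations_all[idx, :]
atlas_matrix = atlas_all[idx, :]

print(locations.shape, atlas_matrix.shape)
```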

Just to check if you had the same issue I had, what happens if you enter:

len(markers_in_sc)

MalteMederacke commented 3 years ago

So, since the atlas matrix represents the spatial expression across your (physical) target space, that is, your locations, it should have the same number of rows as your locations file. The number of cells in your expression space is irrelevant for your reference atlas, isn't it?

nukappa commented 3 years ago

Hi all,

Indeed, the dimensions of the atlas matrix must be (# locations, # marker genes). The atlas matrix contains the spatial expression of the marker genes in the target space that you're mapping the scRNA-seq data onto, and therefore has to have the same spatial dimension. It's your guide for the spatial reconstruction. The # of cells in the scRNA-seq data is irrelevant: novoSpaRc can map many cells to fewer locations of the target space and vice versa.
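
A small shape check may make this concrete. The sketch below uses random stand-in data and plain NumPy to compute the pairwise Euclidean distances that the `cdist` call in the traceback computes with its default metric. The sizes and variable names other than `atlas_matrix` and `cost_marker_genes` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_cells, num_locations, num_markers = 50, 8, 4    # arbitrary illustrative sizes

dge_markers = rng.random((num_cells, num_markers))       # (cells x marker genes)
atlas_matrix = rng.random((num_locations, num_markers))  # (locations x marker genes)

def pairwise_euclidean(a, b):
    """Distance between every row of a and every row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

# Rows may differ (cells vs. locations); only the marker columns must match.
cost_marker_genes = pairwise_euclidean(dge_markers / np.amax(dge_markers),
                                       atlas_matrix / np.amax(atlas_matrix))
print(cost_marker_genes.shape)  # one row per cell, one column per location

# An atlas with a different number of marker columns cannot be compared.
atlas_bad = rng.random((num_locations, num_markers - 1))
try:
    pairwise_euclidean(dge_markers, atlas_bad)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```

The resulting cost matrix is (# cells, # locations), so the number of cells never has to equal the number of locations.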

Does this clarify your questions?

mjang2000 commented 3 years ago

Thanks for the clarification. In the reconstruct_bdtnp_with_markers.py example, the atlas matrix is obtained by atlas_matrix = dataset.X[:, markers_to_use]. If my dataset has more cells than locations, would atlas_matrix = dataset.X[np.random.choice(dataset.shape[0], locations.shape[0], replace=False)][:, markers_in_sc] be the right approach?

nukappa commented 3 years ago

In the BDTNP example, the reference atlas coincides with the "scRNAseq" data.

If you have another dataset, you have to construct the reference atlas first. The atlas then defines your target space, and you can map your scRNAseq data onto it regardless of the number of cells.