scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

scRNA-seq CRISPR data: sc.read_10x_h5 does not include guide sequence in the count matrix and .var #2398

Open kchl5 opened 1 year ago

kchl5 commented 1 year ago

The regular read_10x_h5() function drops the gRNA features from scRNA-seq CRISPR data that were provided to Cell Ranger: they do not appear in adata.X or adata.var. With read_mtx(), however, they are included in adata.X, and the gene_ids can be added manually by reading them from the features.tsv.gz file. @vitkl.

Example

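import scanpy as sc

# matrix.mtx.gz is stored as features x cells, so transpose to get cells x features
adata = sc.read_mtx('matrix.mtx.gz')
print(adata.X.T.shape)
adata_raw = sc.read_10x_h5('filtered_feature_bc_matrix.h5')
print(adata_raw.X.shape)

Output:

(12797, 36685)
(12797, 36601)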
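To see where the 84 extra columns (36685 - 36601) come from, one option is to tally the feature types listed in features.tsv.gz. The following is a minimal sketch using only the standard library. It assumes the usual three-column 10x layout (feature ID, feature name, feature type); the helper names read_features and count_feature_types are hypothetical and not part of scanpy.

```python
import csv
import gzip
from collections import Counter


def read_features(path):
    """Return (feature_id, feature_name, feature_type) tuples from a features.tsv.gz file.

    Assumes the three-column, tab-separated 10x layout; extra columns are ignored.
    """
    with gzip.open(path, "rt", newline="") as handle:
        return [tuple(row[:3]) for row in csv.reader(handle, delimiter="\t") if row]


def count_feature_types(features):
    """Tally how many features belong to each feature type."""
    return Counter(feature_type for _, _, feature_type in features)
```

Calling count_feature_types(read_features('features.tsv.gz')) on the data above would show whether the extra columns carry a feature type other than 'Gene Expression' (for guides this is typically 'CRISPR Guide Capture'). The same tuples also provide the IDs and names needed to fill in adata.var by hand after read_mtx().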

Versions

adamgayoso commented 1 year ago

You might consider using the 10x readers in muon, which should store each of these in a separate modality.

vitkl commented 1 year ago

How does scanpy/muon know that these are separate modalities? Isn't it a bug to discard certain rows from a file?

adamgayoso commented 1 year ago

I'm not sure any rows or columns would be discarded. I think muon just looks at the feature type and separates accordingly, right @gtca?

gtca commented 1 year ago

Hey @kchl5 and @vitkl,

Muon (mu.read_10x_h5()) should load it correctly if the feature_types value for the gRNAs is different from the one for the genes. Since the gRNAs are missing from the default sc.read_10x_h5() output, I assume it is.

Moreover, just in case you're interested, splitting by feature_types is even a feature of the MuData initialiser, so running

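import scanpy as sc
from mudata import MuData
adata = sc.read_10x_h5(h5file, gex_only=False)  # keep all feature types, not only 'Gene Expression'
mdata = MuData(adata)  # one modality per value of adata.var['feature_types']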

should also work, and this is roughly what muon does.
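For anyone wondering what "splitting by feature_types" amounts to, here is a toy sketch in plain Python. It is not muon's or MuData's actual implementation, only an illustration of the idea: columns are grouped by their feature type, and each group becomes its own matrix. The function name split_by_feature_type and the ALIASES mapping are assumptions made for this example.

```python
from collections import defaultdict

# Short modality names for common 10x feature types. This mapping is an
# assumption for the sketch; feature types not listed keep their full name.
ALIASES = {"Gene Expression": "rna", "Antibody Capture": "prot", "Peaks": "atac"}


def split_by_feature_type(matrix, feature_types, aliases=ALIASES):
    """Split a cells x features matrix (list of rows) into one matrix per feature type."""
    columns = defaultdict(list)
    # Group column indices by (aliased) feature type, preserving column order.
    for index, feature_type in enumerate(feature_types):
        columns[aliases.get(feature_type, feature_type)].append(index)
    # Build one sub-matrix per group by selecting its columns from every row.
    return {
        name: [[row[i] for i in indices] for row in matrix]
        for name, indices in columns.items()
    }
```

With gex_only=False nothing is dropped at read time, so a split like this only works if the guide features really carry a feature type of their own.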

(Thanks for tagging me, @adamgayoso!)

gtca commented 1 year ago

For your second question, @vitkl, I guess you could open a new issue suggesting that the documentation be improved so that this is specified in the description of sc.read_10x_h5. At the moment it is documented only at the level of the function parameters.