scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

subsetting / subclustering, use raw #826

Open bobermayer opened 5 years ago

bobermayer commented 5 years ago

When I select a subset of cells using ad_sub=ad[ad.obs['louvain']=='subcluster_of_interest',:] and then re-apply the preprocessing routines, they only use the genes in ad.X (those variable over the entire dataset), not the genes that are variable only within the subcluster. Such genes might be informative for the subcluster's substructure even if their variance doesn't pass the cutoff when evaluated over the entire dataset. Basically, the set of variable genes can only shrink by subsetting.
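To make the point concrete, here is a small NumPy sketch on made-up data. It uses a plain variance cutoff as a stand-in for the selection done by sc.pp.highly_variable_genes, which is a simplification. The gene names, the cutoff, and the helper variable_genes are hypothetical.

```python
import numpy as np

# Toy data: 100 cells, 2 genes. The last 10 cells form the subcluster.
genes = np.array(["between", "within"])
is_sub = np.arange(100) >= 90

X = np.zeros((100, 2))
X[is_sub, 0] = 5.0  # "between": differs between subcluster and rest, constant inside it
X[95:, 1] = 2.0     # "within": varies only inside the subcluster

cutoff = 0.5  # hypothetical variance cutoff


def variable_genes(matrix, names, cutoff):
    """Return the names of genes whose variance across cells exceeds the cutoff."""
    return set(names[matrix.var(axis=0) > cutoff].tolist())


# Step 1: gene selection over the entire dataset (what ends up in .X).
global_hvg = variable_genes(X, genes, cutoff)
keep = np.isin(genes, list(global_hvg))
X_filtered, genes_filtered = X[:, keep], genes[keep]

# Step 2a: re-select within the subcluster, starting from the filtered genes.
sub_from_filtered = variable_genes(X_filtered[is_sub], genes_filtered, cutoff)

# Step 2b: re-select within the subcluster, starting from all genes.
sub_from_all = variable_genes(X[is_sub], genes, cutoff)

print(global_hvg)         # {'between'}
print(sub_from_filtered)  # set()
print(sub_from_all)       # {'within'}
```

The gene "within" is lost in step 1 because its variance over all cells is only 0.19, so it can never be recovered from the filtered matrix, even though its variance inside the subcluster is 1.0.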

I'd propose to either use

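# select the cells of the subcluster (a view that still carries .raw)
tmp = ad[ad.obs['louvain'] == 'subcluster_of_interest', :]
# rebuild an AnnData from the unfiltered .raw matrix and its gene annotation
ad_sub = sc.AnnData(tmp.raw.X, obs=tmp.obs, var=tmp.raw.var)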

to "reset" the .X matrix (maybe there's a better way?) or to make sc.pp.highly_variable_genes work on ad.raw.X

scanpy==1.4.4 anndata==0.6.22.post1 umap==0.3.10 numpy==1.16.4 scipy==1.2.1 pandas==0.25.1 scikit-learn==0.20.3 statsmodels==0.10.1 python-igraph==0.7.1 louvain==0.6.1

chansigit commented 4 years ago

I have the same question

ajynair commented 3 years ago

+1

li-xuyang28 commented 3 years ago

+1