scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

sc.pp.scale raises an error after running sc.pp.highly_variable_genes #738

Closed. xuebaliang closed this issue 1 year ago

xuebaliang commented 5 years ago

Hello, I recently used the scanpy package to preprocess single-cell RNA-seq data; my processing steps are listed below. The error occurs at the last step, when I try to scale the dataset. After running adata = adata[:, adata.var["highly_variable"]] I have 4271 cells and 1024 genes, but the error says that 4271 is not equal to 1024 in dimension 0. I do not know the reason. Can you give me an answer? Thanks very much.

sc.pp.filter_genes(adata, min_counts = filter_min_counts)
sc.pp.filter_cells(adata, min_counts = filter_min_counts)
sc.pp.normalize_per_cell(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5, subset=True)
adata = adata[:, adata.var["highly_variable"]]
sc.pp.scale(adata)

LuckyMD commented 5 years ago

Hey! Using the latest versions of scanpy and anndata, I tried to reproduce this with:

adata = sc.datasets.pbmc3k()
sc.pp.filter_genes(adata, min_counts = 10)
sc.pp.filter_cells(adata, min_counts = 10)
sc.pp.normalize_per_cell(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5, subset=True)
adata = adata[:, adata.var["highly_variable"]]
sc.pp.scale(adata)

and I don't get an error. Could you reproduce this error with one of the datasets in sc.datasets? That way I could try to reproduce your error. Also, which version of anndata and scanpy are you on?

Other than that, you don't need the line adata = adata[:, adata.var["highly_variable"]] if you use subset=True in the sc.pp.highly_variable_genes() call.
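
For the version question above, here is a minimal sketch of one way to collect that information using only the Python standard library. The helper name report_versions and the "not installed" placeholder are illustrative assumptions, not part of scanpy or anndata.

```python
from importlib import metadata


def report_versions(packages=("scanpy", "anndata")):
    """Return a {package: version} dict for the given distribution names.

    Hypothetical helper for bug reports; packages that are not installed
    are mapped to the string "not installed" instead of raising.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Print one "name: version" line per package, ready to paste into an issue.
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```

If scanpy is already imported as sc, sc.logging.print_versions() should report similar information in one call.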