Open mona-lit opened 5 years ago
Any updates on this one @flying-sheep? I keep having the same issue.
n_top_genes is used here:
I would assume that this only happens if there are several genes with the exact same dispersion. Is that possible?
We need a reproducible example, else we can’t help you further: Please give me some lines of code that I can paste into a notebook unchanged that will demonstrate the problem.
The number of HVGs not being exactly 1000 or 2000 is quite normal as dispersions can be exactly the same. 1488 is surprisingly high though. Maybe your dataset is very sparse so that you have a lot of dispersion ties for low count genes.
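To illustrate how ties can push the count past the requested number, here is a minimal NumPy sketch. It is not scanpy's actual implementation: `select_top_by_cutoff` is a hypothetical helper that assumes selection works by taking the n-th largest dispersion as a cutoff and keeping everything at or above it. It only shows the overshoot case.

```python
import numpy as np

def select_top_by_cutoff(dispersions, n_top):
    """Hypothetical helper (not scanpy code): keep every entry that is
    >= the n_top-th largest value, so ties at the cutoff are all kept."""
    d = np.asarray(dispersions, dtype=float)
    ranked = np.sort(d)[::-1]      # sort descending
    cutoff = ranked[n_top - 1]     # value of the n_top-th largest entry
    return d >= cutoff

# Toy dispersions with four genes tied at 1.0 (made-up numbers)
dispersions = np.array([5.0, 4.0, 3.0, 1.0, 1.0, 1.0, 1.0, 0.5])
mask = select_top_by_cutoff(dispersions, n_top=4)
print(mask.sum())  # 7 genes selected although 4 were requested
```

If something like this is happening in your data, counting how often the cutoff value occurs among the dispersions should account for the extra genes.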
I'm not sure what your issue with scaling is about though. Have you filtered out genes with zero counts using sc.pp.filter_genes()? Zero-count genes could be causing problems.
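As a rough sketch of why all-zero genes might matter, the NumPy snippet below uses a made-up count matrix. It assumes dispersion is computed as variance divided by mean, which is an assumption for illustration and not necessarily what the cell_ranger flavor does internally. A gene with no counts then gets a NaN dispersion, and NaNs can propagate into later steps.

```python
import numpy as np

# Toy cells x genes count matrix (made-up data); the middle gene is all zeros
counts = np.array([[3, 0, 1],
                   [5, 0, 0],
                   [4, 0, 2]], dtype=float)

# Drop genes that have no counts in any cell
gene_totals = counts.sum(axis=0)
keep = gene_totals > 0
filtered = counts[:, keep]

# Variance / mean on the unfiltered matrix: 0 / 0 gives NaN for the empty gene
with np.errstate(invalid="ignore", divide="ignore"):
    mean = counts.mean(axis=0)
    dispersion = counts.var(axis=0, ddof=1) / mean

print(filtered.shape)            # (3, 2)
print(np.isnan(dispersion))      # [False  True False]
```

On an AnnData object the equivalent filtering step would be something like sc.pp.filter_genes(adata, min_cells=1), run before sc.pp.highly_variable_genes.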
Hi,
I'm using scanpy 1.4.2 to analyze my data, using the following command:
sc.pp.highly_variable_genes(heart_cmc, flavor = 'cell_ranger', n_top_genes = 1000)
However, instead of getting 1000 HVGs, it reports 1488 HVGs. A similar thing happens with higher numbers of HVGs (e.g. n_top_genes = 2000 returns 1999). The scaling then fails with the following error: ValueError: The first guess on the deviance function returned a nan. This could be a boundary problem and should be reported.
Any suggestions on how to fix it? When I don't specify n_top_genes, everything runs without problems. Thanks!