scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License
1.93k stars 603 forks source link

n_bins not respected in highly_variable_genes(..., flavour='cell_ranger') #888

Open flying-sheep opened 5 years ago

flying-sheep commented 5 years ago

This code creates the range [10, 15, …, 100], which is 19 values. Taking those percentiles together with -inf and +inf creates 20 bins (from 21 bin borders), understood.

https://github.com/theislab/scanpy/blob/e6e08e51d63c78581bb9c86fe6e302b80baef623/scanpy/preprocessing/_highly_variable_genes.py#L90

But 1. why start at 10% and 2. why is n_bins ignored and there are always 20 bins created?

@Koncopd @falexwolf you introduced this in #330, is this what cellranger does?

davidhbrann commented 5 years ago

We also tried to address this in #572 and #624 but neither of those PRs have been merged since they're not fully backwards compatible.

flying-sheep commented 5 years ago

Well, we really need to get that release out of the door.

But on the other hand, as @gokceneraslan said the current behavior is a bug, so we can change it now!

Gökçen, do you still plan on consolidating those PRs?

flying-sheep commented 10 months ago

5 years later, this is still not fixed … where is the decision recorded that they shouldn’t be merged?