scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

cell_ranger HVG flavor inconsistent with 10x code #969

Open adamgayoso opened 4 years ago

adamgayoso commented 4 years ago

It appears that the Cell Ranger code calculates dispersion from the negative binomial mean-variance relationship; see:

https://github.com/10XGenomics/cellranger/blob/5f5a6293bbc067e1965e50f0277286914b96c908/lib/python/cellranger/analysis/stats.py#L44
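For reference, here is a minimal NumPy sketch contrasting the two dispersion definitions. It assumes the common NB2 parameterization Var = mu + phi * mu^2, whose method-of-moments estimate is phi = (Var - mu) / mu^2. The helpers `nb_dispersion` and `var_over_mean` are hypothetical and not taken from either codebase. They illustrate the difference and do not reproduce Cell Ranger's implementation.

```python
import numpy as np


def nb_dispersion(counts):
    """Method-of-moments NB dispersion per gene (hypothetical helper).

    Assumes the NB2 parameterization Var = mu + phi * mu**2, so
    phi = (Var - mu) / mu**2. `counts` is a cells x genes array.
    """
    x = np.asarray(counts, dtype=float)
    mu = x.mean(axis=0)
    var = x.var(axis=0)  # population variance (ddof=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        phi = (var - mu) / np.square(mu)
    phi[mu == 0] = np.nan  # undefined for genes never observed
    return phi


def var_over_mean(counts):
    """Plain var/mean dispersion per gene (hypothetical helper)."""
    x = np.asarray(counts, dtype=float)
    mu = x.mean(axis=0)
    var = x.var(axis=0)  # population variance (ddof=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        disp = var / mu
    disp[mu == 0] = np.nan  # undefined for genes never observed
    return disp


counts = np.array([[0, 2, 0], [2, 2, 0], [4, 2, 0]])
phi = nb_dispersion(counts)   # gene 0: (8/3 - 2) / 4 = 1/6
disp = var_over_mean(counts)  # gene 0: (8/3) / 2 = 4/3
```

A Poisson-like gene, whose variance equals its mean, gets phi = 0 under the first definition but var/mean = 1 under the second. The two definitions can therefore rank genes differently.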

Furthermore, these summary statistics are calculated on the count matrix after normalization by library size, without log transformation:

https://github.com/10XGenomics/cellranger/blob/5f5a6293bbc067e1965e50f0277286914b96c908/lib/python/cellranger/analysis/pca.py#L91-L95
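Here is a sketch of library-size normalization with no log step, followed by per-gene mean and variance on both scales. Cells are rows. Rescaling each cell to the median total count is an assumption made for illustration; check the linked pca.py for the target Cell Ranger actually uses. `normalize_library_size` is a hypothetical helper.

```python
import numpy as np


def normalize_library_size(counts, target_sum=None):
    """Rescale each cell (row) so its total count equals target_sum.

    Hypothetical helper. target_sum defaults to the median library size,
    an assumption made for illustration. No log transform is applied.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)  # total counts per cell
    if target_sum is None:
        target_sum = float(np.median(lib_size))
    with np.errstate(divide="ignore", invalid="ignore"):
        scaled = counts / lib_size * target_sum
    scaled[~np.isfinite(scaled)] = 0.0  # cells with zero total counts
    return scaled


counts = np.array([[1, 3, 0], [2, 6, 4], [0, 5, 5]])
norm = normalize_library_size(counts)  # every row now sums to 10

# Per-gene summary statistics on the linear (non-log) normalized matrix...
mean_lin, var_lin = norm.mean(axis=0), norm.var(axis=0)
# ...versus on log1p of it, which is what log-normalized input gives.
log_norm = np.log1p(norm)
mean_log, var_log = log_norm.mean(axis=0), log_norm.var(axis=0)
```

Any dispersion built from these statistics depends on which of the two matrices they come from. That is the scale mismatch raised here.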

As a follow-up: the "Seurat" flavor no longer seems to be used in Seurat itself. Are there any plans to implement its "vst" method?

adamgayoso commented 4 years ago

A quick follow-up: I found the code from the Zheng et al. paper.

It appears they do calculate dispersion as var/mean, but on the library-size-normalized counts (not log-transformed):

https://github.com/10XGenomics/single-cell-3prime-paper/blob/265433ebf858c7fdcab759ca9f36b5e0241ceece/pbmc68k_analysis/util.R#L122-L135
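Here is a rough Python sketch of binned, median/MAD-normalized dispersion, the kind the `cell_ranger` flavor aims to reproduce. Dispersion is var/mean on library-size-normalized, non-log counts. Genes are then binned by quantiles of their mean, and each gene's dispersion is centered by its bin's median and scaled by the bin's median absolute deviation. Several details are assumptions and may differ from the linked util.R:

- the bin count;
- the unscaled MAD (R's `mad()` multiplies by 1.4826 by default);
- the handling of zero-MAD bins.

`normalized_dispersion` is a hypothetical name.

```python
import numpy as np


def normalized_dispersion(norm_counts, n_bins=20):
    """Binned, median/MAD-normalized dispersion (hypothetical helper).

    norm_counts: cells x genes, library-size normalized, NOT log-transformed.
    """
    x = np.asarray(norm_counts, dtype=float)
    mean = x.mean(axis=0)
    var = x.var(axis=0, ddof=1)  # sample variance, as R's var() computes
    keep = mean > 0
    out = np.full(mean.shape, np.nan)
    if not keep.any():
        return out

    disp = np.full(mean.shape, np.nan)
    disp[keep] = var[keep] / mean[keep]  # dispersion as var/mean

    # Bin genes by quantiles of their mean; duplicate edges are merged.
    edges = np.unique(np.quantile(mean[keep], np.linspace(0, 1, n_bins + 1)))
    bins = np.searchsorted(edges[1:-1], mean, side="right")

    for b in np.unique(bins[keep]):
        in_bin = keep & (bins == b)
        med = np.median(disp[in_bin])
        mad = np.median(np.abs(disp[in_bin] - med))  # unscaled MAD
        if mad > 0:  # zero-MAD bins are left as NaN (arbitrary choice)
            out[in_bin] = (disp[in_bin] - med) / mad
    return out
```

Calling this on `np.log1p` of the normalized matrix instead would change `disp`, and with it which genes rank as highly variable.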