dannyconrad opened this issue 2 years ago
Hi, thanks for your feedback. Could you share which versions of cellSNP and cellsnp-lite you used?
Why the read counts differ: cellSNP (or rather its dependency pysam) has a default max_depth (i.e., the maximum number of reads piled up at a position) of 8000, so you can check that every DP is <= 8000 in the cellSNP output. cellsnp-lite has no such max_depth limit; it piles up as many reads as possible. (We are trying to add a --maxPileup option, similar in function to max_depth, in the next release of cellsnp-lite.)
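A minimal sketch of that check, assuming the per-site total depth is stored as a DP key in the INFO column of cellSNP.base.vcf.gz (adjust the parsing if your file differs). The helper names are hypothetical; it simply lists any sites whose DP exceeds pysam's default cap of 8000, which should come back empty for cellSNP output if the explanation above holds.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple

# Default pileup depth cap in pysam, which cellSNP relies on.
PYSAM_DEFAULT_MAX_DEPTH = 8000

Site = Tuple[str, int, int]  # (chrom, pos, DP)


def iter_site_depths(lines: Iterable[str]) -> Iterator[Site]:
    """Yield (chrom, pos, DP) per VCF record, reading DP from INFO (column 8).

    Assumes INFO looks like "AD=3;DP=10;OTH=0" (hypothetical example values).
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
        yield fields[0], int(fields[1]), int(info["DP"])


def sites_over_cap(lines: Iterable[str], cap: int = PYSAM_DEFAULT_MAX_DEPTH) -> List[Site]:
    """Return sites whose DP is above `cap`."""
    return [site for site in iter_site_depths(lines) if site[2] > cap]


def sites_over_cap_in_file(path: str, cap: int = PYSAM_DEFAULT_MAX_DEPTH) -> List[Site]:
    """Same check, reading a bgzip/gzip-compressed VCF such as cellSNP.base.vcf.gz."""
    with gzip.open(path, "rt") as handle:
        return sites_over_cap(handle, cap)
```

For example, `sites_over_cap_in_file("cellSNP.base.vcf.gz")` returns an empty list when every DP is within the cap; running it on the cellsnp-lite output instead would show the sites that pysam would have truncated.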
Why cellsnp-lite + vireo does not work well: the much larger read counts in the cellsnp-lite output make it more likely for vireo to get stuck in a local optimum, where the parameters of the two donors become identical, so vireo cannot assign cells to either donor.
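One way to build intuition for this (a rough illustration under a plain binomial model, not vireo's actual model or code): the log-likelihood gap between two candidate allele frequencies at a site grows linearly with the read counts. With ~100x more reads, each site pulls cell assignments ~100x harder, which plausibly makes early assignments overconfident and the fit harder to move away from a poor starting point. The allele frequencies 0.9 and 0.1 and the counts below are arbitrary example values.

```python
import math


def binom_loglik(ad: int, dp: int, theta: float) -> float:
    """Binomial log-likelihood of `ad` alt reads out of `dp` at alt frequency `theta`.

    The log C(dp, ad) term is dropped: it does not depend on theta,
    so it cancels when comparing two thetas.
    """
    return ad * math.log(theta) + (dp - ad) * math.log(1.0 - theta)


def donor_gap(ad: int, dp: int, theta_a: float = 0.9, theta_b: float = 0.1) -> float:
    """How strongly one site's counts favour an alt-like donor over a ref-like donor."""
    return binom_loglik(ad, dp, theta_a) - binom_loglik(ad, dp, theta_b)


small = donor_gap(ad=7, dp=10)      # modest depth
large = donor_gap(ad=700, dp=1000)  # same allele ratio, 100x more reads
print(small, large, large / small)
```

The allele ratio is identical in both calls, yet the evidence for one donor over the other is 100 times larger in the second, which is the kind of scale difference between the cellSNP and cellsnp-lite outputs described in this issue.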
Also, vireo is designed for nuclear SNVs. For mitochondrial SNVs, you may want to try this tutorial, which was used by MQuad. Note that duplicate reads should probably be removed beforehand, as your data have no UMIs.
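Without UMIs, duplicates can only be identified by alignment position. The sketch below illustrates that idea on a hypothetical minimal read record: reads from the same cell barcode that start at the same position on the same strand are collapsed to one. It is a simplification for illustration only; in practice an established tool such as Picard MarkDuplicates or samtools markdup is the usual choice, and those also use mate positions for paired-end data and pick the best read by base quality rather than keeping the first one seen.

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    # Hypothetical minimal read record; real input would come from a BAM file.
    name: str
    cell_barcode: str
    chrom: str
    start: int          # assumed to be the strand-aware 5' alignment position
    is_reverse: bool


def drop_position_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep the first read for each (cell barcode, chrom, start, strand) key.

    Without UMIs, reads from the same cell with the same start and strand
    are treated as PCR/optical duplicates of one original fragment.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read.cell_barcode, read.chrom, read.start, read.is_reverse)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept
```

Keying on the cell barcode keeps identical positions from different cells, since those are independent fragments rather than duplicates.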
Xianjie
I have a scATAC-seq dataset in which I'm trying to use the mitochondrial reads to demultiplex two mouse strains. When I run cellSNP and cellsnp-lite in mode 2 with what I believe are matching parameters and the same input files, both identify the same SNPs, but the read depths reported in cellSNP.base.vcf.gz and in the sparse matrices differ: the depths reported by cellsnp-lite are ~100X greater. When I run vireo on the two outputs, the cellSNP results seem to demultiplex my donors properly, whereas the cellsnp-lite results leave all cells unassigned. Importantly, I noticed that in the donor_ids.tsv file from the vireo run on the cellsnp-lite output, all cells are reported to have the same results:
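To quantify the depth difference site by site, a minimal sketch like the following could compare DP between the two cellSNP.base.vcf.gz files. It assumes DP is stored as a key in the INFO column and that sites are matched on (chrom, pos); the function names and the file paths in the usage note are hypothetical.

```python
import gzip
from typing import Dict, Iterable, Tuple

Site = Tuple[str, int]  # (chrom, pos)


def read_site_depths(lines: Iterable[str]) -> Dict[Site, int]:
    """Map (chrom, pos) -> DP, assuming DP is a key in the VCF INFO column."""
    depths = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
        depths[(fields[0], int(fields[1]))] = int(info["DP"])
    return depths


def depth_ratios(lite: Dict[Site, int], legacy: Dict[Site, int]) -> Dict[Site, float]:
    """cellsnp-lite DP divided by cellSNP DP, for shared sites with cellSNP DP > 0."""
    return {
        site: lite[site] / legacy[site]
        for site in lite.keys() & legacy.keys()
        if legacy[site] > 0
    }


def load(path: str) -> Dict[Site, int]:
    """Read a gzip-compressed VCF into a (chrom, pos) -> DP mapping."""
    with gzip.open(path, "rt") as handle:
        return read_site_depths(handle)
```

Usage, with hypothetical paths: `depth_ratios(load("lite_out/cellSNP.base.vcf.gz"), load("cellsnp_out/cellSNP.base.vcf.gz"))`. If the max_depth explanation in the reply holds, the cellSNP DP at the high-ratio sites should be at or near 8000.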
I'm not sure whether this is a bug in cellsnp-lite or whether I unknowingly did something drastically different when running the two programs. Any ideas?
Here are the two batches of code I ran and the relevant console outputs:
cellsnp-lite
cellSNP