single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0

16,299 variants identified from mouse mtDNA chromosome in scRNAseq? #141

Open jzussman opened 3 weeks ago

jzussman commented 3 weeks ago

Hello, I ran cellsnp-lite with the following command to call variants in mitochondrial transcripts from 10X scRNA-seq data. For each of three independent samples I received 16,299 variants, which is roughly the length of the mouse mitochondrial chromosome. When I ran these results through MQuad to identify informative variants, unsurprisingly none were identified. Is it expected that cellsnp-lite reports as many variants as there are bases in the chromosome across a population of ~3000 cells per sample? I'm quite confident that there are not this many real variants in this dataset. Are there settings I could change to increase my likelihood of calling and detecting informative variants in later downstream steps? I based this command on the recommendations of the MQuad development team on their GitHub page.

cellsnp-lite -s path/to/possorted_genome_bam.bam -b path/to/filtered_feature_bc_matrix/barcodes.tsv.gz -O path/to/output_dir --chrom=chrM --UMItag Auto --minMAF 0 --minCOUNT 0 --genotype --gzip -p 10
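For reference, a reported variant count like the one above can be double-checked by counting the non-header records in the output VCF. This is a minimal sketch, not part of cellsnp-lite; the helper name `count_vcf_records` is hypothetical, and the file name in the usage comment is an assumption about what the output directory contains.

```python
import gzip


def count_vcf_records(path):
    """Count data records (non-header, non-empty lines) in a VCF file.

    Works for plain-text and gzip-compressed files; compression is
    inferred from the ".gz" suffix.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.startswith("#")
        )


# Hypothetical usage (file name is an assumption):
# print(count_vcf_records("path/to/output_dir/cellSNP.base.vcf.gz"))
```

If the count is close to the chromosome length, nearly every covered position was written out rather than a filtered set of candidate SNPs.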

hxj5 commented 3 weeks ago

Hi, it is expected that almost every mito position is output by cellsnp-lite, since --minMAF and --minCOUNT are both set to 0 (i.e., no SNP filtering is performed). You may report the issue in the MQuad repo to check why no informative SNPs were found.
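To illustrate why zero thresholds keep every position, here is a toy model of the two filters. It is a sketch under assumptions, not cellsnp-lite's actual code: it assumes --minCOUNT is compared against the total count aggregated over all cells at a position and --minMAF against the frequency of the second most common allele. The function name, the example counts, and the thresholds 0.1 and 100 are made up for illustration.

```python
def passes_filter(allele_counts, min_maf=0.0, min_count=0):
    """Toy position filter (assumed semantics, not cellsnp-lite's code).

    allele_counts: dict mapping allele -> count aggregated over all cells.
    A position is kept if its total count is >= min_count and its
    minor allele frequency (second most common allele / total) is >= min_maf.
    """
    total = sum(allele_counts.values())
    if total < min_count:
        return False
    ordered = sorted(allele_counts.values(), reverse=True)
    minor = ordered[1] if len(ordered) > 1 else 0
    maf = minor / total if total else 0.0
    return maf >= min_maf


# Made-up aggregated counts for three positions.
positions = {
    1: {"A": 500},            # well covered, no alternative allele
    2: {"A": 400, "G": 100},  # well covered, second allele at 20%
    3: {"C": 3, "T": 2},      # second allele present, but only 5 counts
}

# With both thresholds at 0, every position passes.
kept_unfiltered = [pos for pos, c in positions.items() if passes_filter(c)]

# With illustrative non-zero thresholds, only position 2 passes.
kept_filtered = [
    pos for pos, c in positions.items()
    if passes_filter(c, min_maf=0.1, min_count=100)
]
```

In this model, position 1 fails the allele frequency threshold and position 3 fails the count threshold once the thresholds are non-zero, while all three are reported when both are 0.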

jzussman commented 3 weeks ago

Got it, that makes sense, thank you very much!