single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0

Small doubt on preprocessing SNP data for mouse #139

Open mauripops opened 1 week ago

mauripops commented 1 week ago

I am currently processing a VCF file of mouse SNPs from the Wellcome Sanger Mouse Genome Project, as listed here: https://www.jax.org/research-and-faculty/genetic-diversity-initiative/tools-data/diversity-outbred-reference-data

I am following the processing described for human data here: https://github.com/single-cell-genetics/cellsnp-lite/blob/master/scripts/SNPlist_1Kgenome.sh, which resulted in the data here: https://sourceforge.net/projects/cellsnp/files/SNPlist/

I was wondering, do the SNPs provided above for humans contain intergenic variants?

My end goal is to run cellsnp-lite to demultiplex scRNA-seq reads with vireo. Should I remove the intergenic variants, or is there a reason to keep them?

hxj5 commented 4 days ago

The VCF does include intergenic variants. Most of them are expected to be filtered out by cellsnp-lite during pileup, because they are covered by few reads in scRNA-seq data. However, some intergenic variants that are expressed (for example, due to technical reasons or imperfect gene annotations) could provide additional genotype information that helps vireo distinguish different donors.
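
To illustrate why low-coverage intergenic variants tend to drop out on their own, here is a minimal Python sketch of a coverage-based filter. It is not cellsnp-lite's actual implementation: the `SnpCounts` class, the `passes_pileup_filter` function, the SNP names, and the read counts are all hypothetical, and the thresholds are only loosely modeled on cellsnp-lite's `--minCOUNT` and `--minMAF` options.

```python
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Hypothetical per-SNP read counts, summed over all cells."""
    snp_id: str
    ref_count: int  # reads supporting the reference allele
    alt_count: int  # reads supporting the alternative allele


def passes_pileup_filter(snp, min_count=20, min_maf=0.1):
    """Keep a SNP only if it has enough reads and both alleles are observed.

    min_count and min_maf are illustrative values (assumptions), not
    settings taken from this thread.
    """
    total = snp.ref_count + snp.alt_count
    if total < min_count:
        # Too few reads: typical for variants in unexpressed regions.
        return False
    minor_allele_freq = min(snp.ref_count, snp.alt_count) / total
    return minor_allele_freq >= min_maf


# Made-up counts for three candidate SNPs from the input VCF.
candidates = [
    SnpCounts("exonic_snp", ref_count=60, alt_count=40),
    SnpCounts("silent_intergenic_snp", ref_count=1, alt_count=0),
    SnpCounts("expressed_intergenic_snp", ref_count=25, alt_count=15),
]

kept = [snp.snp_id for snp in candidates if passes_pileup_filter(snp)]
print(kept)  # ['exonic_snp', 'expressed_intergenic_snp']
```

In this toy example the intergenic SNP with a single read is discarded, while the intergenic SNP with real coverage survives and remains available as genotype information.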