mskcc / facets

Algorithm to implement Fraction and Copy number Estimate from Tumor/normal Sequencing.

snp-pileup for mm10/GRCm38 mouse #203

Open ahwanpandey opened 2 weeks ago

ahwanpandey commented 2 weeks ago

Hello,

Could you point me to a suitable VCF for running snp-pileup on WGS data for mm10/GRCm38?

I downloaded 00-All.vcf.gz, but it has 71,202,368 SNPs, which seems like far too many for snp-pileup.

I am also looking at the strain-specific VCFs linked below. The file sizes are much smaller and seem more reasonable, but there is no file for "C57BL/6J", which is the strain we are using. There is one for "C57BL/6NJ", however. https://ftp.ebi.ac.uk/pub/databases/mousegenomes/REL-1505-SNPs_Indels/strain_specific_vcfs/

Thanks for your help!

Best, Ahwan

ahwanpandey commented 2 weeks ago

I've also found this file, which has 8,213,470 SNPs: https://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/snp142Common.txt.gz

It is described in the following post: "Common SNPs (142): uniquely mapped variants that appear in at least 1% of the population" https://groups.google.com/a/soe.ucsc.edu/g/genome-announce/c/VuZlU_vCPx4

Please advise the best SNP reference for mouse samples!

veseshan commented 2 weeks ago

While the VCF has 71 million SNPs, you can process it to reduce the number. Remove any polymorphism that is not a single-nucleotide variant, and remove any site with more than one alternate allele. The VCF also has many columns that snp-pileup does not use; removing them reduces the file size.
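The filtering criteria above can be sketched in plain Python for anyone who wants to see the logic without bcftools. This is only an illustration, not part of FACETS or snp-pileup: the function name `keep_biallelic_snvs` and the sample records are hypothetical, and it assumes an uncompressed, tab-delimited VCF. It keeps the eight fixed VCF columns and blanks INFO rather than cutting columns away, on the assumption that downstream readers expect the fixed columns to be present.

```python
def keep_biallelic_snvs(lines):
    """Yield VCF lines, keeping only biallelic single-nucleotide records.

    Hypothetical helper for illustration; expects uncompressed VCF text lines.
    """
    bases = {"A", "C", "G", "T"}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines pass through unchanged.
            yield line
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Keep only the eight fixed columns in the column header.
            yield "\t".join(fields[:8])
            continue
        if len(fields) < 8:
            # Skip malformed records.
            continue
        ref, alt = fields[3].upper(), fields[4].upper()
        # A single base in REF and a single base in ALT excludes indels,
        # multi-nucleotide variants, and sites with several ALT alleles
        # (for example "G,T").
        if ref in bases and alt in bases:
            # Blank the INFO column to shrink the file.
            yield "\t".join(fields[:7] + ["."])


# Made-up records, for demonstration only.
sample = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\trs1\tA\tG\t.\t.\tVC=SNV",
    "1\t200\trs2\tA\tG,T\t.\t.\tVC=SNV",
    "1\t300\trs3\tAT\tA\t.\t.\tVC=DIV",
    "1\t400\trs4\tC\tT\t.\t.\tVC=SNV",
]
filtered = list(keep_biallelic_snvs(sample))
print("\n".join(filtered))
```

In this toy input, only rs1 and rs4 survive; the multi-allelic site (rs2) and the deletion (rs3) are dropped.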

ahwanpandey commented 2 weeks ago

Thanks for your input, @veseshan. I will try the 00-All.vcf.gz file with the following filter, which leaves 70,672,993 SNPs:

bcftools view --types snps --max-alleles 2 orig/dbsnp.vcf.gz | cut -f 1-5 | bgzip -cf > dbsnp.snps_only.no_multi_allele.vcf.gz

I also found this resource that might be useful to try: https://kharchenkolab.github.io/numbat/articles/mouse.html

ahwanpandey commented 2 weeks ago

Hi again @veseshan

I am new to mouse WGS analysis and have just read something suggesting that mice with an inbred (pure) genetic background lack heterozygous SNPs: https://github.com/kharchenkolab/numbat/issues/198

Does this mean I won't be able to run FACETS on our pure "C57BL/6J" mice either?

Thanks, Ahwan

veseshan commented 1 day ago

If there are no (or few) heterozygous SNPs, then one can't estimate allelic imbalance, which is necessary for allele-specific copy numbers, the goal of FACETS.
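One way to check this before running the full analysis is to look at how many SNPs in the normal sample appear heterozygous. The sketch below is a rough illustration under stated assumptions: the function name `het_fraction`, the `(ref_count, alt_count)` input layout, and the `min_depth` and `het_thresh` values are all hypothetical choices for this example, not FACETS defaults taken from this thread. A SNP is counted as heterozygous when its alternate-allele fraction in the normal lies strictly between `het_thresh` and `1 - het_thresh`.

```python
def het_fraction(normal_counts, min_depth=20, het_thresh=0.25):
    """Return the fraction of adequately covered SNPs that look heterozygous.

    normal_counts: iterable of (ref_count, alt_count) pairs from the normal
    sample. All parameter values are illustrative assumptions.
    """
    usable = 0
    het = 0
    for ref, alt in normal_counts:
        depth = ref + alt
        if depth < min_depth:
            # Too few reads to judge zygosity.
            continue
        usable += 1
        vaf = alt / depth
        if het_thresh < vaf < 1 - het_thresh:
            het += 1
    return het / usable if usable else 0.0


# Made-up read counts, for demonstration only.
inbred_like = [(30, 0), (0, 28), (25, 1), (40, 0)]
outbred_like = [(15, 14), (30, 0), (12, 18), (0, 25)]

inbred_result = het_fraction(inbred_like)
outbred_result = het_fraction(outbred_like)
print(inbred_result, outbred_result)
```

A value near zero, as in the inbred-like toy data, would indicate that there are too few heterozygous positions to estimate allelic imbalance.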