t-neumann / slamdunk

Streamlining SLAM-seq analysis with ultra-high sensitivity
GNU Affero General Public License v3.0

slamdunk snp(varscan2) missing SNP around INDEL #117

Closed dyinboisry4u closed 6 months ago

dyinboisry4u commented 2 years ago

Hi, I found that slamdunk snp (VarScan2) misses SNPs that sit right next to an indel ("SNP-INDEL" sites), and this seems to cause false positives. For example:

My mpileup file (created with the same arguments that slamdunk snp uses):

chr4 5742730 T 73 C+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1G.C+1GC+1GC+1GC+1GC+1G.C+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC +1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1GC+1G C+1GC+1G FFBFFFFBFFF/FFFFF/FBFFFFF/FFFFF/F<<FFFFFFFFFF<BFFFFFFFFFFFFFFFFBF</BFFB</

But in the .vcf file, the "chr4 5742730" site is missing.

IGV: [screenshot]

My commands:

slamdunk map -r ../genome/mm10.fa -o ./ -t 160 ../data/SRR5678911.fastq
slamdunk filter -o ./ SRR5678911_slamdunk_mapped.bam
slamdunk snp -r ../genome/mm10.fa -o ./ -t 64 SRR5678911_slamdunk_mapped_filtered.bam
alleyoop read-separator -o ./ -s ./ -r ../genome/mm10.fa ./

I also checked other "SNP-INDEL" loci, and they show similar results. What might be going wrong? Thanks!

t-neumann commented 2 years ago

By default the variant fraction is 0.8, from what I recall. It could be that this threshold is not met for these sites. For diploid samples we always set it to 0.2.

dyinboisry4u commented 2 years ago

Actually, I'm not sure whether "variant fraction" means "count of observations of this alternate allele / all read counts". If that's right, the variant fraction at the "chr4 5742730" site should be greater than the 0.8 threshold. How do I deal with these sites?
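
To make the question concrete, here is a minimal sketch that computes an alternate-allele fraction from the read-bases column of a samtools mpileup line. It assumes "variant fraction" means non-reference base calls divided by all base calls, which is the definition proposed in the question above and not confirmed VarScan2 behaviour. The function name `alt_fraction` is hypothetical, and deletion placeholders (`*`) are simply skipped.

```python
import re
from collections import Counter


def alt_fraction(read_bases):
    """Count base calls in an mpileup read-bases string.

    Returns (counts, fraction), where fraction is an ASSUMED definition:
    non-reference base calls / all base calls at the position.
    """
    counts = Counter()
    i = 0
    while i < len(read_bases):
        ch = read_bases[i]
        if ch == "^":
            # Read-start marker is followed by one mapping-quality character.
            i += 2
        elif ch in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            digits = re.match(r"\d+", read_bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
        elif ch in ".,":
            # Match to the reference on the forward or reverse strand.
            counts["ref"] += 1
            i += 1
        elif ch.upper() in "ACGTN":
            # Mismatch; upper/lower case encodes the strand.
            counts[ch.upper()] += 1
            i += 1
        else:
            # '$' (read end), '*' (deletion placeholder), stray whitespace.
            i += 1
    total = sum(counts.values())
    alt = total - counts["ref"]
    return counts, (alt / total if total else 0.0)


counts, frac = alt_fraction("C+1GC+1G.C+1G")
print(counts["C"], counts["ref"], frac)  # 3 1 0.75
```

In the pileup line from the report, only two reads appear to match the reference (`.`), and the rest read as `C` followed by a `+1G` insertion, so under this definition the fraction would sit well above 0.8.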

t-neumann commented 2 years ago

Hm, can you check the resulting VCF file to see whether anything is reported for this site?
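
One way to run that check is sketched below. It assumes an uncompressed, tab-separated VCF, and the helper name `vcf_records_near` and the file name in the usage comment are hypothetical. The optional window also surfaces records reported at neighbouring coordinates, which can help when a SNP sits next to an indel.

```python
def vcf_records_near(vcf_path, chrom, pos, window=0):
    """Return VCF data lines on `chrom` whose POS lies within `window` bp of `pos`.

    Assumes a plain-text (not gzipped) VCF with tab-separated columns.
    """
    hits = []
    with open(vcf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            if fields[0] == chrom and abs(int(fields[1]) - pos) <= window:
                hits.append(fields)
    return hits


# Hypothetical usage; replace the path with the VCF written by slamdunk snp:
# for record in vcf_records_near("sample_snp.vcf", "chr4", 5742730, window=5):
#     print("\t".join(record))
```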

dyinboisry4u commented 2 years ago

VarScan2 doesn't report anything at this site. 😭

t-neumann commented 2 years ago

Can you try running slamdunk snp with a very low variant fraction, like -f 0.2, and see whether anything is reported then?

dyinboisry4u commented 2 years ago

It doesn't work.

t-neumann commented 2 years ago

Hm, then I guess VarScan2 really chokes on the indel being right next to the SNP. I would have a look at the VarScan2 parameters to see whether any of them allow such calls.

dyinboisry4u commented 2 years ago

Raw data: https://www.ncbi.nlm.nih.gov/sra?term=SRX2914406
slamdunk version: slamdunk 0.4.3

I have also tried bcftools and freebayes, and neither of them can call this site.

t-neumann commented 2 years ago

Yeah, then this is unfortunately a problem with the variant caller and not with slamdunk itself. I couldn't really tune it to get it working. Do you have many such cases?