samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/
Other
681 stars 240 forks source link

bcftools ignores indel even when using --min-ireads 10 #1809

Open Rohit-Satyam opened 2 years ago

Rohit-Satyam commented 2 years ago

I tried variant calling using bcftools using recommended settings for SARS-Cov-2 i.e. using --ignore-overlaps --min-ireads 10 on my ISeq Illumina data. However, I can see that a deletion supported by 30 reads is being ignored (I reran without above flags and compared the vcf files). For some reason IDV=9. Below attached the coverage of the deletion and the coverage plot

bcftools mpileup --redo-BAQ --max-depth 0 --min-BQ 20 --min-MQ 20 --count-orphans --fasta-ref ../00_index/Sars_cov_2.ASM985889v3.dna.toplevel.fa --annotate AD \
 --ignore-overlaps --min-ireads 10 16_S16_L001.dedup.bam |  bcftools call --multiallelic-caller     --variants-only      --ploidy 1      --output-type b - > temp2.bcftools.bcf
MN908947.3      28361   .       GGAGAACGCAG     GG      225     .       INDEL;IDV=9;IMF=1;DP=9;VDB=0.156835;SGB=-0.662043;MQ0F=0;AC=1;AN=1;DP4=0,0,0,9;MQ=60    GT:PL:AD        1:255,0:0,9

16_S16_L001

image

pd3 commented 2 years ago

It would be helpful to have a small test case to reproduce the problem and improve the code.

In the meantime, this branch has a new mpileup option --indels-2.0 which is a testing ground for improved handling of indels https://github.com/samtools/bcftools/tree/indel-revamp. Maybe you could try it on your data to see if it solved the problem. Note that this is an EXPERIMENTAL feature. If you decide to test it, please let us know the results.

Rohit-Satyam commented 2 years ago

I can share the BAM file that can serve as a test case. Note: This is ISeq data. I will try the recommendations you made. 16_S16_L001.dedup.zip

pd3 commented 2 years ago

OK, I checked and can confirm that the read counts are not correct with any version. I'll add this to the test suite to investigate later.

By the way, it's strange that the reads have 20 bases soft clipped even though they match the reference genome perfectly.