samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

indel ALT gets removed during calling, but leaves invalid indel line #2035

Closed mcolpus closed 10 months ago

mcolpus commented 10 months ago

I'm using bcftools mpileup and call to call variants from long-read nanopore data. It produces some odd lines in the resulting VCF, like the second record at pos 2528:

ref 2527    .   A   .   222.589 .   DP=19;ADF=5;ADR=9;SCR=17;FS=0;MQ0F=0;AN=2;DP4=5,9,0,0;MQ=60 GT:SP:AD    0/0:0:14
ref 2528    .   C   .   192.589 .   DP=20;ADF=5;ADR=7;SCR=17;FS=0;MQ0F=0;AN=2;DP4=5,7,0,0;MQ=60 GT:SP:AD    0/0:0:12
ref 2528    .   C   .   28.9651 .   INDEL;IDV=2;IMF=0.0952381;DP=21;ADF=1;ADR=3;SCR=17;VDB=0.915813;SGB=-0.379885;RPBZ=1.85824;SCBZ=-1.39727;FS=0;MQ0F=0;AN=2;DP4=1,3,1,0;MQ=60 GT:SP:AD    0/0:0:4
ref 2529    .   A   .   176.588 .   DP=21;ADF=4;ADR=7;SCR=17;FS=0;MQ0F=0;AN=2;DP4=4,7,0,0;MQ=60 GT:SP:AD    0/0:0:11

It's called an indel but has no ALT allele. However, the pileup does have an ALT (CG) at that position:

ref 2527    .   A   <*> 0   .   DP=19;ADF=5,0;ADR=9,0;SCR=17;I16=5,9,0,0,326,7828,0,0,840,50400,0,0,335,8225,0,0;QS=1,0;FS=0;MQ0F=0 PL:SP:AD    0,42,193:0:14,0
ref 2528    .   C   <*> 0   .   DP=20;ADF=5,0;ADR=7,0;SCR=17;I16=5,7,0,0,254,5554,0,0,720,43200,0,0,300,7500,0,0;QS=1,0;FS=0;MQ0F=0 PL:SP:AD    0,36,163:0:12,0
ref 2528    .   C   CG  0   .   INDEL;IDV=2;IMF=0.0952381;DP=21;ADF=1,1;ADR=3,0;SCR=17;I16=1,3,1,0,160,6400,40,1600,240,14400,60,3600,100,2500,25,625;QS=0.767123,0.232877;VDB=0.915813;SGB=-0.379885;RPBZ=1.85824;SCBZ=-1.39727;FS=0;MQ0F=0    PL:SP:AD    2,0,35:0:4,1
ref 2529    .   A   <*> 0   .   DP=21;ADF=4,0;ADR=7,0;SCR=17;I16=4,7,0,0,222,4654,0,0,660,39600,0,0,275,6875,0,0;QS=1,0;FS=0;MQ0F=0 PL:SP:AD    0,33,147:0:11,0

So it seems like bcftools call determines that the indel is low quality, but only half removes it. I'm not even sure whether the resulting line is valid VCF.

What's going on here? Is this expected behaviour?

commands:

bcftools mpileup -f ref.fa.gz --threads 5 -x -Q 13 -a INFO/SCR,INFO/ADR,INFO/ADF,FORMAT/SP,FORMAT/AD -h100 -M10000 -o sample.pileup.vcf sample.sorted.sam
bcftools call --ploidy 2 -m -o sample.vcf sample.pileup.vcf

pd3 commented 10 months ago

It is not invalid. It just means the indel did not survive the calling step because there was not enough support for it. If run as bcftools call -mA, the considered alternate alleles will not be removed, even when unused. If run with -mv, such lines will not show up at all.
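
To see how many records in an existing VCF fall into this category (tagged INDEL in INFO, but with ALT reduced to "."), something like the following may help. This is only a rough sketch: the function name dropped_indels is made up for illustration, it assumes an uncompressed, tab-separated VCF, and the demo records are abbreviated versions of the lines in the report above.

```shell
# Hypothetical helper (not part of bcftools): print CHROM:POS for records
# whose INFO carries the INDEL flag but whose ALT column is ".", i.e. the
# candidate indel allele was dropped during calling.
# Reads the files given as arguments, or stdin when none are given.
dropped_indels() {
    awk -F'\t' '!/^#/ && $5 == "." && $8 ~ /(^|;)INDEL(;|$)/ { print $1 ":" $2 }' "$@"
}

# Demo on two abbreviated records shaped like the ones at pos 2528:
# the first is a plain reference site, the second is the leftover indel line.
printf 'ref\t2528\t.\tC\t.\t192.589\t.\tDP=20;MQ=60\nref\t2528\t.\tC\t.\t28.9651\t.\tINDEL;IDV=2;DP=21\n' \
    | dropped_indels
```

On a real file this would be called as dropped_indels sample.vcf. If the flagged lines are simply unwanted, re-running the call step with -mv as described above avoids producing them in the first place.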