Open wesleymarin opened 4 years ago
I agree this looks odd. Going backwards starting with the information that is fully available from your description:
bcftools call
relies on the QS and PL annotations. There the main problem is that the likelihood of the homozygous ref genotype (14) is not sufficiently big to override the default prior 1.1e-3. The prior can be increased e.g. as bcftools call -P 4e-2
, then the correct(?) 1/1 genotype will be called. Note that this is not a problem with the -m
calling model, the original bcftools call -c
samtools calling model gives the same answer.
So probably the more important question is why bcftools mpileup
thinks the homozygous reference is so likely. This is impossible to say without seeing the alignments, and even then it may be difficult to answer. Try to run with bcftools mpileup --no-BAQ
. Also try to increase --min-MQ
if it changes anything.
Hello, I have a similar problem when I try to call indels with high depth (bcftools version: 1.10.2), but in my case I think the problem comes from the -d
option.
For example:
BCFTOOLS CALL
bcftools mpileup -Q 20 -B -d 250 --ignore-RG -a 'DP,AD,ADF,ADR,SP' -O u -f /genome/hg19sorted.fa -r 15 normal.bam tumor.bam | bcftools call -vm -P 1e-1 --ploidy GRCh37 |grep 93528782
----- call output
15 93528782 . TCC TC 235 . INDEL;IDV=108;IMF=0.444444;DP=492;VDB=0.000886062;SGB=67.0654;MQSB=0.993151;MQ0F=0;ICB=0.3;HOB=0.125;AC=1;AN=4;DP4=364,29,91,8;MQ=59 GT:PL:DP:SP:ADF:ADR:AD 0/0:0,255,255:249:0:236,0:13,0:249,0 0/1:249,0,255:243:3:128,91:16,8:144,99
BCFTOOLS MPILEUP
bcftools mpileup -Q 20 -B -d 250 --ignore-RG -a 'DP,AD,ADF,ADR,SP' -O u -f /genome/hg19sorted.fa -r 15 normal.bam tumor.bam |grep 93528782
----- mpileup output
15 93528782 . T <*> 0 . DP=492;I16=395,27,0,0,14056,476454,0,0,25319,1.51908e+06,0,0,7183,152099,0,0;QS=2,0;MQSB=1;MQ0F=0 PL:DP:SP:ADF:ADR:AD 0,255,255:206:0:196,0:10,0:206,0 0,255,255:216:0:199,0:17,0:216,0
15 93528782 . TCC TC 0 . INDEL;IDV=108;IMF=0.444444;DP=492;I16=364,29,91,8,12334,466190,3298,110608,23434,1.40163e+06,5940,356400,6684,141566,1756,37322;QS=1.58914,0.410863;VDB=0.000886062;SGB=67.0654;MQSB=0.993151;MQ0F=0 PL:DP:SP:ADF:ADR:AD 0,255,255:249:0:236,0:13,0:249,0 249,0,255:243:3:128,91:16,8:144,99
The example above is fine because bcftools performs the calling well. However, I'm working with high coverage BAMs and I wanted to increase the maximum number of reads per file with the -d
option (e.g. -d 260
), but I lose the mutation:
BCFTOOLS CALL
bcftools mpileup -Q 20 -B -d 260 --ignore-RG -a 'DP,AD,ADF,ADR,SP' -O v -f /genome/hg19sorted.fa -r 15 normal.bam tumor.bam | bcftools call -vm -P 1e-1 --ploidy GRCh37 |grep 93528782
----- call output
No results
BCFTOOLS MPILEUP
bcftools mpileup -Q 20 -B -d 260 --ignore-RG -a 'DP,AD,ADF,ADR,SP' -O v -f /genome/hg19sorted.fa -r 15 normal.bam tumor.bam |grep 93528782
----- mpileup output
15 93528782 . T <*> 0 . DP=511;I16=406,32,0,0,14567,492891,0,0,26279,1.57668e+06,0,0,7233,152401,0,0;QS=2,0;MQSB=1;MQ0F=0 PL:DP:SP:ADF:ADR:AD 0,255,255:213:0:202,0:11,0:213,0 0,255,255:225:0:204,0:21,0:225,0
I think that the problem is in the bcftools mpileup
command that doesn't detect the indel, but I have no idea why, since I'm only increasing the maximum number of reads, I can understand that it may be lost in the calling but not in the raw pileup.
In order to debug this, we'd need to see a slice of the BAM to reproduce the problem. Sorry I can't be of much help here.
Hello, I am having trouble getting bcftools to call the ALT allele with 61 DP4 depth, it instead is only showing the REF call with 0 DP4 read depth.
The ALT call is correctly identified in the 'bcftools mpileup' output (shown below) where the I16 field shows 0 REF reads and 61 ALT reads, and based on what I understand about the PL output, mpileup identified the correct genotype with the best match being G/G, but these results do not carry over to the 'bcftools call' output, which says the genotype is C/C even though there are 0 DP4 reads supporting this genotype.
I am using bcftools version 1.10.2-105-g7cd83b7
I have tried fiddling with many settings, but have not been able to resolve this issue. At this point I am not sure if I am missing a setting that causes 'bcftools call' to completely ignore the ALT call, or if it is a bug or something else. Any help would be greatly appreciated.
I can supply sequence and reference files to replicate.
BCFTOOLS MPILEUP command:
bcftools mpileup -m 3 --threads 60 -d 500 -F 0.0002 -f alleleReference.fasta IND00040.sorted.bam -T alleleReference.bed -O v -o IND00040.vcf
----- mpileup output KIR2DL1*0020101 4522 . C G,T,<*> 0 . DP=92;I16=0,0,33,28,0,0,2055,80321,0,0,186,826,0,0,1193,27631;QS=0,0.971631,0.0283688,0;VDB=0.999999;SGB=-0.693147;MQSB=0.910046;MQ0F=0.0434783 PL 14,181,0,17,166,14,14,181,17,14
BCFTOOLS CALL command:
bcftools mpileup -m 3 --threads 60 -d 500 -F 0.0002 -f alleleReference.fasta IND00040.sorted.bam -T alleleReference.bed | bcftools call --multiallelic-caller -O v -o IND00040.vcf
----- call output KIR2DL1*0020101 4522 . C . 15.3772 . DP=92;VDB=0.999999;SGB=-0.693147;MQSB=0.910046;MQ0F=0.0434783;AN=2;DP4=0,0,33,28;MQ=3 GT 0/0Again, the 'G' in the mpileup output should be the correct base for this position, but it is completely missing from the bcftools call output. I also tried without the -m3 and -F0.0002 options, but the results were the same.
edit It looks like the 0/0 genotype calls are specific to positions with low MQ, but I do not see any settings in bcftools call to adjust that behavior.