samtools / bcftools

This is the official development repository for BCFtools. See the installation instructions and other documentation here: http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

bcftools mpileup | bcftools call producing Non-zero PL Across All genotypes #1486

Open plhm opened 3 years ago

plhm commented 3 years ago

Hi all,

I recently produced a VCF file with bcftools mpileup | bcftools call for a RAD-seq-like dataset. While inspecting the output, I noticed that for at least one specimen the pipeline produced a call in which all three genotypes have non-zero PL values. Here is the record for that variant:

1 69697716 . G C 999 . DP=676;VDB=0;SGB=396.022;RPB=0.53445;MQB=0.9981;BQB=0.834494;MQ0F=0;ICB=1.82473e-06;HOB=0.2968;AC=69;AN=150;DP4=308,0,362,0;MQ=59 GT:PL:AD ./.:0,0,0:0,0 0/0:17,38,180:7,0

In my experience with GATK, the most likely genotype always gets a Phred score of 0. I could not confirm whether this must also hold for the bcftools pipeline, although I assumed so, given that thus far every variant except this one has one genotype with a Phred score of 0. I also could not find a website or publication explaining how bcftools scales its Phred scores, or why values such as the ones shown above would be possible. Could someone point me towards such a publication or webpage?
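For anyone wanting to check their own file for similar sites, here is a minimal Python sketch that flags samples whose smallest PL is not 0. It assumes whitespace-separated VCF columns with FORMAT in the ninth column; the function name is hypothetical, and the INFO column of the example record is abbreviated for readability.

```python
def samples_with_nonzero_min_pl(vcf_line):
    """Return (sample_index, PL values) for samples whose minimum PL is not 0."""
    fields = vcf_line.split()
    fmt_keys = fields[8].split(":")      # FORMAT column, e.g. GT:PL:AD
    pl_idx = fmt_keys.index("PL")
    hits = []
    for i, sample in enumerate(fields[9:]):
        parts = sample.split(":")
        if pl_idx >= len(parts) or parts[pl_idx] == ".":
            continue                     # PL missing for this sample
        values = [int(v) for v in parts[pl_idx].split(",") if v != "."]
        if values and min(values) != 0:
            hits.append((i, values))
    return hits


# The record from this post, with the INFO column shortened.
record = "1 69697716 . G C 999 . DP=676;AC=69;AN=150 GT:PL:AD ./.:0,0,0:0,0 0/0:17,38,180:7,0"
print(samples_with_nonzero_min_pl(record))
```

On this record it reports only the second sample (index 1), since the first sample's PLs are all 0.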

Thank you.

P

pd3 commented 3 years ago

That's an interesting case. First, a general comment: the minimum PL does not have to be 0. Any normalization would be done in probability space (i.e. not in phred/log space). However, based on experience, a 0 will typically be present, and I am not sure why it is missing in this case. If you'd like to understand it better, could you please provide the output for this site from the mpileup step? I am assuming you are running the latest version of bcftools.
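To illustrate the relationship between the two spaces, here is a small Python sketch. It is not the bcftools implementation, only the arithmetic: PLs are phred-scaled likelihoods (PL = -10 log10 L), so dividing every likelihood by the largest one in probability space gives the same result as subtracting the smallest PL in phred space. The function names are hypothetical.

```python
import math


def rescale_in_phred_space(pl):
    """Shift PLs so that the most likely genotype gets 0."""
    smallest = min(pl)
    return [v - smallest for v in pl]


def rescale_in_probability_space(pl):
    """Convert PLs to likelihoods, divide by the maximum, convert back."""
    likelihoods = [10 ** (-v / 10) for v in pl]
    top = max(likelihoods)
    return [round(-10 * math.log10(lk / top)) for lk in likelihoods]


pl = [17, 38, 180]  # PL values of the second sample in the reported record
print(rescale_in_phred_space(pl))
print(rescale_in_probability_space(pl))
```

Both functions return 0,21,163 for this sample, which is what the PLs would look like under the convention that the most likely genotype is scaled to 0. The relative likelihoods of the three genotypes are unchanged by the shift.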