jodyphelan / TBProfiler

Profiling tool for Mycobacterium tuberculosis to detect resistance and strain type from WGS data
GNU General Public License v3.0
105 stars, 43 forks

VCF deviates from the VCF v4.2 specification #387

Open mhkc opened 2 months ago

mhkc commented 2 months ago

Hi Jody,

The VCF produced by TbProfiler v6.3 violates the VCF v4.2 specification, which might cause issues with parsers and genome browsers. TbProfiler reports GQ values as Floats, but the specification mandates that they be Integers.

GQ (Integer): Conditional genotype quality, encoded as a phred quality −10log10 p(genotype call is wrong, conditioned on the site’s being variant)

Abbreviated example VCF

##fileformat=VCFv4.2
...
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype">
...
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  test1_240816_nb000000_0000_test
NC_000962.3 2784611 DEL00000028 T   <DEL>   5670    PASS    PRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.2.6;END=2785969;PE=75;MAPQ=60;CT=3to5;CIPOS=-6,6;CIEND=-6,6;SRMAPQ=60;INSLEN=0;HOMLEN=5;SR=20;SRQ=1;CONSENSUS=CGGCGCGAATTGCTGGCCACCCGGAACTTGACGACCTCTTGATCACCGACTTTGCGGCGCTGCAAAT    CGTTGACGATGTGACCGACCACGGTCAGTGGCGTTTCGAACATTTGCTCATTCCTTTCCTAGTTGCGTTGGCACAGTTGCGTTGGCACCGGGTGATTCCGCGAACTGCCCACGCATATGC;CE=1.97451;CONSBP=92;AC=2;AN=2   GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 1/1:-395.943,-39.6786,0:10000:PASS:419:45:380:0:0:75:0:132
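
For anyone who wants to check their own files, here is a minimal sketch that reads the VCF header and reports the type declared for FORMAT/GQ. The function name `gq_declared_type` and the shortened header lines are illustrative assumptions, not part of TbProfiler or any VCF library.

```python
import re

# Matches the FORMAT meta-information line for the GQ key and captures its Type.
GQ_HEADER = re.compile(r'^##FORMAT=<ID=GQ,.*?\bType=(\w+)')


def gq_declared_type(lines):
    """Return the Type declared for FORMAT/GQ in the header, or None if absent."""
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines are over (reached #CHROM or a record)
        match = GQ_HEADER.match(line)
        if match:
            return match.group(1)
    return None


# Hypothetical, shortened header modelled on the example above.
header = [
    "##fileformat=VCFv4.2",
    '##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample",
]

declared = gq_declared_type(header)
if declared != "Integer":
    print(f"FORMAT/GQ is declared as {declared}; VCF v4.2 expects Integer")
```

This only inspects the header declaration; it does not validate the per-sample GQ values in the records.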

jodyphelan commented 2 months ago

Hi @mhkc

Thanks for letting me know. Tb-profiler itself does not create the VCFs. I've just checked the tools and it looks like the issue is with freebayes, so perhaps opening an issue there would be useful.

jodyphelan commented 2 months ago

Looking at freebayes more closely, I noticed the --strict-vcf option, which forces GQ to be an int. I'll add it to the freebayes call in tb-profiler.
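
As a rough sketch of what passing that option could look like, the snippet below builds a freebayes command line with `--strict-vcf`. It is not tb-profiler's actual invocation: the helper name `freebayes_command` and the file names `ref.fasta` and `sample.bam` are hypothetical, and the command is only printed, not run.

```python
import shlex


def freebayes_command(reference, bam, strict_vcf=True):
    """Build a freebayes argument list; hypothetical helper, not tb-profiler code."""
    cmd = ["freebayes", "-f", reference]  # -f: reference FASTA
    if strict_vcf:
        cmd.append("--strict-vcf")  # ask freebayes to emit GQ as an integer
    cmd.append(bam)
    return cmd


# Placeholder file names, for illustration only.
cmd = freebayes_command("ref.fasta", "sample.bam")
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run` without a shell, which avoids quoting problems with unusual file names.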