jamescasbon / PyVCF

A Variant Call Format reader for Python.
http://pyvcf.readthedocs.org/en/latest/index.html
Other
397 stars 201 forks source link

parsing converts QUAL == zero to period, inconsistent numeric type breaks downstream filtering #329

Open katrinakalantar opened 3 years ago

katrinakalantar commented 3 years ago

Hello,

When parsing an input .vcf file with a QUAL value = 0.0, i.e.: MN908947.3 12739 . T TA 0.0 PASS DP=3.0;DPS=3.0,0.0 GT:GQ 1:0

the float QUAL value is converted to a . which breaks float comparison filters in downstream applications (specifically working with artic minion pipeline where the above line is parsed to create the following: MN908947.3 12739 . T TA . PASS DP=3.0;DPS=3.0,0.0;Pool=nCoV-2019_2 GT:GQ 1:0)

It appears that this occurs as a function of this logic: https://github.com/jamescasbon/PyVCF/blob/master/vcf/parser.py#L696

Would it be reasonable to require consistent types for the QUAL field when parsing (all numeric)?

cjw85 commented 3 years ago

I don't think it correct to require that all QUAL fields be numeric, there is no constraint in the VCF specification to this end. A QUAL of . is perfectly valid for a single record, and stands independent of the content of other records.

It is certainly incorrect however that QUALs of 0.0 are being coerced to Boolean false in that code and then replaced with the missing value ., which has a very different meaning to zero.