Closed lukaas33 closed 3 months ago
Thank you for your interest in pVACtools and for reaching out to us about this error. This is a small issue with the input VCF: the meta-information about one field is defined incorrectly. In this particular case, the AF FORMAT field is defined as an Integer field in its header. This is incorrect, because the field contains floating-point (decimal) values. The VCF parser we use is strict about casting field values to the types defined in their respective headers, so it tries to convert the number in this field (a decimal) to an integer, which fails. This can be fixed by editing the AF header line and changing the Type from Integer to Float.
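To illustrate how the two reported messages are linked, here is a minimal Python sketch. It is not the parser's actual code: cast_value is a hypothetical helper that assumes a failed cast leaves the value as a string, as the CannotConvertValue warning suggests.

```python
def cast_value(raw, declared_type):
    """Cast a raw VCF field value to its header-declared type.

    Hypothetical helper, not part of pVACtools or its VCF parser.
    On failure the value is kept as a string, which is the behaviour
    the CannotConvertValue warning describes.
    """
    converters = {"Integer": int, "Float": float}
    try:
        return converters[declared_type](raw)
    except ValueError:
        return raw


# With Type=Integer, "1.00" cannot be parsed and stays a string.
af_as_declared = cast_value("1.00", "Integer")

# A later numeric comparison on that string then raises a TypeError.
try:
    af_as_declared > 0
except TypeError as err:
    message = str(err)

# With Type=Float, the same value parses cleanly.
af_fixed = cast_value("1.00", "Float")
```

Under these assumptions, the TypeError is a downstream symptom of the failed cast rather than a separate problem.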
In addition, this field also has its Number defined incorrectly. This error appears once the Type of the AF FORMAT field has been fixed. The Number for this field is set to 1, which means that the field will only ever contain a single number. However, the field really contains one number per alt allele. To correct this, the field header should be changed to Number=A, as per the VCF spec.
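As a rough illustration of what Number=A implies, the Python sketch below pairs each alt allele with its own AF value. The function name af_per_alt and the example alleles and frequencies are made up for illustration and are not taken from the reported file.

```python
def af_per_alt(alt_column, af_value):
    """Pair each ALT allele with its AF value (one value per ALT allele).

    Hypothetical helper for illustration only.
    """
    alts = alt_column.split(",")
    freqs = [float(x) for x in af_value.split(",")]
    if len(alts) != len(freqs):
        # A Number=A field must carry exactly one value per ALT allele.
        raise ValueError("expected one AF value per ALT allele")
    return dict(zip(alts, freqs))


# A multi-allelic site with two ALT alleles carries two AF values.
multi = af_per_alt("A,T", "0.25,0.75")

# A site with a single ALT allele carries one AF value.
single = af_per_alt("G", "1.00")
```

A header with Number=1 would only describe the second case, which is why multi-allelic records conflict with it.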
In summary, please replace the following line
##FORMAT=<ID=AF,Number=1,Type=Integer,Description="Allelic frequency for the alt alleles in the order listed">
with this line
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allelic frequency for the alt alleles in the order listed">
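If editing the file by hand is inconvenient, the replacement could be scripted. The following is a minimal sketch that assumes an uncompressed, plain-text VCF. The names fix_af_header and fix_vcf are hypothetical, and only the AF header line quoted above is touched.

```python
OLD_PREFIX = "##FORMAT=<ID=AF,Number=1,Type=Integer,"
NEW_PREFIX = "##FORMAT=<ID=AF,Number=A,Type=Float,"


def fix_af_header(lines):
    """Return the lines with the AF FORMAT header corrected.

    Only a line starting with the incorrect AF definition is changed;
    its Description and every other line are passed through untouched.
    """
    return [
        NEW_PREFIX + line[len(OLD_PREFIX):] if line.startswith(OLD_PREFIX) else line
        for line in lines
    ]


def fix_vcf(in_path, out_path):
    """Write a corrected copy of an uncompressed VCF file."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(fix_af_header(src))
```

For example, fix_vcf("input.vcf", "fixed.vcf") would write a corrected copy; both file names are placeholders.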
Hi,
Thank you so much! Since this was a wrongly formatted VCF file, you could close this issue, unless you want to catch this specific case?
Installation Type
Docker
pVACtools Version / Docker Image
griffithlab/pvactools:latest
Python Version
No response
Operating System
No response
Describe the bug
I have verified that pvactools is set up correctly by running the example command as specified here https://pvactools.readthedocs.io/en/latest/pvacseq/getting_started.html and seeing that predictions are created.
When running the tools on a real input file (preprocessed with VEP) taken from here https://pdmdb.cancer.gov/pdm/145666~245-R~AJA~v2.0.2.51.0~WES.vcf, the following errors are observed:
CannotConvertValue: 1.00 cannot be converted to Integer, keeping as string.
and then: TypeError: '>' not supported between instances of 'str' and 'int'
It seems that the sample column of this file contains frequencies between 0 and 1, and this gives an error in the parser. I have looked at the example VCF data and this does not occur there.
This issue can be avoided by adding mock sample data using
vcf-genotype-annotator inputpath samplename 0/1 -o outputpath
But with this workaround not all data is used. Is this a limitation of pvactools, wrongly formatted data, or am I using a wrong command?
How to reproduce this bug
Input files
Raw input data from https://pdmdb.cancer.gov/pdm/145666~245-R~AJA~v2.0.2.51.0~WES.vcf: 145666-245-R-AJA-v2.0.2.51.0-WES.zip
Data preprocessed with VEP: https://tuenl-my.sharepoint.com/:u:/g/personal/l_c_a_w_v_osenbruggen_student_tue_nl/EedDI_dhjThBnwuvH2PLp3ABY6lTKh9-JfuuXrVdKSBy4A?e=6AMaUy
Log output
CannotConvertValue: 1.00 cannot be converted to Integer, keeping as string. and then: TypeError: '>' not supported between instances of 'str' and 'int'
Output files
No response