Hi Guillaume,
The problem is an error in your VCF file. If you look at your second example record, you see this in the info field:
...;GRP=-3.42;HRun;MQ=0;...
So the HRun key is used as a boolean flag, but in the header it is defined to have an integer value:
##INFO=<ID=HRun,Number=1,Type=Integer,Description="Largest Contiguous Homopolymer Run of Variant Allele In Either Direction">
So this is incorrect. I see the file was created by the GATK UnifiedGenotyper. One of their help pages mentions the HRun annotation field, and all the examples there include an integer value.
Even if we wanted to handle this case in PyVCF, I'm not sure what we should do with it. What would it mean?
If you're producing the file yourself, you may want to try updating GATK or asking the GATK developers.
Thanks for the clarification. Unfortunately I am not the producer of the VCF, so I would have to solve this by modifying the file as it is. Since I won't use the HRun information, integer or boolean, do you think the file can be slightly modified (with a regexp, for instance) so that PyVCF won't get confused?
Cheers!
Guillaume
I see. If you're on Linux/OSX/Unix you could use sed to remove these values:
sed 's/;HRun;/;/' original.vcf > fixed.vcf
This ignores any HRun occurrences directly at the start or end of the INFO field, but that probably never happens.
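If the start/end case does matter for your file, a small Python script could do the same cleanup by splitting the INFO column instead of pattern matching. This is only a sketch, not part of PyVCF: the names drop_bare_info_key and fix_file are made up for illustration, and it assumes a tab-separated VCF with INFO as the eighth column.

```python
def drop_bare_info_key(line, key="HRun"):
    """Remove `key` from the INFO column when it appears without a value."""
    if line.startswith("#"):
        return line  # header lines are left untouched
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        return line  # not a complete VCF record, leave it alone
    # Keep every INFO entry except the bare key; "HRun=3" is not affected.
    entries = [entry for entry in fields[7].split(";") if entry != key]
    fields[7] = ";".join(entries) or "."  # "." marks an empty INFO field
    return "\t".join(fields) + "\n"


def fix_file(src_path, dst_path, key="HRun"):
    """Copy src_path to dst_path, dropping the bare key from each record."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(drop_bare_info_key(line, key))
```

Calling fix_file("original.vcf", "fixed.vcf") would then produce the cleaned file. Because the INFO field is split on ";", it does not matter where in the field the bare HRun sits.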
It could however be the case that the file still has other (similar) errors.
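To look for other fields with the same problem, you could scan the file for INFO keys that are used without a value although the header declares them with a type other than Flag. Again just a sketch under assumptions: find_bare_nonflag_keys is a hypothetical helper, it expects the header attributes in the usual ID, Number, Type order, and it skips keys that are not declared in the header at all.

```python
import re

# Matches header lines such as ##INFO=<ID=HRun,Number=1,Type=Integer,...>
INFO_DEF = re.compile(r"##INFO=<ID=([^,]+),Number=([^,]+),Type=([^,>]+)")


def find_bare_nonflag_keys(lines):
    """Yield (line_number, key) for INFO keys that appear without a value
    although the header declares a type other than Flag."""
    declared = {}
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("##INFO="):
            match = INFO_DEF.match(line)
            if match:
                declared[match.group(1)] = match.group(3)
        elif not line.startswith("#"):
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 8 or fields[7] == ".":
                continue  # incomplete record or empty INFO field
            for entry in fields[7].split(";"):
                # Undeclared keys default to "Flag" here and are skipped.
                if "=" not in entry and declared.get(entry, "Flag") != "Flag":
                    yield lineno, entry
```

For example, list(find_bare_nonflag_keys(open("original.vcf"))) would give the line numbers and keys of all such records, so you can see whether HRun is the only field affected.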
This works perfectly! Thanks a lot for your help.
Guillaume
Hi!
It seems I am running into a common problem with headers, but I can't figure out how to solve it. I am using PyVCF-0.6.7-py2.7-macosx-10.5-x86_64 and trying to iterate over the records of a VCF file, but I get this error:
Apparently, the error occurs when PyVCF reads the first line of the VCF that contains an alternative allele (before that, there is only missing information, i.e. "."):
The Header of the VCF is this one:
Any idea how I can get PyVCF to read through this?
Cheers!
Guillaume