Open myourshaw opened 2 years ago
I get the same problem. Any update on this issue?
I hoped switching to PyVCF3 (cf. #335) would solve the issue, but apparently it does not.
My bad, in my case the problem originated from a Source tag in a FILTER field:
##FILTER=<ID=xxx,Description="yyy",Source="zzz">
Source is an INFO field tag according to https://samtools.github.io/hts-specs/ and not a FILTER field tag.
Please comment on this issue at PyVCF3: https://github.com/dridk/PyVCF3/issues/1
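For anyone hitting the Source variant of this problem, a quick pre-check of the header can point at the offending line before the parser raises. This is only a sketch: `unexpected_filter_keys` is a hypothetical helper, not part of PyVCF or PyVCF3, and it assumes the parser accepts nothing but ID and Description on a FILTER line.

```python
import re

# Assumption: only these keys are accepted on a ##FILTER header line.
ALLOWED_FILTER_KEYS = {"ID", "Description"}

# key=value pairs; a value is either a double-quoted string or a bare token.
PAIR = re.compile(r'(\w+)=("[^"]*"|[^,>]*)')


def unexpected_filter_keys(header_line):
    """Return the keys on a ##FILTER line other than ID and Description."""
    prefix = "##FILTER=<"
    if not header_line.startswith(prefix):
        return []
    body = header_line[len(prefix):].rstrip().rstrip(">")
    return [key for key, _ in PAIR.findall(body)
            if key not in ALLOWED_FILTER_KEYS]


print(unexpected_filter_keys('##FILTER=<ID=xxx,Description="yyy",Source="zzz">'))
# prints ['Source']
```

Running this over every line that starts with ##FILTER in a header would list the keys that need to be removed or moved.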
Background: In FILTER, multiple filters should be separated by semicolons. The widely used, but not actively maintained, VarScan2 genomic variant caller uses commas instead. Moreover, VarScan2 does not add ##FILTER metadata for most of its filters. Picard FixVcfHeader can be used to fix missing FILTER metadata. A "fixed" metadata row will look like:
##FILTER=<ID="RefAvgRL,VarAvgRL",Description="Missing description: this FILTER line was added by Picard's FixVCFHeader">
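To make the separator point concrete, here is a minimal sketch of splitting a FILTER column value on semicolons only. `split_filter_field` is a hypothetical helper, and returning None for "." and an empty list for PASS is an illustrative choice, not a statement about what any particular parser does.

```python
def split_filter_field(value):
    """Split a VCF FILTER column value into a list of filter names.

    Only ';' is treated as a separator, so a comma-joined value such as
    the one VarScan2 writes stays a single name.
    """
    if value == ".":
        return None      # filters were not applied
    if value == "PASS":
        return []        # passed every filter
    return value.split(";")


print(split_filter_field("q10;s50"))            # ['q10', 's50']
print(split_filter_field("RefAvgRL,VarAvgRL"))  # ['RefAvgRL,VarAvgRL']
```

With this reading, the comma-joined VarScan2 value lines up with the single ID that Picard FixVcfHeader writes into the header.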
Error: PyVCF fails with:
`Traceback (most recent call last):
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 236, in <module>
    main()
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 232, in main
    run(parser.parse_args())
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 166, in run
    df_1 = vcf_to_dataframe(args.vcf_1)
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 74, in vcf_to_dataframe
    vcf_reader = vcf.Reader(open(vcf_file, "r"))
  File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 300, in __init__
    self._parse_metainfo()
  File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 326, in _parse_metainfo
    key, val = parser.read_filter(line)
  File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 142, in read_filter
    raise SyntaxError(
SyntaxError: One of the FILTER lines is malformed: ##FILTER=<ID="RefAvgRL,VarAvgRL",Description="Missing description: this FILTER line was added by Picard's FixVCFHeader">`
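The failure can be reproduced without a VCF file. The sketch below uses only the standard `re` module; `STRICT` is assumed to have the same shape as the FILTER pattern in parser.py (an ID group that stops at the first comma), and `RELAXED` only swaps that group for `.+`. Both patterns and `parse_filter_line` are stand-ins for illustration, not code copied from PyVCF.

```python
import re

# Assumed shape of the strict pattern: the ID may not contain a comma.
STRICT = re.compile(r'''\#\#FILTER=<
    ID=(?P<id>[^,]+),\s*
    Description="(?P<desc>[^"]*)"
    >''', re.VERBOSE)

# Relaxed variant: the ID may contain anything, including commas and quotes.
RELAXED = re.compile(r'''\#\#FILTER=<
    ID=(?P<id>.+),\s*
    Description="(?P<desc>[^"]*)"
    >''', re.VERBOSE)


def parse_filter_line(line, pattern):
    """Return (id, description), or None if the line does not match."""
    match = pattern.match(line)
    if match is None:
        return None
    # The relaxed ID keeps its surrounding double quotes; drop them here.
    return match.group("id").strip('"'), match.group("desc")


line = ('##FILTER=<ID="RefAvgRL,VarAvgRL",Description="Missing description: '
        "this FILTER line was added by Picard's FixVCFHeader\">")

print(parse_filter_line(line, STRICT))      # None, i.e. "malformed"
print(parse_filter_line(line, RELAXED)[0])  # RefAvgRL,VarAvgRL
```

Note that the relaxed group captures the quotes along with the name, so a real fix would need to decide whether to strip them before comparing against FILTER values in the data rows. Neither pattern accepts a line with an extra key after Description, such as Source.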
Issue: It might be more robust for PyVCF to treat a filter with commas as just one big filter name, as Picard FixVcfHeader does. Instead of raising an exception, accept metadata with a filter ID inside double quotes and containing commas, e.g., ID="RefAvgRL,VarAvgRL". Similarly, in the data, treat a FILTER value like RefAvgRL,VarAvgRL as a single entity. I think this solution is consistent with the VCF 4.2 spec for a filter name: "String, no whitespace or semicolons permitted".
Possible pull request: This hack (changing [^,]+ to .+ in the ID group of the FILTER pattern) worked to get me through an urgent analysis, but it may not be the best solution. At parser.py line 142 ` self.filter_pattern = re.compile(r'''\#\#FILTER=< ID=(?P<id>.+),\s* Description="(?P<desc>[^"]*)"