pcingola / SnpEff


No white-space, semi-colons, or equals-signs are permitted in INFO field values #298

Closed puva closed 3 years ago

puva commented 3 years ago

This is a duplicate of #279, which I couldn't reopen.

pcingola commented 3 years ago

I've looked into this and it is OK for SnpEff to throw the error because the input VCF file is invalid.

As the error message says, your VCF file is corrupted due to illegal characters in the INFO field value:

$ grep 930248 dbSNP153_small.vcf | cut -f 8 | tr ';' '\n' | grep ^CLNHGVS
CLNHGVS=NC_000001.11:g.930248=,NC_000001.11:g.930248G>A
#                            ^ The equal sign is an illegal character in an INFO field 

This clearly has an equal sign (=) in the INFO field value, which is illegal.

Please take a look at the VCF specification, page 5, definition of an INFO field:

https://samtools.github.io/hts-specs/VCFv4.2.pdf

INFO - additional information: (String, no whitespace, semicolons, or equals-signs permitted; commas are permitted only as delimiters for lists of values) INFO fields are encoded as a semicolon-separated series of short keys with optional values in the format: <key>=<data>[,data]. If no keys are present, the missing value must be used. Arbitrary keys are permitted, although the following sub-fields are reserved (albeit optional):
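
To make the rule above concrete, here is a minimal Python sketch of the check, not SnpEff's actual implementation. The function name `find_illegal_info_values` is hypothetical, and the `RS` and `VC` entries in the sample string are only there for illustration. It splits the INFO column on semicolons, splits each entry on the first equals sign, and reports any value that still contains whitespace or an equals sign.

```python
import re

# Characters the VCFv4.2 INFO definition forbids inside a value.
# Semicolons cannot be detected here: they are consumed when splitting entries.
_ILLEGAL = re.compile(r"[\s=]")


def find_illegal_info_values(info):
    """Return (key, value) pairs whose value contains whitespace or '='.

    Hypothetical helper for illustration; not part of SnpEff or SnpSift.
    """
    if info == ".":  # missing value: nothing to check
        return []
    bad = []
    for entry in info.split(";"):
        # Split on the FIRST '=' only; flags without a value have no separator.
        key, sep, value = entry.partition("=")
        if sep and _ILLEGAL.search(value):
            bad.append((key, value))
    return bad


# Sample INFO string; only the CLNHGVS entry comes from the report above.
info = "RS=123;CLNHGVS=NC_000001.11:g.930248=,NC_000001.11:g.930248G>A;VC=SNV"
print(find_illegal_info_values(info))
```

On the sample string, only the `CLNHGVS` entry is reported, because its value contains a second equals sign after `g.930248`.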

So the error message is correct. You need to fix the input VCF file.
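
One possible way to fix the file, sketched here under assumptions: the later VCFv4.3 specification describes percent-encoding for characters with special meaning (for example `%3D` for an equals sign). Whether your downstream tools decode this is an assumption you need to verify. The function names `escape_equals_in_info` and `fix_vcf_line` are hypothetical, and the sketch assumes tab-separated records with INFO in column 8.

```python
def escape_equals_in_info(info):
    """Percent-encode '=' inside INFO values, keeping the key=value separator.

    Hypothetical helper; assumes consumers accept VCFv4.3-style '%3D'.
    """
    if info == ".":  # missing value: leave untouched
        return info
    fixed = []
    for entry in info.split(";"):
        # Only the first '=' separates key from value; later ones are data.
        key, sep, value = entry.partition("=")
        fixed.append(key + sep + value.replace("=", "%3D"))
    return ";".join(fixed)


def fix_vcf_line(line):
    """Apply the escape to column 8 (INFO) of a data line; keep headers as-is."""
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 8:
        fields[7] = escape_equals_in_info(fields[7])
    return "\t".join(fields) + newline


print(escape_equals_in_info(
    "CLNHGVS=NC_000001.11:g.930248=,NC_000001.11:g.930248G>A"
))
```

To rewrite a whole file, you could apply `fix_vcf_line` to each line of the input and write the results to a new VCF, then confirm that the tools consuming that file interpret `%3D` the way you expect.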

pcingola commented 3 years ago

I'll try to make SnpEff/SnpSift emit a warning instead of an error in future versions, although honestly I'm not sure that allowing people to shoot themselves in the foot is a good idea.