samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

Remove validation of VCF Header line field order? #1610

Open tfenne opened 2 years ago

tfenne commented 2 years ago

@lbergelson, @droazen, and anyone else who may be interested: following the discussion in https://github.com/samtools/hts-specs/issues/642, would there be support for (or any objections to) a PR that eliminates the validation of field ordering within a given VCF header line?

This issue came up because an older version of one of GATK's SV tools produces this header line:

##INFO=<ID=END2,Type=Integer,Number=1,Description="Position of breakpoint on CHR2">

instead of the more common:

##INFO=<ID=END2,Number=1,Type=Integer,Description="Position of breakpoint on CHR2">

The discussion on the spec issue hasn't led to a PR yet, but there seems to be consensus on clarifying the language so that it is explicit that no ordering of fields is required within a single header line. I'm not really sure why HTSJDK validates this in the first place, and the check makes the header parsing code quite a bit more complicated. I'd like to submit a PR to remove the check, but would appreciate knowing in advance whether folks are receptive to it.
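
To illustrate what order-independent handling could look like, here is a minimal sketch. It is not HTSJDK's parser and uses no HTSJDK classes: `HeaderLineSketch`, `parseFields`, and `requireInfoKeys` are hypothetical names made up for this example. It assumes that checking for the presence of `ID`, `Number`, `Type`, and `Description` is enough for an INFO line, and it ignores everything else a real parser would need to validate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from HTSJDK.
class HeaderLineSketch {

    // Parse the key=value pairs between '<' and '>' into a map, honoring quoted values.
    static Map<String, String> parseFields(String line) {
        int open = line.indexOf('<');
        int close = line.lastIndexOf('>');
        if (open < 0 || close <= open) {
            throw new IllegalArgumentException("Not a structured header line: " + line);
        }
        String body = line.substring(open + 1, close);
        Map<String, String> fields = new LinkedHashMap<>();
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        boolean inValue = false;
        boolean inQuotes = false;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (inQuotes) {
                if (c == '\\' && i + 1 < body.length()) {
                    value.append(body.charAt(++i)); // keep the escaped character as-is
                } else if (c == '"') {
                    inQuotes = false;               // closing quote ends the quoted run
                } else {
                    value.append(c);                // commas and '=' are literal inside quotes
                }
            } else if (c == '"' && inValue) {
                inQuotes = true;
            } else if (c == '=' && !inValue) {
                inValue = true;                     // first '=' separates key from value
            } else if (c == ',') {
                store(fields, key, value);          // unquoted comma ends the current pair
                inValue = false;
            } else if (inValue) {
                value.append(c);
            } else {
                key.append(c);
            }
        }
        store(fields, key, value);
        return fields;
    }

    // Record the pair collected so far (if any) and reset both buffers.
    private static void store(Map<String, String> fields, StringBuilder key, StringBuilder value) {
        String k = key.toString().trim();
        if (!k.isEmpty()) {
            fields.put(k, value.toString());
        }
        key.setLength(0);
        value.setLength(0);
    }

    // Assumed requirement for this sketch: the keys must be present, in any order.
    static void requireInfoKeys(Map<String, String> fields) {
        for (String required : new String[] {"ID", "Number", "Type", "Description"}) {
            if (!fields.containsKey(required)) {
                throw new IllegalArgumentException("Missing required key: " + required);
            }
        }
    }

    public static void main(String[] args) {
        String typeFirst = "##INFO=<ID=END2,Type=Integer,Number=1,Description=\"Position of breakpoint on CHR2\">";
        String numberFirst = "##INFO=<ID=END2,Number=1,Type=Integer,Description=\"Position of breakpoint on CHR2\">";
        Map<String, String> a = parseFields(typeFirst);
        Map<String, String> b = parseFields(numberFirst);
        requireInfoKeys(a);
        requireInfoKeys(b);
        System.out.println(a.equals(b)); // prints "true": Map equality ignores insertion order
    }
}
```

With this approach the two header lines above parse to equal maps, so nothing downstream needs to know which order the writer used.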

cc @nh13

cmnbroad commented 2 years ago

I'd love to see that code removed; it's a pretty awkward way to do the order validation anyway. I'd prefer to see it done as part of https://github.com/samtools/htsjdk/pull/1581, though (I'm happy to make the changes there), since that PR already changes the same public APIs that will have to change for this.