samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/
283 stars 242 forks source link

VCFHeader and VCFHeaderLine rewrite/refactoring #1581

Open cmnbroad opened 2 years ago

cmnbroad commented 2 years ago

This PR is resurrected from the PR graveyard, updated and rebased. It contains:

codecov-commenter commented 2 years ago

Codecov Report

Merging #1581 (3084903) into master (347c0ac) will increase coverage by 0.022%. The diff coverage is 79.088%.

:exclamation: Current head 3084903 differs from pull request most recent head a4c529d. Consider uploading reports for the commit a4c529d to get more accurate results

:exclamation: Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@               Coverage Diff               @@
##              master     #1581       +/-   ##
===============================================
+ Coverage     69.856%   69.878%   +0.022%     
- Complexity      9695      9832      +137     
===============================================
  Files            703       706        +3     
  Lines          37772     38022      +250     
  Branches        6139      6155       +16     
===============================================
+ Hits           26386     26569      +183     
- Misses          8929      8965       +36     
- Partials        2457      2488       +31     
Impacted Files Coverage Δ
...in/java/htsjdk/samtools/SAMSequenceDictionary.java 75.000% <ø> (ø)
...c/main/java/htsjdk/variant/vcf/VCFRecordCodec.java 0.000% <0.000%> (ø)
src/main/java/htsjdk/variant/vcf/VCFCodec.java 25.000% <22.222%> (-52.273%) :arrow_down:
src/main/java/htsjdk/variant/vcf/VCFUtils.java 45.570% <28.571%> (-3.706%) :arrow_down:
src/main/java/htsjdk/variant/vcf/VCF3Codec.java 57.895% <50.000%> (-5.520%) :arrow_down:
...n/java/htsjdk/variant/vcf/VCFFormatHeaderLine.java 66.667% <57.895%> (-3.333%) :arrow_down:
...tsjdk/variant/variantcontext/writer/VCFWriter.java 77.451% <65.000%> (-5.345%) :arrow_down:
...main/java/htsjdk/variant/vcf/VCFHeaderVersion.java 75.000% <66.667%> (+3.125%) :arrow_up:
...n/java/htsjdk/variant/vcf/VCFSampleHeaderLine.java 66.667% <66.667%> (-33.333%) :arrow_down:
...main/java/htsjdk/variant/vcf/VCFAltHeaderLine.java 72.727% <68.421%> (-27.273%) :arrow_down:
... and 22 more

... and 69 files with indirect coverage changes

cmnbroad commented 1 year ago

Note: this needs to be resolved/updated to account for the final disposition of structured vs. unstructured lines, attribute ordering, and quoting, as described in https://github.com/samtools/hts-specs/issues/642/ https://github.com/samtools/htsjdk/issues/1610, and also https://github.com/samtools/hts-specs/pull/620.

ohofmann commented 1 year ago

We've been discussing the lack of support for VCF 4.4 in IGV (https://github.com/igvteam/igv/issues/1289) and started looking at the current htsjdk / VCF interaction. This PR looks like a major step forward to resolve issues around VCF 4.3 - thanks for all your work on this! Any chance of merging this soon so we can consider what changes would be required for 4.4?

cmnbroad commented 1 year ago

@ohofmann Hi Oliver - yes, this PR addresses quite a few issues with VCF support in htsjdk, and is a prerequisite for a couple of other large PRs for VCF/BCF support. The team has been sidetracked for quite a while with upgrading several of these repos to Java 17, which turned out to be a lot more work than anticipated, but we finally got the main PR for that merged last week. We discussed this PR in our team meeting this morning, and we do plan to get it merged sometime in the next quarter. It has had a fair amount of review/discussion already, but its quite large and will require a major version release, and has significant downstream test impact on projects such as GATK (I haven't tried to build IGV with it, but I expect the IGV fallout to be minimal). Hope that help.s

ohofmann commented 1 year ago

@cmnbroad Thanks for the update! It does help, though we may have to consider postponing the VCF 4.4 announcement / PR drive. I wouldn't want the first experience of many users to be an htsjdk error. Will keep an eye on this PR in the meantime.

cmnbroad commented 1 year ago

@lbergelson I've rebased this on master/Java 17.