samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

Plugin framework and interfaces for versioned file format codecs #1525

Closed cmnbroad closed 3 years ago

cmnbroad commented 3 years ago

This PR is part of the work funded by the CZI grant, and implements a plugin framework for packaging, discovering, and loading versioned file format codecs. It's not yet ready for code review; it's posted to solicit discussion and feedback. Many code paths are not yet implemented, and the test coverage is the minimum necessary to demonstrate the concepts.

(A more detailed description of the framework can be found in the HtsCodec javadoc).
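To make "discovering and loading versioned codecs" concrete for discussion, here is a minimal, self-contained sketch of version-aware codec resolution. Every name in it (`CodecRegistrySketch`, `Codec`, `SimpleCodec`, `resolve`, `resolveNewest`) is hypothetical and is not taken from this PR. Codecs are registered by hand so the example stays self-contained; a real plugin mechanism could discover implementations dynamically, for example with `java.util.ServiceLoader`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: none of these names come from the htsjdk code base.
class CodecRegistrySketch {

    /** A codec advertises the format it handles and the version it implements. */
    interface Codec {
        String format();
        int majorVersion();
        int minorVersion();
    }

    /** Immutable codec descriptor, used only for this demonstration. */
    static final class SimpleCodec implements Codec {
        private final String format;
        private final int major;
        private final int minor;

        SimpleCodec(String format, int major, int minor) {
            this.format = format;
            this.major = major;
            this.minor = minor;
        }

        @Override public String format() { return format; }
        @Override public int majorVersion() { return major; }
        @Override public int minorVersion() { return minor; }
        @Override public String toString() { return format + " v" + major + "." + minor; }
    }

    private final List<Codec> codecs = new ArrayList<>();

    /** Registered manually here; a plugin framework would discover these instead. */
    void register(Codec codec) {
        codecs.add(codec);
    }

    /** Finds the codec for an exact format and version, if one is registered. */
    Optional<Codec> resolve(String format, int major, int minor) {
        return codecs.stream()
                .filter(c -> c.format().equals(format)
                        && c.majorVersion() == major
                        && c.minorVersion() == minor)
                .findFirst();
    }

    /** Finds the newest registered codec for a format, e.g. as a default for writing. */
    Optional<Codec> resolveNewest(String format) {
        return codecs.stream()
                .filter(c -> c.format().equals(format))
                .max(Comparator.comparingInt(Codec::majorVersion)
                        .thenComparingInt(Codec::minorVersion));
    }

    public static void main(String[] args) {
        CodecRegistrySketch registry = new CodecRegistrySketch();
        registry.register(new SimpleCodec("VCF", 4, 1));
        registry.register(new SimpleCodec("VCF", 4, 2));
        registry.register(new SimpleCodec("BAM", 1, 0));

        System.out.println(registry.resolveNewest("VCF"));
        System.out.println(registry.resolve("CRAM", 3, 0));
    }
}
```

The sketch assumes that reading needs an exact version match (the version found in the file) while writing can default to the newest registered codec; that split is an illustration, not a statement of how the framework behaves.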

The new interfaces are all strictly opt-in, and provide an alternative to the various existing mechanisms for packaging and loading codecs (SamReaderFactory, SAMFileWriterFactory, VariantContextWriterBuilder, etc.). All existing interfaces and codec implementations are retained as is so as not to disrupt existing projects.

Partial/skeletal codec implementations are currently included for FASTA (read only), HTSGET BAM (read only), BAM (read/write), VCF4.2 (read/write), and CRAM v3.0 (read/write). Here are some tests that demonstrate common usage patterns:

Some additional detail:

The implementations included here delegate to the existing encoder/decoder components and expose existing record interfaces such as SAMRecord and VariantContext. The framework will also allow additional codecs to be implemented that use alternative or updated interfaces. Additionally, an upgrade chain will be implemented to allow codecs to run version upgrade transformations on both file headers and records.
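Since the upgrade chain is not implemented yet, the following is only a sketch of one way such a chain could work, to help frame the discussion. All names (`UpgradeChainSketch`, `Step`, `addStep`, `upgrade`) are hypothetical, and the "transformation" is a plain string replacement on a VCF `##fileformat` header line; a real upgrade would operate on parsed headers and records.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: not the design proposed in the PR, only an illustration.
class UpgradeChainSketch {

    /** One upgrade step: the version it produces and the transformation to apply. */
    static final class Step {
        final String toVersion;
        final UnaryOperator<String> transform;

        Step(String toVersion, UnaryOperator<String> transform) {
            this.toVersion = toVersion;
            this.transform = transform;
        }
    }

    // Keyed by source version, so each version has at most one outgoing step.
    private final Map<String, Step> steps = new HashMap<>();

    void addStep(String fromVersion, String toVersion, UnaryOperator<String> transform) {
        steps.put(fromVersion, new Step(toVersion, transform));
    }

    /** Applies single-version steps in sequence until the target version is reached. */
    String upgrade(String header, String fromVersion, String toVersion) {
        String version = fromVersion;
        String result = header;
        int applied = 0;
        while (!version.equals(toVersion)) {
            Step step = steps.get(version);
            // Fail if no step exists, or if more steps ran than are registered (a cycle).
            if (step == null || applied++ >= steps.size()) {
                throw new IllegalStateException(
                        "No upgrade path from " + fromVersion + " to " + toVersion);
            }
            result = step.transform.apply(result);
            version = step.toVersion;
        }
        return result;
    }

    public static void main(String[] args) {
        UpgradeChainSketch chain = new UpgradeChainSketch();
        chain.addStep("4.0", "4.1", h -> h.replace("VCFv4.0", "VCFv4.1"));
        chain.addStep("4.1", "4.2", h -> h.replace("VCFv4.1", "VCFv4.2"));

        System.out.println(chain.upgrade("##fileformat=VCFv4.0", "4.0", "4.2"));
    }
}
```

Chaining single-version steps means each codec would only need to know how to upgrade from its immediate predecessor, at the cost of running several passes for old files.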

codecov-commenter commented 3 years ago

Codecov Report

Merging #1525 (f0c35b1) into master (8466c82) will increase coverage by 0.285%. The diff coverage is 73.437%.

:exclamation: Current head f0c35b1 differs from pull request most recent head 1caf3f4. Consider uploading reports for the commit 1caf3f4 to get more accurate results

@@               Coverage Diff               @@
##              master     #1525       +/-   ##
===============================================
+ Coverage     69.525%   69.810%   +0.285%     
- Complexity      9072      9627      +555     
===============================================
  Files            617       702       +85     
  Lines          36092     37608     +1516     
  Branches        5982      6107      +125     
===============================================
+ Hits           25093     26254     +1161     
- Misses          8629      8904      +275     
- Partials        2370      2450       +80     
| Impacted Files | Coverage Δ |
|---|---|
| ...ta/codecs/reads/cram/cramV2_1/CRAMEncoderV2_1.java | 0.000% <0.000%> (ø) |
| ...ta/codecs/variants/vcf/vcfv3_2/VCFEncoderV3_2.java | 0.000% <0.000%> (ø) |
| ...ta/codecs/variants/vcf/vcfv3_3/VCFEncoderV3_3.java | 0.000% <0.000%> (ø) |
| ...ta/codecs/variants/vcf/vcfv4_0/VCFEncoderV4_0.java | 0.000% <0.000%> (ø) |
| ...ta/codecs/variants/vcf/vcfv4_1/VCFEncoderV4_1.java | 0.000% <0.000%> (ø) |
| .../java/htsjdk/beta/exception/HtsjdkIOException.java | 0.000% <0.000%> (ø) |
| src/main/java/htsjdk/beta/io/IOPathUtils.java | 59.091% <0.000%> (+2.569%) :arrow_up: |
| ...main/java/htsjdk/beta/io/bundle/BundleBuilder.java | 72.222% <ø> (ø) |
| ...tsjdk/beta/plugin/hapref/HapRefDecoderOptions.java | 0.000% <0.000%> (ø) |
| ...dk/beta/plugin/hapref/HaploidReferenceFormats.java | 0.000% <0.000%> (ø) |

... and 198 more