samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

add predicate to GFF3Codec to give a chance to filter out some unused attributes #1575

Closed: lindenb closed this 2 years ago

lindenb commented 2 years ago

Description

When a GFF3 file is read with DecodeDepth.DEEP, the reader can use a large amount of memory to store attributes that are never used ("version", "tag", etc.). This PR lets Gff3Codec take a Predicate<String> so that only a defined set of attributes is kept.
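A minimal, self-contained sketch of the idea, not the htsjdk implementation: the attribute column (column 9) is parsed key by key, and the predicate is consulted before any value is stored, so rejected tags never occupy memory. The class and method names are hypothetical, and the sketch assumes the predicate returns true for keys to discard, as the name setFilterOutAttribute suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch; not the htsjdk Gff3Codec API.
public class Gff3AttributeFilterSketch {

    /**
     * Parses a GFF3 column-9 string such as "ID=gene1;Name=ABC;tag=basic"
     * into key -> values, skipping every key for which filterOut returns true.
     * Skipped keys are never stored, so they cost nothing once the line is parsed.
     * Percent-decoding of values is omitted to keep the sketch short.
     */
    public static Map<String, List<String>> parseAttributes(String column9, Predicate<String> filterOut) {
        Map<String, List<String>> attributes = new LinkedHashMap<>();
        if (column9.equals(".")) {
            return attributes; // "." means the feature has no attributes
        }
        for (String pair : column9.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed entries (this leniency is an assumption)
            }
            String key = pair.substring(0, eq);
            if (filterOut.test(key)) {
                continue; // dropped before its values are split or stored
            }
            // GFF3 allows several comma-separated values for one tag
            attributes.put(key, new ArrayList<>(Arrays.asList(pair.substring(eq + 1).split(","))));
        }
        return attributes;
    }

    public static void main(String[] args) {
        Set<String> keep = Set.of("ID", "Name", "Parent");
        Predicate<String> filterOut = key -> !keep.contains(key);
        System.out.println(parseAttributes("ID=gene1;Name=ABC;version=3;tag=basic,CCDS", filterOut));
        // prints {ID=[gene1], Name=[ABC]}
    }
}
```

Expressing the filter as "keep only this set" (negating a Set lookup) keeps the predicate independent of whatever extra tags a given annotation source happens to emit.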

The private modifier was removed from ID_ATTRIBUTE_KEY and NAME_ATTRIBUTE_KEY in Gff3BaseData so that the codec can check that the predicate does not filter them out.
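A check of this kind could look roughly like the hypothetical sketch below. It is not copied from htsjdk; it assumes the constants hold the GFF3 reserved names "ID" and "Name" and that the predicate returns true for keys to drop.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; not copied from htsjdk.
public class RequiredKeyCheckSketch {
    // Assumed values: the GFF3 reserved attribute names for a feature's ID and display name.
    public static final String ID_ATTRIBUTE_KEY = "ID";
    public static final String NAME_ATTRIBUTE_KEY = "Name";

    /**
     * Returns filterOut unchanged if it keeps ID and Name; otherwise throws,
     * on the assumption that a feature's id and name are read from those attributes.
     */
    public static Predicate<String> requireKeepsIdAndName(Predicate<String> filterOut) {
        for (String key : List.of(ID_ATTRIBUTE_KEY, NAME_ATTRIBUTE_KEY)) {
            if (filterOut.test(key)) {
                throw new IllegalArgumentException("attribute filter must not remove " + key);
            }
        }
        return filterOut;
    }

    public static void main(String[] args) {
        requireKeepsIdAndName(key -> key.equals("version")); // accepted: only drops "version"
        try {
            requireKeepsIdAndName(key -> !key.equals("ID")); // would also drop "Name"
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints: attribute filter must not remove Name
        }
    }
}
```

Failing fast when the filter is configured is cheaper to debug than discovering later that features silently lost their identifiers.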

A new method, setFilterOutAttribute, was added to Gff3Codec.

The static modifier was removed from Gff3Codec.parseLine.
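Why parseLine cannot stay static, shown in a hypothetical sketch: once the filter lives on the codec instance, the parsing method has to read that per-instance field. The class name and the setter signature are assumptions that only mirror the method name from the PR description.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch; not the htsjdk Gff3Codec API.
public class InstanceParseLineSketch {
    // Per-instance configuration: a static parseLine could not see this field.
    private Predicate<String> filterOut = key -> false; // default: keep every attribute

    /** Named after the PR's setFilterOutAttribute; this signature is an assumption. */
    public void setFilterOutAttribute(Predicate<String> filterOut) {
        this.filterOut = filterOut;
    }

    /** Returns the attributes kept from one tab-separated, 9-column GFF3 feature line. */
    public Map<String, String> parseLine(String line) {
        String[] columns = line.split("\t", -1);
        if (columns.length != 9) {
            throw new IllegalArgumentException("expected 9 columns, found " + columns.length);
        }
        Map<String, String> kept = new LinkedHashMap<>();
        for (String pair : columns[8].split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed entries
            }
            String key = pair.substring(0, eq);
            if (!filterOut.test(key)) { // reads this instance's predicate
                kept.put(key, pair.substring(eq + 1)); // raw value, not split on ','
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        InstanceParseLineSketch codec = new InstanceParseLineSketch();
        codec.setFilterOutAttribute(key -> key.equals("tag") || key.equals("version"));
        String line = "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=g1;version=2;tag=basic;Name=ABC";
        System.out.println(codec.parseLine(line)); // prints {ID=g1, Name=ABC}
    }
}
```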

I added a test, codecFilterOutFieldsTest.


codecov-commenter commented 2 years ago

Codecov Report

Merging #1575 (646da19) into master (57c3f03) will decrease coverage by 0.037%. The diff coverage is 81.818%.

@@               Coverage Diff               @@
##              master     #1575       +/-   ##
===============================================
- Coverage     69.841%   69.804%   -0.037%     
- Complexity      9633      9639        +6     
===============================================
  Files            702       702               
  Lines          37611     37618        +7     
  Branches        6108      6088       -20     
===============================================
- Hits           26268     26259        -9     
- Misses          8897      8907       +10     
- Partials        2446      2452        +6     
Impacted Files                                         Coverage Δ
...rc/main/java/htsjdk/tribble/gff/Gff3Constants.java  0.000% <ø> (ø)
src/main/java/htsjdk/tribble/gff/Gff3Codec.java        77.083% <75.000%> (-0.170%) ↓
.../htsjdk/variant/variantcontext/VariantContext.java  78.456% <81.818%> (+0.132%) ↑
src/main/java/htsjdk/tribble/gff/Gff3BaseData.java     80.556% <100.000%> (ø)
...sjdk/samtools/util/htsget/HtsgetErrorResponse.java  73.333% <0.000%> (-6.667%) ↓
...tools/cram/encoding/core/GammaIntegerEncoding.java  86.667% <0.000%> (-6.667%) ↓
...mtools/cram/encoding/core/BetaIntegerEncoding.java  91.304% <0.000%> (-4.348%) ↓
...am/encoding/core/CanonicalHuffmanByteEncoding.java  52.941% <0.000%> (-2.941%) ↓
...va/htsjdk/beta/codecs/variants/vcf/VCFDecoder.java  62.651% <0.000%> (-2.410%) ↓
...va/htsjdk/samtools/util/htsget/HtsgetResponse.java  89.706% <0.000%> (-1.471%) ↓
... and 17 more
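Codecov computes coverage as hits divided by all tracked lines, counting partially covered lines as not covered. The Coverage Diff totals in the report reproduce the stated figures:

```latex
\begin{aligned}
\text{coverage} &= \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}} = \frac{\text{hits}}{\text{lines}} \\
\text{master:}\quad \frac{26268}{26268 + 8897 + 2446} &= \frac{26268}{37611} \approx 69.841\% \\
\text{\#1575:}\quad \frac{26259}{26259 + 8907 + 2452} &= \frac{26259}{37618} \approx 69.804\% \\
\Delta &\approx 69.804\% - 69.841\% = -0.037\%
\end{aligned}
```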
lindenb commented 2 years ago

@lbergelson Thank you for your review. I moved the static final String constants from Gff3BaseData to Gff3Constants and replaced setFilterOutAttribute with a new constructor.
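A sketch of what a constructor-based design can look like, assuming (as with the setter) that the predicate returns true for attribute keys to discard. The class below is hypothetical, not htsjdk's Gff3Codec; it only illustrates that a predicate passed to the constructor can be stored in a final field, so a codec's filtering cannot change between the lines it decodes.

```java
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical sketch; not htsjdk's Gff3Codec.
public class ConstructorConfiguredCodecSketch {
    // final: set once in the constructor and never swapped while decoding.
    private final Predicate<String> filterOut;

    /** Keeps every attribute, i.e. the behaviour without any filter. */
    public ConstructorConfiguredCodecSketch() {
        this(key -> false);
    }

    /** filterOut returns true for attribute keys to discard (assumed semantics). */
    public ConstructorConfiguredCodecSketch(Predicate<String> filterOut) {
        this.filterOut = Objects.requireNonNull(filterOut, "filterOut");
    }

    /** Instance method because the answer depends on this codec's predicate. */
    public boolean keepsAttribute(String key) {
        return !filterOut.test(key);
    }

    public static void main(String[] args) {
        ConstructorConfiguredCodecSketch keepAll = new ConstructorConfiguredCodecSketch();
        ConstructorConfiguredCodecSketch slim = new ConstructorConfiguredCodecSketch(key -> key.equals("tag"));
        System.out.println(keepAll.keepsAttribute("tag")); // prints true
        System.out.println(slim.keepsAttribute("tag"));    // prints false
        System.out.println(slim.keepsAttribute("ID"));     // prints true
    }
}
```

In this sketch, the no-argument constructor that keeps every attribute preserves the unfiltered behaviour for callers that do not need filtering.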