samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

Add an IntervalCodec that is useful for sorting large sets of intervals #1288

Closed nh13 closed 5 years ago

nh13 commented 5 years ago

Description

This codec is meant to be used with a SortingCollection when sorting over MANY intervals. My intention is to submit a PR to Picard that uses this codec and SortingCollection when lifting over an interval list, as I sometimes run out of memory with LiftoverIntervalList even with 32GB of memory.
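
To illustrate the idea, here is a minimal, self-contained sketch of codec-backed sorting that spills to disk. It does not use htsjdk: `SimpleInterval`, `encode`, `decode`, `sort`, and `maxInRam` are hypothetical names for illustration only, and contigs are ordered by name rather than by sequence dictionary. The point is that a codec lets sorted chunks be written to temporary files and merged back, so only a bounded number of records is held in memory at once.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.function.Consumer;

/** Sketch only: disk-backed sorting driven by a small record codec. Not htsjdk's API. */
class IntervalSpillSortSketch {

    /** Hypothetical minimal interval; a real interval also carries strand and name. */
    static final class SimpleInterval {
        final String contig; final int start; final int end;
        SimpleInterval(String contig, int start, int end) {
            this.contig = contig; this.start = start; this.end = end;
        }
        @Override public String toString() { return contig + ":" + start + "-" + end; }
    }

    /** Orders by contig name, then start, then end (assumption: no sequence dictionary). */
    static final Comparator<SimpleInterval> BY_COORDINATE =
            Comparator.comparing((SimpleInterval iv) -> iv.contig)
                      .thenComparingInt(iv -> iv.start)
                      .thenComparingInt(iv -> iv.end);

    /** Codec role: write one record to a binary stream. */
    static void encode(DataOutputStream out, SimpleInterval iv) throws IOException {
        out.writeUTF(iv.contig);
        out.writeInt(iv.start);
        out.writeInt(iv.end);
    }

    /** Codec role: read one record back, or return null at end of stream. */
    static SimpleInterval decode(DataInputStream in) throws IOException {
        try {
            String contig = in.readUTF();
            return new SimpleInterval(contig, in.readInt(), in.readInt());
        } catch (EOFException e) {
            return null;
        }
    }

    /** Pairs the next unread record of a spill file with the stream it came from. */
    static final class Head {
        final SimpleInterval interval; final DataInputStream source;
        Head(SimpleInterval interval, DataInputStream source) {
            this.interval = interval; this.source = source;
        }
    }

    /** Sorts input while buffering at most maxInRam records; results go to sink in order. */
    static void sort(Iterator<SimpleInterval> input, int maxInRam,
                     Consumer<SimpleInterval> sink) throws IOException {
        List<Path> spills = new ArrayList<>();
        List<SimpleInterval> buffer = new ArrayList<>();
        while (input.hasNext()) {
            buffer.add(input.next());
            if (buffer.size() >= maxInRam) { spills.add(spill(buffer)); buffer.clear(); }
        }
        if (!buffer.isEmpty()) spills.add(spill(buffer));
        merge(spills, sink);
    }

    /** Sorts one in-memory chunk and writes it to a temporary file via the codec. */
    static Path spill(List<SimpleInterval> buffer) throws IOException {
        buffer.sort(BY_COORDINATE);
        Path file = Files.createTempFile("intervals", ".spill");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (SimpleInterval iv : buffer) encode(out, iv);
        }
        return file;
    }

    /** K-way merge: repeatedly emit the smallest head record among all spill files. */
    static void merge(List<Path> spills, Consumer<SimpleInterval> sink) throws IOException {
        PriorityQueue<Head> heads =
                new PriorityQueue<>(Comparator.comparing((Head h) -> h.interval, BY_COORDINATE));
        List<DataInputStream> streams = new ArrayList<>();
        try {
            for (Path p : spills) {
                DataInputStream in = new DataInputStream(
                        new BufferedInputStream(Files.newInputStream(p)));
                streams.add(in);
                SimpleInterval first = decode(in);
                if (first != null) heads.add(new Head(first, in));
            }
            while (!heads.isEmpty()) {
                Head h = heads.poll();
                sink.accept(h.interval);
                SimpleInterval next = decode(h.source);
                if (next != null) heads.add(new Head(next, h.source));
            }
        } finally {
            for (DataInputStream in : streams) in.close();
            for (Path p : spills) Files.deleteIfExists(p);
        }
    }

    public static void main(String[] args) throws IOException {
        List<SimpleInterval> input = Arrays.asList(
                new SimpleInterval("chr2", 100, 200),
                new SimpleInterval("chr1", 500, 600),
                new SimpleInterval("chr1", 50, 80));
        sort(input.iterator(), 2, iv -> System.out.println(iv));
    }
}
```

In this sketch the memory bound is set by `maxInRam` plus one buffered record per spill file during the merge, which is the same trade-off the PR is after: more disk I/O in exchange for a predictable heap footprint.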

codecov-io commented 5 years ago

Codecov Report

Merging #1288 into master will increase coverage by 0.092%. The diff coverage is 70.33%.

@@               Coverage Diff               @@
##              master     #1288       +/-   ##
===============================================
+ Coverage     67.495%   67.587%   +0.092%     
- Complexity      8150      8178       +28     
===============================================
  Files            558       561        +3     
  Lines          33364     33409       +45     
  Branches        5608      5611        +3     
===============================================
+ Hits           22519     22580       +61     
+ Misses          8657      8641       -16     
  Partials        2188      2188

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...c/main/java/htsjdk/samtools/util/IntervalList.java | 73.754% <0%> (+5.318%) | 74 <0> (+3) :arrow_up: |
| src/main/java/htsjdk/samtools/BAMRecordCodec.java | 79.839% <100%> (ø) | 25 <0> (ø) :arrow_down: |
| ...dk/samtools/util/IntervalCoordinateComparator.java | 58.333% <58.333%> (ø) | 6 <6> (?) |
| .../main/java/htsjdk/samtools/util/IntervalCodec.java | 79.412% <79.412%> (ø) | 8 <8> (?) |
| .../java/htsjdk/samtools/util/IntervalListWriter.java | 88% <88%> (ø) | 5 <5> (?) |
| ...rc/main/java/htsjdk/samtools/util/BinaryCodec.java | 69.725% <0%> (+0.459%) | 57% <0%> (+1%) :arrow_up: |
| src/main/java/htsjdk/samtools/util/IOUtil.java | 57.653% <0%> (+0.51%) | 117% <0%> (+2%) :arrow_up: |
| ... and 3 more | | |

nh13 commented 5 years ago

I may also need to make an IntervalListWriter so we don't need the whole interval list in memory... Sigh...
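
As a rough illustration of why a streaming writer helps, the sketch below emits each interval as a tab-separated line the moment it is handed over, so only the writer's buffer is held rather than the whole list. The class name, constructor, and `write` signature are hypothetical and are not the IntervalListWriter added in this PR; the column order (contig, start, end, strand, name) follows the usual interval list text layout.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.io.Writer;

/** Sketch only: writes intervals one at a time instead of collecting them first. */
class StreamingIntervalWriterSketch implements Closeable {
    private final BufferedWriter out;

    /** headerLines are written verbatim first (for example SAM-style header lines). */
    StreamingIntervalWriterSketch(Writer sink, Iterable<String> headerLines) throws IOException {
        this.out = new BufferedWriter(sink);
        for (String line : headerLines) {
            out.write(line);
            out.write('\n');
        }
    }

    /** Emits one tab-separated record immediately; the record is not kept afterwards. */
    void write(String contig, int start, int end, boolean negativeStrand, String name)
            throws IOException {
        out.write(contig + "\t" + start + "\t" + end + "\t"
                + (negativeStrand ? "-" : "+") + "\t" + name);
        out.write('\n');
    }

    /** Flushes buffered text and closes the underlying sink. */
    @Override
    public void close() throws IOException {
        out.close();
    }

    public static void main(String[] args) throws IOException {
        java.io.StringWriter text = new java.io.StringWriter();
        try (StreamingIntervalWriterSketch writer = new StreamingIntervalWriterSketch(
                text, java.util.Collections.singletonList("@HD\tVN:1.6"))) {
            writer.write("chr1", 10, 20, false, "a");
        }
        System.out.print(text);
    }
}
```

A writer shaped like this pairs naturally with a disk-backed sort: records can be written as they come out of the merge, without building an intermediate in-memory list.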

nh13 commented 5 years ago

@lbergelson commit bd872b5 is needed for this PR to be fully useful for the PR in Picard. Thanks for the review!