samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

Reducing memory required in IntervalMergerIterator when not concatenating names #1711

Closed meganshand closed 2 weeks ago

meganshand commented 3 weeks ago

When IntervalMergerIterator merges many small intervals across a whole contig, storing every interval can take a lot of memory. This can happen, for example, when converting a GVCF with many reference blocks that are only a few bases long into an interval list. The intervals are stored only so that their names can be concatenated, which is not useful when merging this many intervals. This change keeps memory usage lower by not adding each iterated interval to toBeMerged when concatenateNames is false.
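
To illustrate the idea, here is a minimal, self-contained sketch. It is not the htsjdk implementation: `Span`, `merge`, and `IntervalMergeSketch` are hypothetical names, and the sketch assumes sorted input, 1-based closed coordinates, merging of abutting intervals, a `|` name separator, and that the first interval's name is kept when names are not concatenated. The point is only that the per-interval buffer grows when `concatenateNames` is true and stays empty otherwise.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class IntervalMergeSketch {

    /** Hypothetical stand-in for an interval; not htsjdk's Interval class. */
    static final class Span {
        final String contig;
        final int start; // 1-based, inclusive (assumption)
        final int end;   // 1-based, inclusive (assumption)
        final String name;

        Span(String contig, int start, int end, String name) {
            this.contig = contig;
            this.start = start;
            this.end = end;
            this.name = name;
        }
    }

    /** Merges sorted spans that overlap or abut, streaming over the input. */
    static List<Span> merge(Iterator<Span> sorted, boolean concatenateNames) {
        List<Span> out = new ArrayList<>();
        // Per-interval buffer: only filled when names must be concatenated.
        List<String> names = new ArrayList<>();
        String contig = null;
        int start = 0;
        int end = 0;
        String firstName = null;

        while (sorted.hasNext()) {
            Span next = sorted.next();
            boolean extendsCurrent = contig != null
                    && contig.equals(next.contig)
                    && next.start <= end + 1;
            if (extendsCurrent) {
                // Only the running end coordinate needs updating.
                end = Math.max(end, next.end);
            } else {
                if (contig != null) {
                    out.add(new Span(contig, start, end, nameOf(firstName, names, concatenateNames)));
                }
                names.clear();
                contig = next.contig;
                start = next.start;
                end = next.end;
                firstName = next.name;
            }
            if (concatenateNames) {
                names.add(next.name);
            }
        }
        if (contig != null) {
            out.add(new Span(contig, start, end, nameOf(firstName, names, concatenateNames)));
        }
        return out;
    }

    private static String nameOf(String firstName, List<String> names, boolean concatenateNames) {
        return concatenateNames ? String.join("|", names) : firstName;
    }
}
```

With `concatenateNames` set to false, the extra state per merged run is a contig, two coordinates, and one name, regardless of how many small intervals are folded into it.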

meganshand commented 3 weeks ago

@lbergelson can you please take a look when you have a moment?