Closed tomwhite closed 5 years ago
Merging #1396 into master will increase coverage by
0.019%
. The diff coverage is100%
.
@@ Coverage Diff @@
## master #1396 +/- ##
==============================================
+ Coverage 68.031% 68.05% +0.019%
- Complexity 8368 8372 +4
==============================================
Files 573 573
Lines 33886 33890 +4
Branches 5662 5663 +1
==============================================
+ Hits 23053 23062 +9
+ Misses 8643 8640 -3
+ Partials 2190 2188 -2
Impacted Files | Coverage Δ | Complexity Δ | |
---|---|---|---|
src/main/java/htsjdk/samtools/CSIIndex.java | 71.622% <ø> (ø) |
59 <0> (ø) |
:arrow_down: |
...ain/java/htsjdk/samtools/AbstractBAMFileIndex.java | 85.119% <100%> (+0.363%) |
50 <0> (+2) |
:arrow_up: |
.../htsjdk/variant/variantcontext/VariantContext.java | 77.714% <0%> (ø) |
246% <0%> (ø) |
:arrow_down: |
...samtools/util/AsyncBlockCompressedInputStream.java | 76% <0%> (+4%) |
13% <0%> (+1%) |
:arrow_up: |
...htsjdk/samtools/util/nio/DeleteOnExitPathHook.java | 89.474% <0%> (+10.526%) |
4% <0%> (+1%) |
:arrow_up: |
I've fixed this now. I wasn't sure how to add a direct test. Any suggestions, or is the existing coverage sufficient?
@tomwhite I wrote a test here: https://github.com/samtools/htsjdk/tree/skip-to-sequence-optimization
Closing since I merged #1397 which includes this.
…reading sequences in order.
Description
Reading index sequences in the order they are defined in the header is a very common access pattern. However, the way it is implemented in
AbstractBAMFileIndex
means that it has quadratic complexity, since when reading sequence i inskipToSequence
that is not in the cache, it will start from the beginning, even if sequence i - 1 is in the cache.The effect is particularly stark when there are many reference sequences, such as hg38 which has over 3000. See https://github.com/disq-bio/disq/pull/109 for an example.
The fix is simple - look to see if sequence i - 1 is in the cache, and start from that position if it is. The change is covered by existing tests.
/cc @lbergelson