samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/
283 stars 242 forks source link

Fixed issue #1111 in broadinstitute/picard #1322

Closed Sonik-zirael closed 5 years ago

Sonik-zirael commented 5 years ago

Description

Here you can see our fix for issue https://github.com/broadinstitute/picard/issues/1111. We decided to create an index file if it does not exist. We wrote several tests in htsjdk to check if index file was created correctly. But now there is one test (testIsQueryable in VCFFileReaderTest) which has not been passed successfully. This test is for hasIndex() (Boolean) function. And now this function always returns true. Maybe it would be better to rewrite this test.

Checklist

codecov-io commented 5 years ago

Codecov Report

Merging #1322 into master will increase coverage by 0.017%. The diff coverage is 33.333%.

@@               Coverage Diff               @@
##              master     #1322       +/-   ##
===============================================
+ Coverage     67.626%   67.644%   +0.017%     
  Complexity      8229      8229               
===============================================
  Files            562       562               
  Lines          33617     33638       +21     
  Branches        5642      5644        +2     
===============================================
+ Hits           22734     22754       +20     
+ Misses          8708      8705        -3     
- Partials        2175      2179        +4
Impacted Files Coverage Δ Complexity Δ
src/main/java/htsjdk/samtools/util/IOUtil.java 58.894% <0%> (+0.783%) 119 <1> (+1) :arrow_up:
...va/htsjdk/tribble/TribbleIndexedFeatureReader.java 65.5% <38.889%> (-3.731%) 23 <0> (-2)
...c/main/java/htsjdk/tribble/index/IndexFactory.java 78.065% <0%> (+2.581%) 27% <0%> (ø) :arrow_down:
...samtools/util/AsyncBlockCompressedInputStream.java 76% <0%> (+4%) 13% <0%> (+1%) :arrow_up:
src/main/java/htsjdk/tribble/TribbleException.java 66.667% <0%> (+7.692%) 5% <0%> (ø) :arrow_down:
lbergelson commented 5 years ago

@Sonik-zirael Hello and thank you for your interest and work in creating this PR. I especially appreciate that you've added tests!

I'm not sure that it's the right solution for this problem though. The decision to not automatically generate the index in htsjdk when loading files was a deliberate one. Usually an index is used to provide fast access to a file. Regenerating it incurs the cost of reading the entire file which negates the point of having the index. So if htsjdk is told to load a file with an index we deliberately fail when it's missing to alert the user that there is a problem. It's the responsibility of downstream tool authors to decide when they require the index and when they don't. In this case it seems like picard is accidentally requiring an index when none is necessary, and we can fix that much more easily in picard.

I think that this line in picard is the problem:

final VCFFileReader in = new VCFFileReader(INPUT);

changing it to

final VCFFileReader in = new VCFFileReader(INPUT, false);

Should allow the tool to run without an index.

lbergelson commented 5 years ago

I'm going to close this since I believe this issue should be fixed in picard instead. Feel to reopen if you have further comments.