samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/
283 stars 242 forks source link

Extend SamLocusAndReferenceIterator to work with CRAMFileReader #1483

Closed jkaczynski1 closed 3 years ago

jkaczynski1 commented 4 years ago

Description

Extend the SamLocusAndReferenceIterator class to work with the CRAMFileReader and other file readers. Currently the SamLocusAndReferenceIterator class depends on the legacy SamReader class. The purpose of the enhancement is to add support for newer file reader classes such as the CRAMFileReader.

Things to think about before submitting:

codecov-commenter commented 4 years ago

Codecov Report

Merging #1483 into master will decrease coverage by 0.116%. The diff coverage is 55.725%.

@@               Coverage Diff               @@
##              master     #1483       +/-   ##
===============================================
- Coverage     69.216%   69.100%   -0.116%     
- Complexity      8704      8765       +61     
===============================================
  Files            588       591        +3     
  Lines          34583     34845      +262     
  Branches        5780      5845       +65     
===============================================
+ Hits           23937     24078      +141     
- Misses          8366      8443       +77     
- Partials        2280      2324       +44     
Impacted Files Coverage Δ Complexity Δ
...amtools/util/SamRecordIntervalIteratorFactory.java 41.379% <0.000%> (-14.435%) 2.000 <0.000> (ø)
...samtools/util/AbstractFileReaderLocusIterator.java 51.955% <51.955%> (ø) 43.000 <43.000> (?)
...reference/FileReaderLocusAndReferenceIterator.java 76.471% <76.471%> (ø) 4.000 <4.000> (?)
.../htsjdk/samtools/util/FileReaderLocusIterator.java 78.431% <78.431%> (ø) 16.000 <16.000> (?)
...htsjdk/samtools/util/nio/DeleteOnExitPathHook.java 80.952% <0.000%> (-9.524%) 3.000% <0.000%> (-1.000%)
...samtools/util/AsyncBlockCompressedInputStream.java 72.000% <0.000%> (-4.000%) 12.000% <0.000%> (-1.000%)
lbergelson commented 4 years ago

@jkaczynski1 This seems like a great idea for improvement. Thanks for doing this! Someone will have to do a review of it before we can get it merged. We'll try to get to it quickly but I have very limited bandwidth at the moment.

jkaczynski1 commented 4 years ago

@lbergelson I am glad I can help. If there is anything I can do to speed up the review please let me know.

lbergelson commented 4 years ago

@yfarjoun Any chance you could review this one or find someone on who would be appropriate for it? I haven't gotten a chance and I don't want it to sit forever.

yfarjoun commented 3 years ago

I have comments about the code (in particular it seems like a straight-up copy of code that already exists in htsjdk), but I'm having a hard time figuring out what the problem is....CollectWgsMetrics in Picard works with a SamLocusIterator and we run it against cram files all the time. In particular, there's a test there (picard/analysis/CollectWgsMetricsTest.java:145) that you can change the suffixes to "cram" (you'll also have to change picard/vcf/VcfTestUtils.java:49 to enable it to create ".crai" indexes) and if you run that tests suite you'll see that it works against the generated cram file.

Perhaps we can start with an issue describing the problem you are seeing? I'm sorry it took so long to get to this, part of the problem is that this PR seemed like a lot of code to review and it wasn't clear what it does or why it's needed...

I hope you are still around to discuss the matter.

Cheers.

yfarjoun commented 3 years ago

👎

jkaczynski1 commented 3 years ago

@yfarjoun Thank you for reviewing my PR. I have run the test you referred to with the changes you suggested and I got the following warning: SAMFileWriterFactory Unknown file extension, assuming BAM format when writing file I have checked the generated temporary file and it is a BAM file with a .cram extension. The purpose of my changes is visible in the test class FileReaderLocusAndReferenceIteratorTest I provided which is similar to SamLocusAndReferenceIteratorTest but it works with CRAM files. I want to use SamLocusAndReferenceIterator and SamLocusIterator with CRAM files. If the changes I submitted are unnecessary it should be possible to execute tests from SamLocusAndReferenceIteratorTest on a CRAM file without any new code. Perhaps there is a more obvious example how to do it.

yfarjoun commented 3 years ago

@jkaczynski1 I hope that the PR I provided shows a good way to create a SamReader from a cram (or any other) file. I'm sorry that you were led down a rabbit-hole. This is probably due to the lack of documentation in htsjdk....I'm not entirely sure how to solve this problem, but if you have ideas, we are very open to suggestions.

jkaczynski1 commented 3 years ago

@yfarjoun Thanks for sharing the example - it helps a lot. Expanding unit tests is a good way of communicating how to use the library.