samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/
283 stars 242 forks source link

Fix CRAMFileReader stream management #1533

Closed cmnbroad closed 3 years ago

cmnbroad commented 3 years ago

CRAMIterator (which is a closeable iterator) closes it's underlying stream when it's closed (BAMIterator doesn't do this; it relies on the enclosing reader to do it). In the case where the underlying stream was supplied to the CRAMFileReader constructor directly, rather than through a File, this causes serial index queries to fail after the first query, since the CRAMFileReader constructors are typed as InputStream, and a SeekableStream can't be reconstituted from an InputStream. These stream constructors are the ones used by SamReaderFactory when an input is a non-local file, ie., when it's provided through a SamPathInputResource.

Changing CRAMIterator to not close the stream on close() will leak result in leaking streams that are created via other CRAMFileReader constructors. The ultimate issue here is that CRAMFileReader doesn't have a constructor that accepts a SeekableStream directly for the input source, only the index (??). Since rationalizing the constructors is a bigger change that will require deprecating existing constructors, and affect lots of code and tests, for now this PR uses a SeekableStream wrapper class DeferredCloseSeekableStream to that is used to create iterators, and defers closing the underlying stream until the reader is closed.

Fixes an issue reported in GATK https://github.com/broadinstitute/gatk/issues/6475.

cmnbroad commented 3 years ago

@lbergelson No, I don't think there is any such protection for iterators. I believe there is a ticket for that - one of the many things that need to be cleaned up in this file.