samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/
285 stars 243 forks source link

AsynchronousLineReader closes a stream on EOF leading to blowup #579

Closed akiezun closed 8 years ago

akiezun commented 8 years ago

htsjdk 2.2.1

This tests blows up because the worker thread closes the stream when it encounters EOF. As discussed in https://github.com/samtools/htsjdk/pull/577 the AsynchronousLineReader may be broken beyond repair. Thanks to @d-cameron for suggesting this testcase.

This bug led to errors like this: https://github.com/broadinstitute/gatk/issues/1638 and forced us to abandon asyncIO for tribble for now.

    @Test
    public void testEOFShouldNotCloseStream() throws Exception {
        final File filePath = new File(TestUtils.DATA_DIR + "trio.vcf");//short file
        final InputStreamReader is = new InputStreamReader(new FileInputStream(filePath));
        new AsynchronousLineReader(is);
        Thread.sleep(2000L); //give the worker a while to reach the end of file
        is.read(); //this should not blow up
    }

blows up like this

java.io.IOException: Stream closed

    at sun.nio.cs.StreamDecoder.ensureOpen(StreamDecoder.java:46)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:148)
    at sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:127)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:112)
    at java.io.InputStreamReader.read(InputStreamReader.java:168)
    at htsjdk.tribble.readers.AsynchronousLineReaderTest.testEOFShouldNotCloseStream(AsynchronousLineReaderTest.java:41)
droazen commented 8 years ago

@akiezun Should we disable the ability to use tribble async I/O at all in htsjdk until these issues are resolved (or just remove AsynchronousLineReader completely)?

akiezun commented 8 years ago

i dont' know about async writing - maybe it's fine. But reading is broken - I am suggesting either turning it off or just deleting the class completely so that htsjdk clients are not using bogus code.

akiezun commented 8 years ago

See also https://github.com/samtools/htsjdk/pull/528 where I proposed disabling asyncIO reading from tribble. We took a different path that time. Nonetheless I suggest disabling it or removing it because it is broken. Can we have a "vote"? @droazen @tfenne @d-cameron (anyone else who has an opinion, please chime in)

d-cameron commented 8 years ago

My vote is for removal. The desired performance improvement could be achieved in a more generic manner with an AsyncBufferedIterator.

I'd also like TribbleIndexedFeatureReader.query() to be updated to explicitly close the previous iterator every time it returns a new one. Incidentally, why are there LineReader interfaces defined in both htsjdk.samtools.util and htsjdk.tribble.readers, and why doesn't LineReader implement CloseableIterator?

droazen commented 8 years ago

:+1: for removal, provided there are no objections.

tfenne commented 8 years ago

:+1: on removing.