samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/
284 stars 243 forks source link

Errors inflating Intel deflated bams. #523

Closed jacarey closed 8 years ago

jacarey commented 8 years ago

Bams generated after https://github.com/samtools/htsjdk/pull/515 are causing issues when we try to inflate them.

@mishalinaik @akiezun

[Thu Mar 17 11:22:51 EDT 2016] picard.sam.ValidateSamFile INPUT=test3.bam MODE=VERBOSE MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json [Thu Mar 17 11:22:51 EDT 2016] Executing as jcarey@jcarey-OptiPlex-7010 on Linux 4.2.0-34-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_66-internal-b17; Picard version: 1eccaab927be982f02d61cc3bc5ad2b9617a4e9f JdkDeflater [Thu Mar 17 11:22:52 EDT 2016] picard.sam.ValidateSamFile done. Elapsed time: 0.02 minutes. Runtime.totalMemory()=253231104 To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp

Exception in thread "main" htsjdk.samtools.util.RuntimeIOException: java.util.zip.DataFormatException: invalid block type at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:112) at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:410) at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:392) at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:127) at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:260) at htsjdk.samtools.SamStreams.readBytes(SamStreams.java:24) at htsjdk.samtools.SamStreams.isBAMFile(SamStreams.java:58) at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:255) at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:134) at picard.sam.ValidateSamFile.doWork(ValidateSamFile.java:154) at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:209) at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95) at edu.mit.broad.picard.cmdline.PicardPrivateCommandLine.main(PicardPrivateCommandLine.java:24) Caused by: java.util.zip.DataFormatException: invalid block type at java.util.zip.Inflater.inflateBytes(Native Method) at java.util.zip.Inflater.inflate(Inflater.java:259) at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96) ... 12 more

akiezun commented 8 years ago

oops. @mishalinaik can you fix this ASAP and add a test to prevent this error in the future?

akiezun commented 8 years ago

really, it's my bad for not testing this before merging. I'll take on writing tests ;)

mishalinaik commented 8 years ago

@jacarey @akiezun I will try to reproduce the issue

jacarey commented 8 years ago

Thanks. I can provide a truncated bam if that helps. You can see the issue in the first block. Block header seems fine but the native inflater is choking on it.

We are using the old libIntelDeflater.so for now but we have about 2 days of production bams that are current broken and we would like to evaluate the chances of recovering them without have to reanalyze.

jacarey commented 8 years ago

Not sure if it helps but just running a verbose test with gunzip I get:

gunzip -tv < /pod/pod2temp/pod2/tmp/HL3YGCCXX/C1-318_2016-03-09_2016-03-12/1/NexPond-504973/HL3YGCCXX.1.aligned.bam gzip: stdin: extra field of 6 bytes ignored

gzip: stdin: invalid compressed data--format violated

mishalinaik commented 8 years ago

@jacarey Thanks I was able to reproduce the bug, working on it now. Will let you know.

mishalinaik commented 8 years ago

@jacarey @akiezun I found the issue it was with igzip build, I have rebuilt IntelDeflater. I have pushed the fixed libIntelDeflater.so to https://github.com/Intel-HSS/htsjdk/tree/igzip_bam_integration

jacarey commented 8 years ago

@mishalinaik is there any chance of recovering the bams we created using this. We currently have about a full day and half of production data that has unusable bams.

droazen commented 8 years ago

@jacarey @ktibbett Is it finally time to consider building for production off of a stable branch rather than HEAD, to guard against regressions like this that crop up during development?

jacarey commented 8 years ago

@droazen @akiezun I think that without tests there is a very good chance this would have made it in to a stable branch as well. In addition to the fact that knowing the current state of things (we use master as our production stable branch) changes like these should be properly vetted.

I spent the morning tracking this down and once I found the issue was able to test in by simply trying to gunzip outputs produced by this change.

This should not have made it in to master and arguing about whether or not we should be using an additional stable branch for our production releases is not particularly relevant to this case.

lbergelson commented 8 years ago

Does this happen on every bam that's compressed with the new inflator? Or is it only some specific rare case? I don't understand how this wasn't caught by the htsjdk test suite, which surely writes/reads bams? Or are there no tests that actually use the intel deflator at all in htsjdk?

jacarey commented 8 years ago

@lbergelson It is actually the deflater that has the issue. If the libIntelDeflater.so file is found in the bin dir with the jar htsjdk clps attempt to use it.

In this case the blocks are not valid gzip blocks and so the java inflater can't decompress them.

This is for every bam that has been generated for the last day and half. I'm guessing that all the tests use the standard JDKDeflater and so never picked up on the issue.

lbergelson commented 8 years ago

@jacarey I understand it's the deflator that has the issue. I was just trying to figure out where the hole in our tests is. It must be that the deflator is never used in the htsjdk tests, which is a sad state of affairs.
Do the picard tests pass?

droazen commented 8 years ago

@jacarey We all agree that this should have been tested before merge, but given the size of this project and the sheer number of contributors, mistakes like this will occasionally sneak in. The frequency of these breakages suggests that master is not really usable as a stable branch. With a separate branch isolated from bleeding-edge development you could at least run a heavy-duty set of stress tests to catch major regressions like this before deciding to move to a new version.

ktibbett commented 8 years ago

@lbergelson , yes, the picard tests passing are a condition of doing a release internally.

@droazen, we can discuss a better solution (and the resources to implement it) shortly.

@mishalinaik, urgently: is there any solution that would let us recover these BAMs? Even a one-off script for re-writing them would be much, much better than re-analyzing from scratch.

lbergelson commented 8 years ago

Does that mean that there isn't a single test in either picard or htsjdk that writes a bam and then reads it again? Or do the tests run without the inflator for some reason?

jacarey commented 8 years ago

@lbergelson Not using the Intel Deflater no.

jacarey commented 8 years ago

@akiezun Clearly the lack of testing here is an oversight. I think in part it has to do with the awkward way that the DeflaterFactory works.

I still maintain that because this change affects the output of ALL bams that use the deflater that we should have been absolutely sure the files being written were viable before merging the change.

akiezun commented 8 years ago

@jacarey I do accept my part of the responsibility here - I should have tested it more and will make new tests once we have a fix. But the system is clearly imperfect and we should work together on making it more robust.

mishalinaik commented 8 years ago

I have made the pull request with the fix and a test.

akiezun commented 8 years ago

Is this now done? Can we close?

lbergelson commented 8 years ago

I'm closing this since the intel deflater is no longer part of htjsdk.