PR #1233 fixes a bug in HTSJDK 2.18.0 which was observed when running GATK on an input CRAM with unmapped reads. We were able to narrow down the issue to the unmapped reads by observing that the problem still occurred by specifying -L UNMAPPED.
Add a test file to HTSJDK with these characteristics:
small
publically-shareable
demonstrates the problem fixed by #1233
I have a large CRAM which demonstrates the problem but I have been unable to reduce it to its unmapped reads. This should be trivial, so we should also investigate why both of these attempts failed:
Use a version of GATK (with the fix) to run PrintReads on a CRAM with -L UNMAPPED and produce CRAM output. Run GATK PrintReads on the result CRAM with -L UNMAPPED. Surprisingly, no reads are found. When re-running without -L UNMAPPED, all reads are seen. The Current Locus for these reads displays as "unmapped" in the ProgressMeter.
Run samtools view -@4 -f 4 -C on a CRAM to produce CRAM output consisting of only those reads which have bit flag 0x04 set (unmapped). Run GATK PrintReads on the result CRAM. Unfortunately, this crashes with a stack trace.
java.lang.ArrayIndexOutOfBoundsException
at sun.security.provider.DigestBase.engineUpdate(DigestBase.java:105)
at java.security.MessageDigest$Delegate.engineUpdate(MessageDigest.java:584)
at java.security.MessageDigest.update(MessageDigest.java:325)
at htsjdk.samtools.util.SequenceUtil.calculateMD5(SequenceUtil.java:917)
at htsjdk.samtools.cram.structure.Slice.setRefMD5(Slice.java:162)
at htsjdk.samtools.CRAMContainerStreamWriter.flushContainer(CRAMContainerStreamWriter.java:455)
at htsjdk.samtools.CRAMContainerStreamWriter.finish(CRAMContainerStreamWriter.java:128)
at htsjdk.samtools.CRAMFileWriter.finish(CRAMFileWriter.java:129)
at htsjdk.samtools.SAMFileWriterImpl.close(SAMFileWriterImpl.java:215)
at org.broadinstitute.hellbender.utils.read.SAMFileGATKReadWriter.close(SAMFileGATKReadWriter.java:26)
at org.broadinstitute.hellbender.tools.PrintReads.closeTool(PrintReads.java:107)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:970)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Subject of the issue
PR #1233 fixes a bug in HTSJDK 2.18.0 which was observed when running GATK on an input CRAM with unmapped reads. We were able to narrow down the issue to the unmapped reads by observing that the problem still occurred by specifying
-L UNMAPPED
.Add a test file to HTSJDK with these characteristics:
I have a large CRAM which demonstrates the problem but I have been unable to reduce it to its unmapped reads. This should be trivial, so we should also investigate why both of these attempts failed:
Use a version of GATK (with the fix) to run PrintReads on a CRAM with
-L UNMAPPED
and produce CRAM output. Run GATK PrintReads on the result CRAM with-L UNMAPPED
. Surprisingly, no reads are found. When re-running without-L UNMAPPED
, all reads are seen. The Current Locus for these reads displays as "unmapped" in the ProgressMeter.Run
samtools view -@4 -f 4 -C
on a CRAM to produce CRAM output consisting of only those reads which have bit flag 0x04 set (unmapped). Run GATK PrintReads on the result CRAM. Unfortunately, this crashes with a stack trace.