Closed: jkbonfield closed this issue 2 years ago.
@jkbonfield I think this is the same underlying issue as #721, which I always thought of as "htsjdk doesn't support embedded reference". But now I'm wondering: is "RR=0" the same thing as "reference embedded in a block", or is it something else ("no embedded reference, reads are stored with the full sequence and not reference-compressed")? I'm not sure I can tell this from the spec, at least as it read in the past. Is this described there?
RR is "reference required". It is set to zero either for embedded reference or for referenceless mode, which was my use case above.
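Because RR=0 covers two different layouts, a reader has to look at more than the RR flag to know where reference bases come from. The Java sketch below is hypothetical (it is not htsjdk's API). It assumes RR is read from the compression header's preservation map and that a slice header reports its embedded-reference block content ID as -1 when no reference is embedded. Under those assumptions only the EXTERNAL case should ever demand a FASTA from the user.

```java
// Hypothetical sketch, not htsjdk's API: decide where a slice's reference
// bases come from, given the RR flag and the slice's embedded-reference ID.
class CramReferenceMode {
    enum Mode { EXTERNAL, EMBEDDED, NONE }

    /**
     * @param referenceRequired    RR flag (assumed: from the compression header preservation map)
     * @param embeddedRefContentId embedded-reference block content ID (assumed: -1 when absent)
     */
    static Mode classify(boolean referenceRequired, int embeddedRefContentId) {
        if (embeddedRefContentId >= 0) {
            return Mode.EMBEDDED;   // reference bases are shipped inside the CRAM itself
        }
        if (referenceRequired) {
            return Mode.EXTERNAL;   // reads are reference-compressed against an external FASTA
        }
        return Mode.NONE;           // RR=0 and nothing embedded: bases are stored verbatim
    }

    // Only an external reference should force the user to supply one.
    static boolean needsExternalReference(boolean referenceRequired, int embeddedRefContentId) {
        return classify(referenceRequired, embeddedRefContentId) == Mode.EXTERNAL;
    }

    public static void main(String[] args) {
        System.out.println(classify(false, -1)); // NONE (referenceless, e.g. "scramble -x")
        System.out.println(classify(false, 7));  // EMBEDDED
        System.out.println(classify(true, -1));  // EXTERNAL
    }
}
```

Checking the embedded ID before RR is a design choice in this sketch: if reference bases are present in the file, they are used regardless of the flag.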
It may turn out to be the same underlying issue though. If so, then feel free to close this with a link to the other ticket.
This was fixed as part of the 2020 refactoring branch, and is covered by tests in the CRAMReferencelessTest class, among others.
This is related to #721, but not quite the same bug.
That was failing due to MD5 mismatches in the slice. Adding an "R=$HUMAN_REF" option to the picard command line turns this error into #721 instead. However, the reference is not required and should not need to be specified. It isn't needed for SAM reading, so it isn't a globally required option, and it should not be required for every CRAM file either.
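The MD5 failure suggests the reader verifies a slice's reference checksum even when no external reference was involved. The Java sketch below is hypothetical, not htsjdk's code, and shows one way to gate that check. It assumes an all-zero stored digest means "no checksum recorded" and that the digest is taken over the uppercased reference bases covered by the slice. With those assumptions a referenceless slice is never checked, so no reference is needed to read it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch, not htsjdk's code: only verify a slice's reference MD5
// when the bases actually came from an external reference.
class SliceMd5Check {
    // Assumption: an all-zero digest means no checksum was recorded.
    static boolean isAllZero(byte[] md5) {
        for (byte b : md5) {
            if (b != 0) return false;
        }
        return true;
    }

    // Assumption: digest input is the uppercased reference bases covering the slice.
    static byte[] md5Of(String refBases) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return md.digest(refBases.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.US_ASCII));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform must support MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Skip the check for referenceless/embedded slices and for missing checksums.
    static boolean shouldVerify(boolean usedExternalReference, byte[] storedMd5) {
        return usedExternalReference && !isAllZero(storedMd5);
    }

    static boolean md5Matches(String refBases, byte[] storedMd5) {
        return Arrays.equals(md5Of(refBases), storedMd5);
    }

    public static void main(String[] args) {
        byte[] stored = md5Of("ACGTN");
        System.out.println(shouldVerify(false, stored));      // false: no external reference, skip
        System.out.println(shouldVerify(true, new byte[16])); // false: no checksum stored
        System.out.println(md5Matches("acgtn", stored));      // true
    }
}
```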
Subject of the issue
A CRAM file produced using "scramble -x" (samtools view --output-fmt-option no_ref would likely do the same thing) cannot be read by picard. I haven't written my own code against htsjdk, but I'm assuming this is a library problem rather than an application issue. The error is:
See attached file: a.zip
This file has no M5 tags, as they are not required when using non-ref mode. The CRAM file has the RR field set to 0 in the container header. From cram_dump:
Your environment
Steps to reproduce
My command line is:
Expected behaviour
The expected outcome is that it works, just as samtools view or scramble a.cram a.sam does.
Actual behaviour
Failure!