samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

CRAM: Interaction between APDelta and Multi-Reference Slices #1218

Open jmthibault79 opened 5 years ago

jmthibault79 commented 5 years ago

Summary of a discussion with a note for future investigation:

Container Compression Headers have their APDelta flag set when the file is coordinate-sorted, which changes how the alignmentStart / alignmentDelta fields in the Container's CramCompressionRecords are interpreted. What happens, then, if a Multi-Reference Slice appears in a Container with APDelta = true?
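Here is a minimal Java sketch that makes the question concrete. It is not htsjdk code: `ApDeltaSketch`, `encode`, and `decode` are hypothetical names. It assumes the first record's delta is taken relative to the slice's alignment start and that each later record is stored relative to the previous one. In a coordinate-sorted single-reference slice, the deltas stay small and non-negative. In a multi-ref slice, the delta across a reference switch subtracts positions that belong to two different contigs.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of AP-delta encoding of alignment starts within one slice.
 * Not the htsjdk implementation; names are illustrative only.
 */
final class ApDeltaSketch {

    /** Store each start as an offset from the previous one; the first is relative to sliceStart (assumption). */
    static int[] encode(int sliceStart, int[] alignmentStarts) {
        int[] deltas = new int[alignmentStarts.length];
        int previous = sliceStart;
        for (int i = 0; i < alignmentStarts.length; i++) {
            deltas[i] = alignmentStarts[i] - previous;
            previous = alignmentStarts[i];
        }
        return deltas;
    }

    /** Inverse of encode: accumulate the deltas back into absolute alignment starts. */
    static int[] decode(int sliceStart, int[] deltas) {
        int[] starts = new int[deltas.length];
        int previous = sliceStart;
        for (int i = 0; i < deltas.length; i++) {
            starts[i] = previous + deltas[i];
            previous = starts[i];
        }
        return starts;
    }

    public static void main(String[] args) {
        // Single-reference, coordinate-sorted slice: small, non-negative deltas.
        int[] sameRef = {1000, 1005, 1020};
        System.out.println(Arrays.toString(encode(1000, sameRef))); // [0, 5, 15]

        // Multi-reference slice: the tail of one contig followed by the head of the next.
        // The second delta mixes two coordinate systems and comes out negative.
        int[] multiRef = {248_000_000, 10_500};
        System.out.println(Arrays.toString(encode(248_000_000, multiRef))); // [0, -247989500]
    }
}
```

In this sketch the negative delta still round-trips through `decode`. Whether a real codec for the alignment-position data series would accept it is a separate question that the sketch does not answer.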

This came up when I broke a test on a private branch by assuming that a Multi-Ref context automatically implies APDelta = false. Removing that assumption fixed the test.

CramContainerStreamWriter looks like a good starting point for this investigation, particularly DEFAULT_SLICES_PER_CONTAINER and DEFAULT_RECORDS_PER_SLICE.

jmthibault79 commented 5 years ago

Note: the spec requires that Multi-Ref slices NOT be delta-encoded (section 10.2), so we should enforce this. Also note that we no longer store alignmentDelta as of #1304.
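The enforcement could look something like the sketch below. It is not htsjdk's actual API. APDelta is a container-level flag in the compression header, so a writer would set it only when the file is coordinate-sorted and no slice in the container is multi-ref. A reader or validator would reject the forbidden combination. `ApDeltaRule`, `chooseApDelta`, `validate`, and `MULTI_REF_ID` are hypothetical names. The value -2 assumes the CRAM convention for a slice header that spans multiple references.

```java
import java.util.List;

/** Hypothetical sketch: keep multi-reference slices out of AP-delta encoded containers. */
final class ApDeltaRule {

    /** Slice-header reference ID meaning "multiple references" (assumed -2, per the CRAM convention). */
    static final int MULTI_REF_ID = -2;

    /** Writer side: APDelta only if coordinate-sorted AND no slice in the container is multi-ref. */
    static boolean chooseApDelta(boolean coordinateSorted, List<Integer> sliceReferenceIds) {
        return coordinateSorted && !sliceReferenceIds.contains(MULTI_REF_ID);
    }

    /** Reader/validator side: reject a container that sets APDelta while holding a multi-ref slice. */
    static void validate(boolean apDelta, List<Integer> sliceReferenceIds) {
        if (apDelta && sliceReferenceIds.contains(MULTI_REF_ID)) {
            throw new IllegalStateException("Multi-reference slices must not be delta-encoded");
        }
    }

    public static void main(String[] args) {
        System.out.println(chooseApDelta(true, List.of(0, 0)));            // true
        System.out.println(chooseApDelta(true, List.of(0, MULTI_REF_ID))); // false
    }
}
```

Deciding the flag per container, not per slice, matches where the flag lives. The cost is that a single multi-ref slice turns off delta encoding for every slice in its container.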