samtools / hts-specs

Specifications of SAM/BAM and related high-throughput sequencing file formats
http://samtools.github.io/hts-specs/

Revert defn of CRAM container size to be sum of block sizes (PR#731) #731

Closed jkbonfield closed 1 month ago

jkbonfield commented 1 year ago

The definition of container size as the sum of block sizes was added as a clarification in #398 after discussion in #396, but this was in error. In our attempts to clarify and nail down these corner cases, we failed to recall that the SAM header is permitted to be padded out by space that is not allocated to any block.

History on this decision dates back to 2013 and is shown in Samtools issue samtools/samtools#1852.

There are good reasons to move away from padding via a second block. Changing a block's size can also change the size of the block structure itself (if we're using a generic shared piece of code), because ITF8 is a variable-length integer encoding, and this in turn makes it cumbersome to handle every possible change in SAM header size. It is far simpler to leave unallocated space after the block and before the end of the container. This is how htslib has worked since CRAM 3.0, and I believe it is how CRAMtools.jar works too.
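To illustrate the ITF8 point, here is a minimal Python sketch of how a raw block's total size jumps when its payload crosses an ITF8 boundary. The function names `itf8_size` and `raw_block_size` are hypothetical, and the layout assumed is the CRAM 3.x block structure (method byte, content type byte, ITF8 content ID, ITF8 compressed size, ITF8 uncompressed size, data, 4-byte CRC32); it is not taken from htslib or CRAMtools.

```python
def itf8_size(value: int) -> int:
    """Number of bytes ITF8 needs to encode a 32-bit value."""
    value &= 0xFFFFFFFF
    if value < 0x80:
        return 1
    if value < 0x4000:
        return 2
    if value < 0x200000:
        return 3
    if value < 0x10000000:
        return 4
    return 5


def raw_block_size(data_len: int, content_id: int = 0) -> int:
    """Total bytes of an uncompressed block holding data_len bytes.

    Assumes the CRAM 3.x layout: method (1), content type (1),
    content ID (ITF8), compressed size (ITF8), uncompressed size (ITF8),
    the data itself, then a 4-byte CRC32.
    """
    header = 1 + 1 + itf8_size(content_id) + 2 * itf8_size(data_len)
    return header + data_len + 4


if __name__ == "__main__":
    for n in (126, 127, 128):
        print(n, raw_block_size(n))
```

Under these assumptions a 127-byte payload gives a 136-byte block and a 128-byte payload gives a 139-byte block, so growing the data by one byte grows the block by three, and no payload yields a 137- or 138-byte block. Code that resizes a padding block to hit an exact container length has to work around gaps like this.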

Fixes samtools/samtools#1852.

github-actions[bot] commented 1 year ago

Changed PDFs as of 99637184631e9a22bba2472834733e587c053567: CRAMv3 (diff).

jkbonfield commented 1 year ago

TODO: Does this apply only to header container, or all containers?

jkbonfield commented 1 year ago

Minimal update made to explicitly state that the additional padding bytes are for the CRAM header container only. If we later find a compelling reason, we can always relax this limitation while keeping backwards compatibility.
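As a rough sketch of what this restriction could look like in a reader, the check below computes the unallocated bytes left after the blocks and rejects them anywhere except the header container. The function `trailing_padding` and its parameters are hypothetical, and it assumes `container_length` counts the bytes following the container header.

```python
def trailing_padding(container_length: int,
                     block_sizes: list[int],
                     is_header_container: bool) -> int:
    """Return the number of unallocated bytes after the last block.

    container_length is assumed to cover everything after the container
    header; block_sizes are the full on-disk sizes of each block.
    """
    padding = container_length - sum(block_sizes)
    if padding < 0:
        raise ValueError("blocks overrun the container length")
    if padding > 0 and not is_header_container:
        raise ValueError("padding is only permitted in the header container")
    return padding
```

A reader following this sketch would skip `trailing_padding(...)` bytes after the SAM header block to reach the next container, and treat any slack in a data container as an error.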

github-actions[bot] commented 1 year ago

Changed PDFs as of 8e750cb3e37bc43e5dc7949ba42a7a31d0a37323: CRAMv3 (diff).