samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

CRAM 2.1 test files have incorrect container block byte sizes #1452

Open cmnbroad opened 4 years ago

cmnbroad commented 4 years ago

The version 2.1 test files in the repo appear to have SAMFileHeader containers whose container block size is shorter (by either 2 or 4 bytes) than the actual size of the embedded block containing the header. To compensate, the code that reads the header container has a 2.1 code path that relies on the block size value instead of the container size value.
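
To make the mismatch concrete, here is a minimal, self-contained sketch of the kind of check and fallback described above. It is not the htsjdk code: the class name, method names, and parameters are all hypothetical, and it assumes the two lengths have already been parsed from the container header and measured from the embedded block.

```java
// Hypothetical illustration only; none of these names exist in htsjdk.
final class HeaderContainerSizeCheck {
    private HeaderContainerSizeCheck() {}

    /**
     * Number of bytes by which the length declared in the container header
     * falls short of the embedded header block (0 if they are consistent).
     */
    static int shortfall(final int declaredContainerLength, final int actualBlockLength) {
        return Math.max(0, actualBlockLength - declaredContainerLength);
    }

    /**
     * Number of bytes a reader would consume for the file header.
     * Assumption: for CRAM 2.1 the declared container length is not trusted,
     * so the embedded block's own size is used instead.
     */
    static int bytesToRead(final int majorVersion,
                           final int minorVersion,
                           final int declaredContainerLength,
                           final int actualBlockLength) {
        final boolean isCram21 = majorVersion == 2 && minorVersion == 1;
        return isCram21 ? actualBlockLength : declaredContainerLength;
    }
}
```

With made-up sizes, `shortfall(100, 104)` reports a 4-byte discrepancy, and `bytesToRead(2, 1, 100, 104)` falls back to the block size of 104, while any other version uses the declared container length.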

It's unclear where these files came from (they were added as part of the initial CRAM commit), or what CRAM implementation/version created them, but the spec historically did under-specify the structure and contents of the special (header and EOF) containers. See https://github.com/samtools/hts-specs/issues/450.

Ultimately, these files and the corresponding special code path should be removed from the repo.