samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

Add support for reserializing CRAM containers #1606

Open cmnbroad opened 2 years ago

cmnbroad commented 2 years ago

There are cases where we want to re-serialize CRAM containers into a new stream without fully decoding and then re-encoding them (and without needing the reference). For example, we may want to stitch together CRAM shards that were created in parallel (as Disq does), or split a CRAM into smaller CRAMs while preserving existing container boundaries. The main requirement is a means of updating the stream-relative values in the container and slice headers, specifically the global record counter.
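To illustrate the record counter update, here is a minimal sketch of rebasing the counter when headers from independently written shards are concatenated. It does not use htsjdk: `RecordCounterRebase`, `Header`, and `rebase` are hypothetical names invented for this example, and `Header` only stands in for the two relevant fields of a container or slice header (the number of records, and the global counter of the first record). Nesting of slices inside containers is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordCounterRebase {

    /** Hypothetical stand-in for the stream-relative fields of a container or slice header. */
    public static final class Header {
        public final int recordCount;          // number of records covered by this header
        public final long globalRecordCounter; // stream-wide index of the first record

        public Header(int recordCount, long globalRecordCounter) {
            this.recordCount = recordCount;
            this.globalRecordCounter = globalRecordCounter;
        }
    }

    /**
     * Returns copies of the headers, in order, with the global record counter
     * reassigned so that it runs contiguously from startCounter.
     * The compressed payload is never touched, so no reference is needed.
     */
    public static List<Header> rebase(List<Header> headers, long startCounter) {
        List<Header> result = new ArrayList<>(headers.size());
        long next = startCounter;
        for (Header h : headers) {
            result.add(new Header(h.recordCount, next));
            next += h.recordCount;
        }
        return result;
    }

    public static void main(String[] args) {
        // Two shards written in parallel, each counting from 0 (assumed example data).
        List<Header> stitched = new ArrayList<>();
        stitched.add(new Header(100, 0));  // shard A, first container
        stitched.add(new Header(50, 100)); // shard A, second container
        stitched.add(new Header(75, 0));   // shard B, first container

        for (Header h : rebase(stitched, 0)) {
            System.out.println(h.recordCount + " records, counter=" + h.globalRecordCounter);
        }
    }
}
```

In this example the shard B header, which originally restarted at 0, ends up with a counter of 150 after rebasing. An actual implementation would also need to rewrite the serialized header bytes and anything derived from them, which this sketch does not attempt.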