samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

CRAM: 8-Bin Base Quality Compression #1658

Open JWollnik opened 1 year ago

JWollnik commented 1 year ago

Description of the issue:

Is it possible to use the 8-Bin Base Quality Compression with the new overhauled CRAM support in 3.0.5?

#442 indicates that this would be possible with CRAMEncodingStrategy. However, it's not clear from the API or the release notes how to achieve this. In the past (e.g. 2.18.1) it was possible to use new QualityScorePreservation("*8") to achieve such compression.

Your environment:

cmnbroad commented 1 year ago

@JWollnik This isn't supported by htsjdk CRAM. The CRAMEncodingStrategy class is intended to support such encoding parameters, but it currently doesn't support score binning, and it is not exposed or documented (at least not as part of the SAMFileWriter API). It's only used internally to create test case variations.

As for the old QualityScorePreservation class, the original htsjdk CRAM implementation accepted it as a parameter, but it was neither documented nor implemented. It really should never have been allowed into htsjdk in that state, so it was removed.
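
For readers who still need binned qualities, one possible workaround is to bin the scores before the records reach the writer. The following is only a minimal sketch under stated assumptions: EightBinQuality is a hypothetical helper that is not part of htsjdk, and the thresholds follow the commonly cited Illumina 8-bin scheme (2-9 to 6, 10-19 to 15, 20-24 to 22, 25-29 to 27, 30-34 to 33, 35-39 to 37, 40 and above to 40), which may differ from what the old "*8" setting was meant to do.

```java
/**
 * Hypothetical helper, NOT part of htsjdk: maps Phred quality scores onto a
 * small set of representative values before the records are written.
 */
final class EightBinQuality {

    // Assumed thresholds (commonly cited Illumina 8-bin scheme); adjust as needed.
    // Each row is {lowest score in the bin, representative score written out},
    // ordered from the highest bin to the lowest.
    private static final int[][] BINS = {
        {40, 40}, {35, 37}, {30, 33}, {25, 27}, {20, 22}, {10, 15}, {2, 6}
    };

    private EightBinQuality() { }

    /** Returns the representative score for a single Phred score. */
    static byte bin(int q) {
        for (int[] b : BINS) {
            if (q >= b[0]) {
                return (byte) b[1];
            }
        }
        // Assumption: scores below 2 are passed through unchanged.
        return (byte) q;
    }

    /** Returns a new array with every score binned; the input is not modified. */
    static byte[] binAll(byte[] quals) {
        byte[] out = new byte[quals.length];
        for (int i = 0; i < quals.length; i++) {
            out[i] = bin(quals[i]);
        }
        return out;
    }
}
```

With htsjdk this could be applied per record before writing, e.g. record.setBaseQualities(EightBinQuality.binAll(record.getBaseQualities())). This changes the stored scores themselves (lossy), so it is independent of the container format and of CRAMEncodingStrategy.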