jkbonfield / io_lib

Staden Package "io_lib" (sometimes referred to as libstaden-read by distributions). This contains code for reading and writing a variety of Bioinformatics / DNA Sequence formats.

Interface for CRAM encoding mode #26

Closed (gt1 closed 4 years ago)

gt1 commented 4 years ago

The scramble program allows setting a CRAM compression mode via the -X switch. At the moment this is translated into various individual settings in progs/scramble.c. It would be nice if these modes were also available through a unified API (i.e. a function called on a cram_fd that subsequently puts it into, e.g., archive compression mode).

jkbonfield commented 4 years ago

Good suggestion. That's more or less what I did in my htslib PR here: https://github.com/samtools/htslib/blob/a22a0af30eb46bd0422bf6313cdd2ae424e19022/hts.c#L766

It uses the same mechanism that already exists for seqs_per_slice, slices_per_container, embed_ref, etc. to add fast, normal, small and archive. I thought about a "profile=X" type of syntax, which may be cleaner, but for command-line arguments it was a bit friendlier just to lift it up one level.

Any preferences? I think for io_lib, given that scramble already has the -X option, friendliness doesn't matter, so adding a CRAM_OPT_PROFILE case to cram_set_voption seems the most logical.
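To make the idea concrete, here is a minimal, self-contained sketch of a profile case inside a va_list-based option setter. It does not use io_lib: demo_fd, demo_option, demo_set_option and demo_set_voption are hypothetical stand-ins, and the numbers assigned to each profile are invented for illustration rather than taken from scramble.c. Only the four profile names (fast, normal, small, archive) come from the discussion above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: NOT io_lib's real types, options or defaults. */
enum demo_option { DEMO_OPT_SEQS_PER_SLICE, DEMO_OPT_PROFILE };

typedef struct {
    int seqs_per_slice;
    int level;     /* generic compression level */
    int use_bz2;   /* 1 if a bzip2-style codec may be tried */
    int use_lzma;  /* 1 if an lzma-style codec may be tried */
} demo_fd;

/* Apply one named profile; all values below are illustrative only. */
static int demo_apply_profile(demo_fd *fd, const char *name) {
    if (strcmp(name, "fast") == 0) {
        fd->level = 1; fd->use_bz2 = 0; fd->use_lzma = 0;
        fd->seqs_per_slice = 1000;
    } else if (strcmp(name, "normal") == 0) {
        fd->level = 5; fd->use_bz2 = 0; fd->use_lzma = 0;
        fd->seqs_per_slice = 10000;
    } else if (strcmp(name, "small") == 0) {
        fd->level = 6; fd->use_bz2 = 1; fd->use_lzma = 0;
        fd->seqs_per_slice = 25000;
    } else if (strcmp(name, "archive") == 0) {
        fd->level = 7; fd->use_bz2 = 1; fd->use_lzma = 1;
        fd->seqs_per_slice = 100000;
    } else {
        return -1; /* unknown profile name */
    }
    return 0;
}

/* va_list form: one switch handles single settings and whole profiles. */
static int demo_set_voption(demo_fd *fd, enum demo_option opt, va_list args) {
    assert(fd != NULL);
    switch (opt) {
    case DEMO_OPT_SEQS_PER_SLICE:
        fd->seqs_per_slice = va_arg(args, int);
        return 0;
    case DEMO_OPT_PROFILE: {
        const char *name = va_arg(args, const char *);
        return name ? demo_apply_profile(fd, name) : -1;
    }
    }
    return -1; /* unrecognised option */
}

/* Variadic front end; opt is passed as int so va_start is well defined. */
int demo_set_option(demo_fd *fd, int opt, ...) {
    va_list args;
    va_start(args, opt);
    int ret = demo_set_voption(fd, (enum demo_option)opt, args);
    va_end(args);
    return ret;
}
```

With this shape a caller writes demo_set_option(&fd, DEMO_OPT_PROFILE, "archive") and never needs to know which individual settings the profile touches, so the mapping can change over time without breaking callers.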

gt1 commented 4 years ago

I do not have a specific preference for how it is done. I just thought it would be a good thing to have a defined interface for it. Extracting the necessary options from scramble.c just did not seem very elegant, especially so because the settings performed for e.g. archive mode may change over time.

jkbonfield commented 4 years ago

Thanks German. I found and fixed a couple bugs in the process. Much tidier now.

Syntax is, e.g., scram_set_option(out_fd, CRAM_OPT_PROFILE, "small").

gt1 commented 4 years ago

Thank you for the interface James, much appreciated.