samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/
Other
681 stars 240 forks source link

Incorrect compression on `view -Ou -o foo.bcf` #2263

Closed jkbonfield closed 2 months ago

jkbonfield commented 3 months ago

https://github.com/samtools/bcftools/commit/30dbc3c39ed92d3617fb0737ba533b1c97c3907e added code to override the requested format based on the filename suffix. Unfortunately -Ob and -Ou now both operate the same as it claims bcf is FT_BCF | FT_GZ.

Hence this regressed between 1.11 and 1.12.

We can still keep the autodetection, but we also need to explicitly override clevel for -u so it's subsequently defined as level 0. Eg:

diff --git a/vcfview.c b/vcfview.c
index 58063ebb..63ce5438 100644
--- a/vcfview.c
+++ b/vcfview.c
@@ -629,9 +629,9 @@ int main_vcfview(int argc, char *argv[])
             case 'O':
                 switch (optarg[0]) {
                     case 'b': args->output_type = FT_BCF_GZ; break;
-                    case 'u': args->output_type = FT_BCF; break;
+                    case 'u': args->output_type = FT_BCF; args->clevel = 0;break;
                     case 'z': args->output_type = FT_VCF_GZ; break;
-                    case 'v': args->output_type = FT_VCF; break;
+                    case 'v': args->output_type = FT_VCF; args->clevel = 0;break;
                     default:
                     {
                         args->clevel = strtol(optarg,&tmp,10);

Note this is still subtly different. 1.11 onwards treated -Ou as a naked BCF with no gzip wrapper. -Ob -l0 however is BGZF wrapped with no-compression, but it does mean it has CRC checking. I think the former was not spec compliant anyway as BGZF is mandatory, so this regression could be considered to be a bug fix.

jkbonfield commented 3 months ago

The above fixes it for view, but the issue exists with most commands many of which do now have a compression level option. Eg bcftools sort.

I have a PR incoming, but I think the correct solution is fixing set_wmode so it honours what is passed into it if it's compatible with the extension, and sets FT_BCF|FT_GZ otherwise.