samtools / htslib

C library for high-throughput sequencing data formats

Fix a couple small VCF auto-indexing bugs. #1581

Closed jkbonfield closed 1 year ago

jkbonfield commented 1 year ago
  1. sam_idx_save wasn't validating that the file is BGZF. Calling this function on uncompressed data is invalid usage, but we should double-check anyway.

    Note this is triggered by a bcftools bug where -o foo.vcf.gz##idx##foo.vcf.gz.csi writes VCF rather than VCF.gz, because the "filename" doesn't end in .gz.

  2. Add the hts_idx_amend_last calls to vcf_write as we did previously for SAM/BAM.

    This isn't technically a requirement, as all it does is change virtual offsets to an alternate form that gives the same file offset (see the comments above hts_idx_amend_last), but doing so means the auto-built indices match those produced by a standalone index command.

    This fix isn't complete, as it hasn't been done for BCF yet. However, it falls under the "nicety" category and isn't really fixing a bug, so we can figure out how to tidy up BCF later (plus VCF.gz is basically the universal format).
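
For readers unfamiliar with the "alternate form" mentioned in point 2, here is a minimal standalone sketch of how two different BGZF virtual offsets can name the same position. A virtual offset packs the compressed file offset of a block into the upper 48 bits and the offset within the uncompressed block into the lower 16 bits, so "one past the end of block k" and "byte 0 of block k+1" are the same place. The helper names below (voffset_pack, voffset_to_next_block) are hypothetical and are not part of htslib; this is an illustration of the encoding only, not the code changed in this PR.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a BGZF virtual offset: compressed offset of the block start in the
 * upper 48 bits, offset within the uncompressed block in the lower 16. */
static uint64_t voffset_pack(uint64_t coffset, uint32_t uoffset)
{
    assert(coffset < (UINT64_C(1) << 48));
    assert(uoffset < (UINT32_C(1) << 16));
    return (coffset << 16) | uoffset;
}

static uint64_t voffset_coffset(uint64_t v) { return v >> 16; }
static uint32_t voffset_uoffset(uint64_t v) { return (uint32_t)(v & 0xffff); }

/* Hypothetical helper: if v points one past the last uncompressed byte of
 * its block, rewrite it as byte 0 of the following block. Both forms refer
 * to the same position in the data. Any other offset is returned unchanged.
 * block_clen and block_ulen are the compressed and uncompressed sizes of
 * the block that v currently points into. */
static uint64_t voffset_to_next_block(uint64_t v, uint32_t block_clen,
                                      uint32_t block_ulen)
{
    if (voffset_uoffset(v) == block_ulen)
        return voffset_pack(voffset_coffset(v) + block_clen, 0);
    return v;
}
```

As an example, for a block starting at compressed offset 1000 with compressed size 250 and uncompressed length 0xff00, voffset_to_next_block rewrites the virtual offset (1000, 0xff00) as (1250, 0). An index built with one form seeks to the same data as an index built with the other, but the two index files are not byte-identical, which is the mismatch point 2 describes.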