samtools / htslib

C library for high-throughput sequencing data formats

Add "uncompressed" in hts_format_description() where appropriate #1656

Closed jmarshall closed 11 months ago

jmarshall commented 11 months ago

In samtools/samtools#1884 we saw a BAM file downloaded via a web browser and ungzipped. htsfile reported this file as follows:

$ htsfile wgEncode-browser.bam
wgEncode-browser.bam:   BAM version 1 sequence data

In isolation, it's not obvious that this output means the BAM file lacks the usual BGZF compression. (For usual BAM files, htsfile reports BAM version 1 compressed sequence data, but that only helps if you have one handy to compare.)
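
To illustrate the distinction, here is a minimal sketch of how the two cases differ at the byte level. It is not htslib's detection code: `classify_header` is a hypothetical helper, and it relies only on the BAM magic (`BAM\1`) and the gzip magic (`0x1f 0x8b`) that every BGZF block starts with.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of htslib: classify a file from its
 * first few bytes. A raw (ungzipped) BAM stream starts with "BAM\1";
 * a BGZF-compressed one starts with the gzip magic 0x1f 0x8b. */
const char *classify_header(const unsigned char *buf, size_t len)
{
    if (len >= 4 && memcmp(buf, "BAM\1", 4) == 0)
        return "uncompressed BAM";
    if (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
        return "gzip-family compressed";  /* BGZF is a gzip variant */
    return "unknown";
}
```

Passing the first bytes read from a file such as wgEncode-browser.bam to this helper would distinguish the ungzipped download from a normal BGZF-compressed BAM.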

This PR adds “uncompressed” to the description of uncompressed files in formats that are normally compressed, such as BAM and BCF, to make this clear. Thus:

$ htsfile wg*.bam
wgEncode-browser.bam:   BAM version 1 uncompressed sequence data
wgEncode-curl.bam:  BAM version 1 compressed sequence data
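
The wording rule described above can be sketched roughly as follows. This is an illustration only, not the actual change to hts_format_description(): `struct fmt_info` and `describe` are hypothetical stand-ins, and the "sequence data" suffix is hard-coded to match the BAM examples.

```c
#include <stdio.h>
#include <stdbool.h>

/* Hypothetical stand-in for format metadata; not htslib's htsFormat. */
struct fmt_info {
    const char *name;          /* e.g. "BAM" */
    int version;               /* major version only, for brevity */
    bool is_compressed;        /* compression detected in this file */
    bool normally_compressed;  /* format is usually stored compressed */
};

/* Write a description into out; returns snprintf's result. */
int describe(const struct fmt_info *f, char *out, size_t outlen)
{
    const char *comp = "";
    if (f->is_compressed)
        comp = " compressed";
    else if (f->normally_compressed)
        comp = " uncompressed";  /* only flag the unexpected case */

    return snprintf(out, outlen, "%s version %d%s sequence data",
                    f->name, f->version, comp);
}
```

The point of the sketch is that “uncompressed” appears only when compression is absent from a format that normally has it, so descriptions of formats that are ordinarily stored uncompressed stay unchanged.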