samtools / htslib

C library for high-throughput sequencing data formats

tabix on truncated file returns successfully with exit status 0 #1528

Closed: bir3 closed this issue 1 year ago

bir3 commented 1 year ago

How to reproduce:

# on branch develop, latest commit 8e43fb0650
curl -r 0-12000 -LO https://ftp.ncbi.nlm.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz
# or wget -c --header="Range: bytes=0-12000" https://ftp.ncbi.nlm.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz

htslib-develop/tabix GCF_000001405.25.gz ; echo $?
[E::bgzf_read_block] Failed to read BGZF block data at offset 9832 expected 8913 bytes; hread returned 2151
[W::bgzf_read_block] EOF marker is absent. The input may be truncated
0

ls -ltr
-rw-r--r--. 1 root root 12001 Nov 23 19:49 GCF_000001405.25.gz
-rw-r--r--. 1 root root   113 Nov 23 19:49 GCF_000001405.25.gz.tbi

# also, same result on htslib-1.9
../htslib-1.9/tabix GCF_000001405.25.gz ; echo $?
[W::bgzf_read_block] EOF marker is absent. The input is probably truncated
0
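
Since the exit status is 0 in both runs above, a script cannot rely on it to detect the truncation. As a stopgap, a caller could test for the BGZF end-of-file marker itself, which is what the warning refers to. The following is a minimal sketch in plain C, not htslib code: the function name has_bgzf_eof_marker is hypothetical, and the 28-byte marker is the empty BGZF block given in the SAM/BAM specification. A missing marker suggests truncation, but a present marker does not prove that every earlier block is intact.

```c
#include <stdio.h>
#include <string.h>

/* The 28-byte empty BGZF block that marks end-of-file (per the SAM/BAM spec). */
static const unsigned char BGZF_EOF_MARKER[28] = {
    0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff,
    0x06, 0x00, 0x42, 0x43, 0x02, 0x00, 0x1b, 0x00, 0x03, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* Hypothetical helper (not part of htslib).
 * Returns 1 if the file ends with the marker, 0 if it does not,
 * and -1 if the file cannot be opened or read. */
int has_bgzf_eof_marker(const char *path)
{
    unsigned char tail[28];
    long size;
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (!fp) return -1;
    if (fseek(fp, 0L, SEEK_END) != 0) { fclose(fp); return -1; }
    size = ftell(fp);
    if (size < 0) { fclose(fp); return -1; }
    if (size < 28) { fclose(fp); return 0; }   /* too short to hold the marker */
    if (fseek(fp, -28L, SEEK_END) != 0) { fclose(fp); return -1; }

    n = fread(tail, 1, sizeof tail, fp);
    fclose(fp);
    if (n != sizeof tail) return -1;

    return memcmp(tail, BGZF_EOF_MARKER, sizeof tail) == 0;
}
```

For the 12001-byte download above, which was cut off mid-block, this check would be expected to return 0, so a wrapper could refuse to index the file. htslib also declares bgzf_check_EOF() in htslib/bgzf.h for callers that already hold an open BGZF handle.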

bir3 commented 1 year ago

Draft PR: https://github.com/samtools/htslib/pull/1529

daviesrob commented 1 year ago

Yes, that shouldn't be happening. As you've noticed, the error is in bgzf_getline(), which doesn't return the error result correctly if it has already read some data.

Thanks for the PR, we'll check it out.
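
To illustrate the class of bug described above, here is a minimal, self-contained sketch of a line reader that reports a read error even when part of a line has already been buffered. This is not the htslib implementation and not the change made in the PR: toy_reader, toy_getc, toy_getline, and the return codes -1 and -2 are all hypothetical names and conventions chosen for the example.

```c
#include <stddef.h>

/* Hypothetical in-memory source that starts failing at offset fail_at. */
typedef struct {
    const char *data;
    size_t len;
    size_t pos;
    size_t fail_at;   /* reads at or past this offset report an error */
} toy_reader;

/* Returns the next byte (0..255), -1 at clean end of input, -2 on read error. */
static int toy_getc(toy_reader *r)
{
    if (r->pos >= r->fail_at) return -2;
    if (r->pos >= r->len) return -1;
    return (unsigned char) r->data[r->pos++];
}

/* Reads one line into buf (NUL-terminated, newline dropped, cap must be >= 1).
 * Bytes that do not fit in buf are discarded.
 * Returns the stored length, -1 at clean end of input with no data,
 * or -2 on a read error. */
int toy_getline(toy_reader *r, char *buf, size_t cap)
{
    size_t n = 0;

    for (;;) {
        int c = toy_getc(r);
        if (c == -2) return -2;        /* the error wins, even when n > 0 */
        if (c == -1) {
            if (n == 0) return -1;     /* nothing read: clean end of input */
            break;                     /* last line without a trailing newline */
        }
        if (c == '\n') break;
        if (n + 1 < cap) buf[n++] = (char) c;
    }
    buf[n] = '\0';
    return (int) n;
}
```

The buggy variant of this pattern treats an error as end of input whenever n > 0 and returns the partial line as if it were complete, so the caller never sees the failure and exits with status 0.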

daviesrob commented 1 year ago

Closed by https://github.com/samtools/htslib/pull/1529