Open jmarshall opened 3 days ago
I'd checked that the fp->block_address update wasn't needed while going over the PR, but evidently missed fp->uncompressed_address. What did you do to spot this? As the test harness currently passes with the incorrect change, it could do with extending to cover this problem.
Oops. I was sure I'd got both of those incremented, but I confess I had many alternative versions while benchmarking and I guess I lost something along the way. Sorry.
I'll try and get a test that fails before applying the fix. Thanks for reporting it.
I spotted this while changing fai_retrieve() from bgzf_useek + bgzf_getc xLOTS to bgzf_useek + bgzf_read_small xSOME (see PR #1799). As that PR mixes bgzf_useek and bgzf_read_small, with it applied there are already faidx test cases that fail indirectly due to this problem. But it would be good to test it directly too, rather than indirectly. The sequence of events is basically:
bgzf_read_small(…, buf1, 5); // read five characters
bgzf_useek(…, 10); // seek to offset 10
bgzf_read_small(…, buf2, 3); // read another few characters; this one could be bgzf_read()
assert(buf2 is the three characters at offset 10…12);
This direct test case will fail because it will have read three characters from a different offset.
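To make the failure mode concrete, here is a minimal self-contained sketch of that sequence against a toy reader. It is not htslib code: toy_reader, toy_read_small, toy_useek and toy_sequence are hypothetical names, the "file" is a single in-memory block, and the in-block seek logic is an assumption about how a seek might use the tracked uncompressed position.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for a BGZF handle (hypothetical, not the htslib struct):
 * a single already-uncompressed block held in memory. */
typedef struct {
    const char *block;          /* uncompressed block contents */
    int block_length;           /* number of valid bytes in block */
    int block_offset;           /* current read position within block */
    long uncompressed_address;  /* tracked position in the uncompressed stream */
} toy_reader;

/* Fast-path read. update_address == 0 reproduces the reported omission. */
static int toy_read_small(toy_reader *fp, char *buf, int length, int update_address)
{
    assert(length >= 0);
    if (length >= fp->block_length - fp->block_offset)
        return -1;              /* a real reader would punt to the slow path */
    memcpy(buf, fp->block + fp->block_offset, (size_t)length);
    fp->block_offset += length;
    if (update_address)
        fp->uncompressed_address += length;
    return length;
}

/* In-block seek computed relative to the tracked position (assumed logic). */
static int toy_useek(toy_reader *fp, long uoffset)
{
    long start = fp->uncompressed_address - fp->block_offset;
    long end = start + fp->block_length;
    if (uoffset < start || uoffset >= end)
        return -1;              /* the toy does not load other blocks */
    fp->block_offset += (int)(uoffset - fp->uncompressed_address);
    fp->uncompressed_address = uoffset;
    return 0;
}

/* Read 5 bytes, seek to offset 10, read 3 bytes into out (NUL-terminated). */
static int toy_sequence(int update_address, char out[4])
{
    toy_reader r = { "abcdefghijklmnopqrstuvwxyz", 26, 0, 0 };
    char buf1[5];
    if (toy_read_small(&r, buf1, 5, update_address) != 5) return -1;
    if (toy_useek(&r, 10) != 0) return -1;
    if (toy_read_small(&r, out, 3, update_address) != 3) return -1;
    out[3] = '\0';
    return 0;
}
```

In this toy, toy_sequence(1, out) yields the bytes at offsets 10 to 12, while toy_sequence(0, out) leaves the tracked position stale after the first read, so the seek lands five bytes too far and the final read comes from offset 15 instead.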
I was already working on something similar in test_bgzf.c, which I've now added to verify I could trigger the bug, and then also applied the trivial one-line fix. Thanks.
PR #1772 added a simplified inline version of bgzf_read() that uses inline code when the request can be satisfied directly from the buffer, otherwise punts to the real bgzf_read(). Very laudable. However I encountered seek problems when using it.
I rederived the inline function by starting with the code of the full bgzf_read(), assuming the invariant that length < fp->block_length - fp->block_offset, and simplifying the code accordingly. I ended up with a function that is fairly similar to bgzf_read_small() as added to htslib/bgzf.h but with some additions:
Because bgzf_read_small() does not currently update fp->uncompressed_address, subsequent bgzf_useek() calls may jump to the wrong location. And probably other functions use fp->uncompressed_address too and are affected.
The if block looks harder to deal with, because it calls bgzf_htell() which is private within bgzf.c. However it turns out that the invariant implies that this if will never be true, so in fact I should have simplified it away to nothing too. Phew.
So bgzf_read_small() just needs fp->uncompressed_address += length; added to it to make it equivalent to bgzf_read().
(I have not analysed bgzf_write_small() to see if it has any similar infelicities.)
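As a rough illustration of the fast-path/slow-path split and where the extra line goes, here is a toy model in C. It is only a sketch under stated assumptions, not the code in htslib/bgzf.h: toy_stream, toy_open, toy_load_block, toy_read and this toy_read_small are hypothetical, the "blocks" are fixed 8-byte slices of an in-memory string, and block_start exists only in the toy.

```c
#include <assert.h>
#include <string.h>

enum { TOY_BLOCK = 8 };         /* hypothetical block size for the toy */

typedef struct {
    const char *data;           /* whole uncompressed stream, kept in memory */
    long data_length;
    long block_start;           /* stream offset of the current block (toy only) */
    int block_length;           /* valid bytes in the current block */
    int block_offset;           /* read position within the current block */
    long uncompressed_address;  /* tracked position in the uncompressed stream */
} toy_stream;

/* Make the slice starting at 'start' the current block. */
static void toy_load_block(toy_stream *fp, long start)
{
    long left = fp->data_length - start;
    fp->block_start = start;
    fp->block_length = left > TOY_BLOCK ? TOY_BLOCK : (left > 0 ? (int)left : 0);
    fp->block_offset = 0;
}

static toy_stream toy_open(const char *data)
{
    toy_stream s = { data, (long)strlen(data), 0, 0, 0, 0 };
    toy_load_block(&s, 0);
    return s;
}

/* Slow path: general read that may cross block boundaries. */
static long toy_read(toy_stream *fp, char *buf, long length)
{
    long done = 0;
    while (done < length) {
        int avail = fp->block_length - fp->block_offset;
        if (avail <= 0) {
            toy_load_block(fp, fp->block_start + fp->block_length);
            if (fp->block_length == 0) break;   /* end of data */
            continue;
        }
        long n = length - done < avail ? length - done : avail;
        memcpy(buf + done, fp->data + fp->block_start + fp->block_offset, (size_t)n);
        fp->block_offset += (int)n;
        done += n;
    }
    fp->uncompressed_address += done;
    return done;
}

/* Fast path: valid only under the invariant length < block_length - block_offset. */
static inline long toy_read_small(toy_stream *fp, char *buf, long length)
{
    if (length < fp->block_length - fp->block_offset) {
        memcpy(buf, fp->data + fp->block_start + fp->block_offset, (size_t)length);
        fp->block_offset += (int)length;
        fp->uncompressed_address += length;     /* the update discussed above */
        return length;
    }
    return toy_read(fp, buf, length);           /* punt to the general read */
}
```

With that line in place, the toy keeps uncompressed_address equal to block_start + block_offset whichever path serves a read, so fast and slow reads can be interleaved without the tracked position drifting.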