mxmlnkn / rapidgzip

Gzip Decompression and Random Access for Modern Multi-Core Machines
Apache License 2.0

"Invalid lookback distance found!" when seeking backwards in specific deflate stream #40

Closed (pdjstone closed this issue 5 months ago)

pdjstone commented 5 months ago

With version 0.13.2 (and 0.13.1), I get an error when reading a specific deflate stream using the following code:

import rapidgzip
import zlib

with open('test.bin', 'rb') as fd:
    decomp = zlib.decompressobj(-15)
    decomp_data = b''
    while len(decomp_data) <= 40000000:  # index 40000000 must exist below
        data = fd.read(1024*1024)
        decomp_data += decomp.decompress(data)
    print(decomp_data[30000000])
    print(decomp_data[40000000])

with open('test.bin', 'rb') as fd:
    with rapidgzip.open(fd) as fd2:
        # If I swap the order of these seeks the error doesn't occur
        fd2.seek(40000000)
        print(fd2.read(1)[0])
        fd2.seek(30000000)
        print(fd2.read(1)[0])

I get the following output:

25
2
2
Traceback (most recent call last):
  File "rapidgzipbug.py", line 19, in <module>
    print(fd2.read(1)[0])
  File "rapidgzip.pyx", line 502, in rapidgzip._RapidgzipFile.readinto
RuntimeError: [IsalInflateWrapper][Thread 132968515372608] Decoding failed with error code -3: Invalid lookback distance found! Already decoded 4 B. Read 120 B 2 b during the failing isal_inflate from offset 22979651 B 4 b. Bit range to decode: [183837212, 201376656]. BitReader::size: 335544320. Set window size: 0 B.

mxmlnkn commented 5 months ago

Thank you for reporting this bug! I can confirm it. It still works without error with rapidgzip 0.12.1. It only happens when seeking backwards because decompression can then be delegated to ISA-L. As I feared, the bug was introduced with the window / seek point compression, but the commit that introduced it was d03d2a2f44b478a411e504d85040db378cb0fbd9, not the very first window compression commit, 9edb3b9792af573f64a84d7514e8e05dc7a641f9.
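
For anyone wanting to check other files for the same problem: since the failure depends on the order of the seeks, a small order-sensitive comparison against a zlib reference may be useful. The following is only a sketch. The helper names `inflate_raw` and `mismatching_offsets` are made up for illustration and are not part of rapidgzip, and the stand-in data replaces `test.bin` so the snippet runs with the standard library alone.

```python
import io
import zlib


def inflate_raw(compressed: bytes) -> bytes:
    """Decompress a raw deflate stream (no zlib/gzip header) in one go."""
    decomp = zlib.decompressobj(-15)  # negative wbits selects raw deflate
    return decomp.decompress(compressed) + decomp.flush()


def mismatching_offsets(reader, reference: bytes, offsets):
    """Seek `reader` to each offset in the given order and compare one byte
    against `reference`. Returns the offsets whose byte differs."""
    bad = []
    for offset in offsets:
        reader.seek(offset)
        if reader.read(1) != reference[offset:offset + 1]:
            bad.append(offset)
    return bad


if __name__ == "__main__":
    # Stand-in data (assumption) so the sketch runs without test.bin.
    original = bytes(range(256)) * 4096
    compressor = zlib.compressobj(wbits=-15)
    stream = compressor.compress(original) + compressor.flush()
    reference = inflate_raw(stream)
    # High offset first, then a backward seek, mirroring the report.
    print(mismatching_offsets(io.BytesIO(reference), reference, [700_000, 300_000]))
```

To exercise rapidgzip itself, the object returned by `rapidgzip.open(fd)` (used as in the report above) could be passed as `reader` with the offsets `[40000000, 30000000]`. An exception raised during `read` would then point at the same failure as the traceback.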

mxmlnkn commented 5 months ago

Found the bug. I even already had an unreleased fix for it. Sorry for wasting your time by not releasing the fix; it somehow didn't occur to me to do so. The release should be done ~in the next hour or so~ this weekend (half the CI is suddenly broken for multiple obnoxious reasons).

pdjstone commented 5 months ago

Thanks for fixing this!