rust-lang / flate2-rs

DEFLATE, gzip, and zlib bindings for Rust
https://docs.rs/flate2
Apache License 2.0

GzDecoder "invalid gzip header" ? #316

Closed gwbres closed 2 years ago

gwbres commented 2 years ago

Hello,

I'm working on this crate to parse RINEX files, which are usually gzip-compressed. For reference, decompression is encapsulated in src/reader.rs so that it stays transparent to the caller.

Things worked correctly until today, when we introduced two new test files.

Although they appear to be valid gzip files, we get an "invalid gzip header" error:

# this is a gzip file
zcat ESBC00DNK_R_20201770000_01D_30S_MO.crx.gz | head -n 1
3.0                 COMPACT RINEX FORMAT                    CRINEX VERS   / TYPE

# it is not a bzip file
bzip2 -d ESBC00DNK_R_20201770000_01D_30S_MO.crx.gz 
bzip2: ESBC00DNK_R_20201770000_01D_30S_MO.crx.gz is not a bzip2 file.

# neither zlib
zlib-flate -uncompress < MOJN00DNK_R_20201770000_01D_30S_MO.crx.gz > test.crx
flate: inflate: data: incorrect header check

invalid gzip header error

Any ideas?

MichaelMcDonnell commented 2 years ago

@gwbres, I cannot reproduce the problem in the newest version of rinex. I tried copying the compressed files back into the folders used previously but the tests ran fine. I can see that the code changed a lot recently and I think you've fixed the issue.

I checked out cbdcd15e6849022a99016508f16b8f7fca921471 and could reproduce the problem there. That code creates three readers in from_file. The first reader is replaced with a second one. Then, in the with_hatanaka method, a third reader is created that reuses the file pointer from the second. GzDecoder::new is called both times. The documentation says that GzDecoder::new immediately reads the gzip header (and the code matches that), so the gzip header ends up being read twice. GzDecoder::new only sets its internal state and does not return an error, which is why the error only surfaces later, when lines is called in Header::new.

So I'm pretty sure this is not a bug in flate2 but just a result of calling GzDecoder::new twice.
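
To illustrate the mechanism, here is a minimal std-only sketch. It is a toy model, not flate2's code: ToyDecoder, read_header, single_wrap and double_wrap are hypothetical names, and the "header" is reduced to the two gzip magic bytes (0x1f 0x8b). Like the behaviour described above, the constructor reads the header eagerly and stores the failure instead of returning it, so wrapping the same stream twice only fails on the first read.

```rust
use std::io::{self, Cursor, Read};

/// Magic bytes that open every gzip member (RFC 1952).
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];

/// Reads and checks the (toy) header, consuming it from the stream.
fn read_header<R: Read>(reader: &mut R) -> io::Result<()> {
    let mut magic = [0u8; 2];
    reader.read_exact(&mut magic)?;
    if magic == GZIP_MAGIC {
        Ok(())
    } else {
        Err(io::Error::new(io::ErrorKind::InvalidInput, "invalid gzip header"))
    }
}

/// Hypothetical stand-in for a decoder whose constructor reads the header
/// immediately and keeps the outcome as internal state.
struct ToyDecoder<R> {
    inner: R,
    header: io::Result<()>,
}

impl<R: Read> ToyDecoder<R> {
    fn new(mut inner: R) -> Self {
        let header = read_header(&mut inner);
        ToyDecoder { inner, header }
    }

    /// The stored header error only shows up here, on the first read.
    fn read_payload(&mut self) -> io::Result<Vec<u8>> {
        if let Err(e) = &self.header {
            return Err(io::Error::new(e.kind(), e.to_string()));
        }
        let mut out = Vec::new();
        self.inner.read_to_end(&mut out)?;
        Ok(out)
    }
}

/// One decoder per stream: the header is consumed exactly once.
fn single_wrap(bytes: &[u8]) -> io::Result<Vec<u8>> {
    ToyDecoder::new(Cursor::new(bytes.to_vec())).read_payload()
}

/// Two decoders over the same stream: the second one starts after the header.
fn double_wrap(bytes: &[u8]) -> io::Result<Vec<u8>> {
    let mut source = Cursor::new(bytes.to_vec());
    {
        // The first decoder consumes the magic bytes and is then discarded.
        let _first = ToyDecoder::new(&mut source);
    }
    let mut second = ToyDecoder::new(&mut source);
    second.read_payload()
}
```

With an input of the two magic bytes followed by some payload, single_wrap returns the payload, while double_wrap returns an error, because the second constructor finds payload bytes where it expects the header.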

gwbres commented 2 years ago

Hello @MichaelMcDonnell ,

first of all, thank you for inquiring. The file that leads to the error scenario was indeed removed from the automated test pool to avoid CI issues.

> I can see that the code changed a lot recently and I think you've fixed the issue

I just ran a quick test, and that is true

> So I'm pretty sure this is not a bug in flate2 but just a result of calling GzDecoder::new twice.

You are absolutely right.

I was trying to build our "BufferedReader" wrapper, possibly enhanced with integrated "CRINEX" decompression (a compression scheme we use for those files, possibly on top of gzip). To do so, I first need to grab the first 80 bytes of the file and determine whether this compression was used.

This is what is now commented out right here
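
One possible way to look at the leading bytes without consuming them is BufRead::fill_buf, which exposes the buffered data and leaves the read position untouched as long as consume is not called. The sketch below uses only the standard library; peek, is_gzip and is_crinex are hypothetical helper names, and the "COMPACT RINEX FORMAT" marker is taken from the zcat output earlier in this thread.

```rust
use std::io::{self, BufRead};

/// Marker found on the first line of the CRINEX file shown by `zcat` above.
const CRINEX_MARKER: &[u8] = b"COMPACT RINEX FORMAT";

/// Returns up to `n` leading bytes without consuming them.
/// Assumption: a single `fill_buf` call yields enough data; a short read
/// from the underlying source may return fewer than `n` bytes.
fn peek<R: BufRead>(reader: &mut R, n: usize) -> io::Result<Vec<u8>> {
    let buf = reader.fill_buf()?;
    Ok(buf[..n.min(buf.len())].to_vec())
}

/// True if the bytes start with the gzip magic number (0x1f 0x8b).
fn is_gzip(head: &[u8]) -> bool {
    head.starts_with(&[0x1f, 0x8b])
}

/// True if the peeked bytes contain the CRINEX marker.
fn is_crinex(head: &[u8]) -> bool {
    head.windows(CRINEX_MARKER.len()).any(|w| w == CRINEX_MARKER)
}
```

Because nothing is consumed, the same reader can then be handed to a single decoder, so the header is only read once. For a gzip-compressed CRINEX file, the CRINEX check would have to run on the decompressed stream, for example on a BufReader wrapped around the decoder.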

I just need to change the approach if we really want to do that. Ideally I would implement a custom Lines iteration method; I just don't know how to do that yet.
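
For reference, a custom line iterator can be sketched with the standard library alone by implementing Iterator on top of BufRead::read_line. LineIter is a hypothetical name, and the place where CRINEX decompression would hook in is only marked with a comment, since that logic is not shown in this thread.

```rust
use std::io::{self, BufRead};

/// Hypothetical line iterator that owns any `BufRead` source.
struct LineIter<R> {
    reader: R,
    buf: String,
}

impl<R: BufRead> LineIter<R> {
    fn new(reader: R) -> Self {
        LineIter { reader, buf: String::new() }
    }
}

impl<R: BufRead> Iterator for LineIter<R> {
    type Item = io::Result<String>;

    fn next(&mut self) -> Option<Self::Item> {
        self.buf.clear();
        match self.reader.read_line(&mut self.buf) {
            // Zero bytes read means end of input.
            Ok(0) => None,
            Ok(_) => {
                // Strip the trailing line ending, like `BufRead::lines` does.
                let line = self
                    .buf
                    .trim_end_matches(|c| c == '\n' || c == '\r')
                    .to_string();
                // A per-line transformation (e.g. CRINEX decompression)
                // could be applied to `line` here.
                Some(Ok(line))
            }
            Err(e) => Some(Err(e)),
        }
    }
}
```

The source can be a plain BufReader over a file or a BufReader wrapped around a gzip decoder; the iterator does not need to know which, as long as the decoder is created only once.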

MichaelMcDonnell commented 2 years ago

Great! Thanks for letting me know!