rust-lang / flate2-rs

DEFLATE, gzip, and zlib bindings for Rust
https://docs.rs/flate2
Apache License 2.0
891 stars 158 forks source link

Avoid quadratic complexity in GzDecoder #278

Closed catenacyber closed 3 years ago

catenacyber commented 3 years ago

Quadratic complexity happens when the underlying reader supplies bytes one at a time and would block. And if header flag FNAME is set and we never have a null byte to end the filename, read_gz_header keeps reading all of the partial filename and computing the CRC...

The solution is to keep a more complex structure than a simple buffer for what has already been parsed in the header (as is done in C zlib)

@alexcrichton I will put some comments in the code for your review

catenacyber commented 3 years ago

I guess there should be some additions to the test framework as well. Like testing for some inputs that we get the same result if the reader is supplying bytes one at a time and we call multiple times read with treating the case io::ErrorKind::WouldBlock by making one more byte available...

catenacyber commented 3 years ago

Can you explain more about the read_once aspect? I'm not sure that the internal usage of read_exact will work well with async I/O where it may fail to make progress but may make progress later. It looks like in-progress bytes may be lost?

It is possible I missed some points. What I want read_once to do is to try to read the exact number of bytes of the buffer, and if there are not enough, save them into the structure's buffer... Relooking at it, it looks like it does not do that yet... Will try something else

catenacyber commented 3 years ago

I think read_once was correct because it is used with Buffer and not a generic Read which just keeps buffering the read data (until the buffer is reset/truncated in read_once)

alexcrichton commented 3 years ago

I don't think read_once is working as intended. The read_exact method you're calling may call read multiple times and if it ends in an error then it discards all previously read data and doesn't return how much was read.

catenacyber commented 3 years ago

I don't think read_once is working as intended.

You were right.

I added a test, and fixed read_once accordingly

catenacyber commented 3 years ago

I think read_once is still suffering the same partial-read bugs from before?

The test should prove that read_once works at stage Fourth read : now succesful for 7 bytes read_once calls read_exact which calls read multiple times, but each time the data gets buffered (for subsequent calls to read_once who did not finish)

alexcrichton commented 3 years ago

Hm ok there's some unusual stuff going on with Buffer type I don't remember at all, this will take a bit longer to page everything back in and try to understand...

catenacyber commented 3 years ago

Hm ok there's some unusual stuff going on with Buffer type I don't remember at all, this will take a bit longer to page everything back in and try to understand...

Yes Buffer's read just saves every successful read bytes from the subsequent Read into an internal buffer And in some case, it begins to return these saved bytes, before reading more from the Read

catenacyber commented 3 years ago

Friendly ping @alexcrichton I changed Buffer's Read to use self.part.state to know which buffer to write into...

alexcrichton commented 3 years ago

Sorry this fell off my radar.

I thihnk this looks good for now though, thanks for the PR!

catenacyber commented 3 years ago

Thanks @alexcrichton When would this be available in a release ?

alexcrichton commented 3 years ago

I'll go ahead and release now

catenacyber commented 3 years ago

Thanks Alex