zip-rs / zip-old

Zip implementation in Rust
MIT License
731 stars 204 forks source link

Faster cde rejection #412

Closed caldwell closed 5 months ago

caldwell commented 1 year ago

Hi,

This patch buffers up the reverse search that happens in spec::CentralDirectoryEnd::find_and_parse(). This speeds the code up tremendously when the file isn't actually a zip file and it's bigger than 64K.

In my app I was serving some static web pages out of a zip file and falling back to serving them from a filesystem path. In my case the zip file was a large file that may or may not actually be a zip file (if it is then the files come out of it, otherwise they come from somewhere else).

On a linux machine I was seeing 400ms or so of end-to-end latency between when I would request the file and when it would get served. This is in the case where the large file is not a zip file.

With this change the end-to-end latency is about 1ms.

The patch also adds some unit tests to src/spec.rs which helped me validate that it was correct. I kept them as a separate commit since they aren't specific to the new code.


I noticed the zip64 code did something similar. I'm not using zip64 but I thought I'd take a crack at it. I'm not as sure about those changes (in the sense that I don't know if they are actually necessary and if it really speeds anything up). It seems like the slow case would exist if there are 1000s of bytes of garbage before the cde64. I don't really understand the case where there is any junk there so I don't know how realistic it is to optimize for some hypothesized worst case. I didn't benchmark this change but 30 seek()+read() syscalls in a row (which is the case for the tests/data/zip64_demo.zip) would be about 200µs (assuming the same overhead as the CentralDirectoryEnd code), which admittedly isn't much.

At any rate, I'm pretty sure the code is correct—I added unit tests for this change as well. These tests are pretty specific to buffer boundaries in the new code so I kept them together with this commit.

In summary, I'm a little more interested in the first 2 patches getting through. I believe the 3rd one (zip64) is technically correct but I can't vouch for it as much as the others.

Thanks for the great Rust library, btw!

Pr0methean commented 5 months ago

Because this repo is no longer maintained, I've replaced this PR with https://github.com/zip-rs/zip2/pull/47.