Open ttencate opened 1 month ago
Can also confirm the regression. In our case, the difference is extreme (by an order of magnitude).
Also encountering this when extracting a single tiny file from each of many small zip files (more than 9000 of them). I thought I was going crazy. In my case, llseek seems to be taking up a lot of CPU time.
I can also confirm. Extracting a 109 KB file from a 200 MB archive:
In 2.1.3:
extract (zip) 0.2 ms (109 KB)
In 2.1.6:
extract (zip) 675.5 ms (109 KB)
In 2.2.0:
extract (zip) 683.4 ms (109 KB)
Describe the bug
I have a 266 MB zip file, from which I only need to extract a 1 kB file. The rest of the files in the archive are irrelevant at this stage in the program.
However, opening the zip file using ZipArchive::new(file) takes about 7 seconds. It's a lot faster the second time round, because of Linux's filesystem cache.
I traced the root cause to Zip32CentralDirectoryEnd::find_and_parse, which locates the "end of central directory record" very quickly at the end of the file, but then keeps scanning backwards through the entire file to find another one.
To Reproduce
Have a large zip file:
Use this as the main program:
Expected behavior
Extracting a single 1 kB file from a large archive should be possible quickly.
unzip can do it:
Version
zip 2.1.6. This is also happening in 2.1.4, but not in 2.1.3. I think cb2d7abde7863a4ce01dbac5b3b48b4006e60599 or 9bf914d7d41842b381d303becf5364b5b2b8c1f2 is the cause, but I haven't dug deeper.
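For illustration, here is a minimal sketch of the bounded backward scan I would expect for the "end of central directory record" described above. This is not the crate's code: find_last_eocd is a hypothetical helper, it uses only the standard library, it ignores ZIP64, and the "record plus comment must end exactly at end of file" check is my own assumption about what counts as a valid match. The point is that the record's comment length is a 16-bit field, so the record has to start within the last 22 + 65535 bytes, and the scan can stop at the first plausible match instead of walking the whole file.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// "PK\x05\x06": the end-of-central-directory signature in on-disk byte order.
const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
/// Size of the fixed part of the record, without the trailing comment.
const EOCD_MIN_SIZE: u64 = 22;
/// The comment length is a u16, so the record starts within this many bytes of the end.
const MAX_SEARCH: u64 = EOCD_MIN_SIZE + u16::MAX as u64;

/// Hypothetical helper (not part of the zip crate): returns the offset of the
/// end-of-central-directory record closest to the end of the file, if any.
fn find_last_eocd<R: Read + Seek>(reader: &mut R) -> io::Result<Option<u64>> {
    let file_len = reader.seek(SeekFrom::End(0))?;
    if file_len < EOCD_MIN_SIZE {
        return Ok(None);
    }

    // Read only the tail of the file that can possibly contain the record.
    let window_start = file_len.saturating_sub(MAX_SEARCH);
    reader.seek(SeekFrom::Start(window_start))?;
    let mut tail = Vec::with_capacity((file_len - window_start) as usize);
    reader.read_to_end(&mut tail)?;

    // Walk backwards so the record closest to the end of the file wins,
    // and return as soon as one plausible record is found.
    let last_candidate = tail.len() - EOCD_MIN_SIZE as usize;
    for pos in (0..=last_candidate).rev() {
        if !tail[pos..].starts_with(&EOCD_SIGNATURE) {
            continue;
        }
        // Assumed plausibility check: the fixed record plus its comment
        // must end exactly at the end of the file.
        let comment_len = u16::from_le_bytes([tail[pos + 20], tail[pos + 21]]) as usize;
        if pos + EOCD_MIN_SIZE as usize + comment_len == tail.len() {
            return Ok(Some(window_start + pos as u64));
        }
    }
    Ok(None)
}
```

With something like this, the amount of data read is capped at about 64 KiB regardless of whether the archive is 1 MB or 266 MB, which would match the timings seen in 2.1.3.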