mrkline / piz-rs

Parallelized zip archive reading
zlib License

Slower than `unzip` on .zip with many small files #9

Open SolomidHero opened 1 year ago

SolomidHero commented 1 year ago

I tried to decompress a .zip containing many small files on an AWS EC2 instance.

```
~/data/ (main*) » du -hs vad.zip ; unzip -l vad.zip | wc -l
28M vad.zip
81402

~/data/ (main*) » time unzip -qo vad.zip
unzip -qo vad.zip  1.70s user 4.37s system 99% cpu 6.123 total

~/data/ (main*) » time /home/ubuntu/piz-rs/target/debug/examples/unzip vad.zip
/home/ubuntu/piz-rs/target/debug/examples/unzip vad.zip  29.42s user 5.55s system 281% cpu 12.440 total

~/data/ (main*) » time /home/ubuntu/piz-rs/target/release/examples/unzip vad.zip
/home/ubuntu/piz-rs/target/release/examples/unzip vad.zip  2.10s user 4.27s system 61% cpu 10.357 total
```
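
The `cpu` figure printed by the shell's `time` is (user + system) / total, so the same numbers can be used to estimate how much wall-clock time a run spent not executing on any CPU. A minimal Rust sketch of that arithmetic, using the `unzip` and release-build timings above (the function names are illustrative and not part of piz):

```rust
/// CPU utilisation as printed by the shell's `time`: (user + system) / elapsed, in percent.
fn cpu_percent(user_s: f64, sys_s: f64, total_s: f64) -> f64 {
    (user_s + sys_s) / total_s * 100.0
}

/// Wall-clock seconds not covered by CPU time, i.e. time spent waiting (e.g. on I/O).
/// Only meaningful for runs that stayed at or below 100% CPU; clamped at zero otherwise.
fn idle_seconds(user_s: f64, sys_s: f64, total_s: f64) -> f64 {
    (total_s - (user_s + sys_s)).max(0.0)
}

fn main() {
    // (label, user seconds, system seconds, total seconds) copied from the transcript above.
    let runs = [
        ("unzip -qo", 1.70, 4.37, 6.123),
        ("piz release example", 2.10, 4.27, 10.357),
    ];
    for (name, user, sys, total) in runs {
        println!(
            "{name}: {:.1}% cpu, {:.2}s not on CPU",
            cpu_percent(user, sys, total),
            idle_seconds(user, sys, total)
        );
    }
}
```

With these inputs `unzip` is essentially CPU-bound, while the release build spends roughly 4 s of its 10.4 s wall-clock time off the CPU. That gap is consistent with, though not proof of, time spent blocked on I/O such as the implicit ext4 flushes nh2 describes.
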
nh2 commented 1 year ago

@SolomidHero If you are running on an ext4 filesystem, your benchmark is probably flawed: ext4 inserts invisible fsyncs when existing files are replaced.

If you close() an existing file after rewriting it from scratch, or atomically rename a new file over an existing one, ext4 will insert an fsync()!

Sources:

I hit this confusing issue when first benchmarking piz. @mrkline The README should mention this issue.
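
The two replacement patterns described above (truncating and rewriting an existing file, or renaming a new file over it) can be sketched with the Rust standard library. This is a minimal illustration, assuming an ext4 filesystem mounted with the default `auto_da_alloc` option, under which ext4 detects replace-via-truncate and replace-via-rename and pushes the new file's delayed-allocation data out early instead of deferring it. The function names are hypothetical; nothing here is taken from piz or `unzip`.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Pattern 1 (hypothetical helper): truncate a file and rewrite it from scratch.
/// If the file already existed with data, ext4 with `auto_da_alloc` forces the
/// new data out when the file is closed (when `f` is dropped at the end).
fn overwrite_in_place(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut f = OpenOptions::new().write(true).create(true).truncate(true).open(path)?;
    f.write_all(data)?;
    Ok(())
}

/// Pattern 2 (hypothetical helper): write a temporary file, then atomically rename
/// it over the target. When the rename replaces an existing file, ext4 with
/// `auto_da_alloc` likewise forces the new file's data out early.
fn replace_via_rename(path: &Path, data: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut f = File::create(&tmp)?;
    f.write_all(data)?;
    drop(f); // close the temporary file before renaming it
    fs::rename(&tmp, path)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("ext4-patterns-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let target = dir.join("entry.bin");
    overwrite_in_place(&target, b"first")?; // creates the file: nothing is replaced yet
    overwrite_in_place(&target, b"second")?; // truncates a non-empty existing file
    replace_via_rename(&target, b"third")?; // renames over an existing file
    fs::remove_dir_all(&dir)
}
```

An extractor run a second time into the same directory (as `unzip -qo` overwriting the files from a previous run would) hits the first pattern for every entry, which with tens of thousands of small files adds up.
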


Try running both unzip benchmarks in an empty directory to be sure.
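
One way to follow that advice for repeated runs is to give each run its own freshly created directory, so no archive entry ever replaces an existing file. A minimal Rust sketch, assuming a hypothetical `base/<label>-<run>` naming scheme; the extraction step is a closure so that either tool can be plugged in:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{Duration, Instant};

/// Creates `base/<label>-<run>` as a new, empty directory, deleting any leftover
/// from an earlier run first. `base` should live on the same filesystem (e.g. ext4)
/// as the real extraction target; /tmp is often a tmpfs and would measure something else.
fn fresh_dir(base: &Path, label: &str, run: u32) -> io::Result<PathBuf> {
    let dir = base.join(format!("{label}-{run}"));
    if dir.exists() {
        fs::remove_dir_all(&dir)?; // cleanup happens outside the timed region
    }
    fs::create_dir_all(&dir)?;
    Ok(dir)
}

/// Times `extract` (any closure that unpacks the archive into the given directory)
/// against a freshly created, empty directory.
fn time_in_fresh_dir<F>(base: &Path, label: &str, run: u32, extract: F) -> io::Result<Duration>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    let dir = fresh_dir(base, label, run)?;
    let start = Instant::now();
    extract(&dir)?;
    Ok(start.elapsed())
}

fn main() -> io::Result<()> {
    let base = std::env::current_dir()?.join("bench-out");
    for run in 0..3 {
        // Placeholder extraction; substitute a real extractor here, for example
        // running `unzip -qo vad.zip -d <dir>` through std::process::Command.
        let elapsed = time_in_fresh_dir(&base, "demo", run, |dir| {
            fs::write(dir.join("file.txt"), b"hello")
        })?;
        println!("run {run}: {elapsed:?}");
    }
    Ok(())
}
```

This only controls the "existing file" variable. Data written into a fresh directory may still be sitting in the page cache when the clock stops, so whether to `sync` before stopping the timer is a separate choice that should be made the same way for both tools.
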