zip-rs / zip-old

Zip implementation in Rust
MIT License

add ZipWriter::merge_archive() to efficiently copy all entries from a ZipArchive #401

Closed by cosmicexplorer 5 months ago

cosmicexplorer commented 1 year ago

In my project medusa-zip, I extended this library with a ZipWriter::merge_archive() operation. It performs the equivalent of calling .raw_copy_file() on every entry in a source ZipArchive, but much more efficiently: in the new benchmark, it is around 3x as fast:

    > cargo bench merge_archive
    # ...
    running 4 tests
    test merge_archive_compressed               ... bench:      21,165 ns/iter (+/- 650) = 5334 MB/s
    test merge_archive_raw_copy_file_compressed ... bench:      63,960 ns/iter (+/- 1,354) = 1765 MB/s
    test merge_archive_raw_copy_file_stored     ... bench:      61,954 ns/iter (+/- 2,542) = 1814 MB/s
    test merge_archive_stored                   ... bench:      21,163 ns/iter (+/- 477) = 5311 MB/s
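
To illustrate why a whole-archive merge can beat a per-entry copy, here is a minimal std-only sketch. It is not the code from this PR: `Entry`, `Archive`, `directory_start`, and this free-standing `merge_archive` function are hypothetical stand-ins, and it assumes the source's entry data is one contiguous region ending where its central directory begins. Under that assumption, the data can be copied in a single pass and only the recorded header offsets need rebasing.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Hypothetical, simplified stand-in for a central directory record.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    name: String,
    /// Offset of the entry's local header from the start of its archive.
    header_start: u64,
}

/// Hypothetical stand-in for an opened archive: entry data, then a directory.
struct Archive<R> {
    reader: R,
    entries: Vec<Entry>,
    /// Offset at which the entry data ends and the central directory begins.
    directory_start: u64,
}

/// Append every entry of `src` to `dest` with one bulk copy, then rebase
/// the copied entries' offsets onto the destination's write position.
fn merge_archive<R: Read + Seek, W: Write + Seek>(
    dest: &mut W,
    dest_entries: &mut Vec<Entry>,
    mut src: Archive<R>,
) -> io::Result<()> {
    // Where the copied data will start inside the destination.
    let base = dest.stream_position()?;

    // Copy everything before the source's central directory in one pass,
    // instead of seeking to and copying each entry separately.
    src.reader.seek(SeekFrom::Start(0))?;
    let copied = io::copy(&mut src.reader.by_ref().take(src.directory_start), dest)?;
    if copied != src.directory_start {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "source ended before its central directory",
        ));
    }

    // Only the bookkeeping changes: shift each header offset by `base`.
    dest_entries.extend(src.entries.into_iter().map(|mut entry| {
        entry.header_start += base;
        entry
    }));
    Ok(())
}
```

The per-entry alternative would repeat the seek, copy, and bookkeeping steps once per entry, which is presumably where the raw_copy_file() variants in the benchmark spend their extra time.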

This seems like a reasonable extension of the existing raw_copy_file() API. With ZipWriter::finish_into_readable() from #400, this change enables use cases like medusa-zip, which creates zip files very efficiently by splitting their contents, creating intermediate zips, then merging them all into one. See e.g. https://github.com/cosmicexplorer/medusa-zip/blob/3197d740b3cd1a49aeced0d7cbfea57e5ca2f32e/lib/src/zip.rs#L736-L744:

    while let Some(intermediate_archive) = handle_intermediates.next().await {
      let output_zip = output_zip.clone();
      task::spawn_blocking(move || {
        output_zip.lease().merge_archive(intermediate_archive)?;
        Ok::<(), MedusaZipError>(())
      })
      .await??;
    }
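
The split, build, and merge workflow described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not medusa-zip's implementation: it swaps the async tasks for `std::thread::scope`, and `Intermediate`, `build_intermediate`, and `parallel_zip` are hypothetical names that model an archive as raw bytes plus (name, offset) pairs.

```rust
use std::thread;

/// Hypothetical stand-in for an intermediate archive: raw entry bytes plus
/// (name, offset) pairs, with offsets relative to the start of `data`.
struct Intermediate {
    data: Vec<u8>,
    entries: Vec<(String, u64)>,
}

/// Build one intermediate from a chunk of (name, contents) inputs.
fn build_intermediate(chunk: &[(&str, &[u8])]) -> Intermediate {
    let mut out = Intermediate { data: Vec::new(), entries: Vec::new() };
    for (name, contents) in chunk {
        // Record where this entry starts before appending its bytes.
        out.entries.push((name.to_string(), out.data.len() as u64));
        out.data.extend_from_slice(contents);
    }
    out
}

/// Split `inputs` into chunks, build the intermediates in parallel, then
/// merge them in order. `chunk_size` must be non-zero (`chunks` panics on 0).
fn parallel_zip(inputs: &[(&str, &[u8])], chunk_size: usize) -> Intermediate {
    let intermediates: Vec<Intermediate> = thread::scope(|s| {
        // Spawn every worker first so they all run concurrently.
        let handles: Vec<_> = inputs
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || build_intermediate(chunk)))
            .collect();
        // Join in spawn order so the merged entry order is deterministic.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker panicked"))
            .collect::<Vec<_>>()
    });

    let mut merged = Intermediate { data: Vec::new(), entries: Vec::new() };
    for part in intermediates {
        // Bulk-append the data, then rebase the offsets onto the output.
        let base = merged.data.len() as u64;
        merged.data.extend_from_slice(&part.data);
        merged
            .entries
            .extend(part.entries.into_iter().map(|(name, offset)| (name, offset + base)));
    }
    merged
}
```

The expensive per-entry work happens inside the workers, while the final merge is a sequence of bulk appends, which is why a fast merge step matters for this design.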

cosmicexplorer commented 1 year ago

For comparison, applying this commit (https://github.com/cosmicexplorer/zip/commit/d6d90f2a0129bc7306063a3290743afdef81dff2) on top of this PR avoids adding the new cde_start field to Shared, but it slightly reduces performance in the benchmark (the merge_archive cases are roughly 3 to 6% slower):

    > cargo bench merge_archive
    # ...
    running 4 tests
    test merge_archive_compressed               ... bench:      21,840 ns/iter (+/- 554) = 5169 MB/s
    test merge_archive_raw_copy_file_compressed ... bench:      70,272 ns/iter (+/- 6,201) = 1606 MB/s
    test merge_archive_raw_copy_file_stored     ... bench:      67,223 ns/iter (+/- 3,922) = 1672 MB/s
    test merge_archive_stored                   ... bench:      22,350 ns/iter (+/- 1,287) = 5029 MB/s

Pr0methean commented 5 months ago

Replaced with https://github.com/zip-rs/zip2/pull/61.