icsharpcode / SharpZipLib

#ziplib is a Zip, GZip, Tar and BZip2 library written entirely in C# for the .NET platform.
http://icsharpcode.github.io/SharpZipLib/
MIT License
3.7k stars 976 forks source link

Improve zip file iteration performance #843

Closed lahma closed 1 year ago

lahma commented 1 year ago

When traversing large zip files I noticed that SharpZipLib was allocating quite a lot of memory. So I found out that by pooling of costly Inflator instances a lot of that can be rectified. There's now new PooledInflator type that can be used by the library itself and is known to be safe to pool (it's rented from pool or is otherwise instance created by the library itself).

I also changed ZipFile to follow the common public struct enumerator pattern found in BCL data structures that removes unnecessary allocations

Benchmark test case updated and now targets oldest supported runtimes (.NET 4.6.2 and .NET 6.0). Unsupported runtimes are not installed by default.

Results


BenchmarkDotNet v0.13.7, Windows 11 (10.0.23516.1000)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 7.0.400
  [Host]     : .NET 6.0.21 (6.0.2123.36311), X64 RyuJIT AVX2
  Job-ERDPDJ : .NET 6.0.21 (6.0.2123.36311), X64 RyuJIT AVX2
  Job-HFJNUX : .NET Framework 4.8.1 (4.8.9179.0), X64 RyuJIT VectorSize=256

ICSharpCode.SharpZipLib.Benchmark.Zip.ZipInputStream

Diff Method Toolchain Mean Error Allocated
Old ReadZipInputStream .NET 6.0 151.4 ms 0.60 ms 104.54 KB
New .NET 6.0 151.7 ms (0%) 0.84 ms 72.15 KB (-31%)
Old ReadZipInputStream .NET Framework 4.6.2 150.2 ms 0.50 ms 106.05 KB
New .NET Framework 4.6.2 150.8 ms (0%) 0.70 ms 74 KB (-30%)
Old ReadLargeZipFile .NET 6.0 1,337.3 ms 8.18 ms 2412174.27 KB
New .NET 6.0 1,292.3 ms (-3%) 9.08 ms 548513.15 KB (-77%)
Old ReadLargeZipFile .NET Framework 4.6.2 1,499.4 ms 11.41 ms 2421273.81 KB
New .NET Framework 4.6.2 1,452.4 ms (-3%) 3.35 ms 556703.89 KB (-77%)

I certify that I own, and have sufficient rights to contribute, all source code and related material intended to be compiled or integrated with the source code for the SharpZipLib open source product (the "Contribution"). My Contribution is licensed under the MIT License.

lahma commented 1 year ago

Thank you for the review. As I mentioned in comment, the ZipFile scenario is hard to use when you cannot control to internal input stream creation. I created a fix commit that proposes to have InflaterPool.Size which is only public member and allows to disable the pooling by supplying a value of 0 or less.

What do you think about the approach? I would also gladly any commits from you into this part shaping the solution towards your liking, but also hoping that the benchmark's ZipFile API usage would be covered.

piksel commented 1 year ago

I think that would be a good compromise. I am not sure about making the pool public though... Perhaps there should be a public singleton that maps to the internals of the pool.

That way we have a clear consumer API for global library tweaks... Or perhaps I am just overthinking it.

The initial value would need to be 0 though, as the goal is to not accidently break anything (but again, maybe I am making this into a larger issue than it needs to be..)

lahma commented 1 year ago

It's always good to keep downstream consumers happy! 😉 I've force pushed a proposal of adding root level static class for global options, SharpZipLibOptions, which has now sole property of InflaterPoolSize which defaults to 0. Benchmark updated to set the value to 4 (allocations still down 70%).