Closed — edmorley closed this issue 1 year ago
Yeah, Rust in debug mode is going to be much, much slower than anything written in C, due to the nature of the languages. (And I'm not sure whether the system zlib will even be used in a debug/no-optimization build.)

Turning on the first level of optimizations in debug mode may help a fair bit. There may also be workarounds to avoid compiling all dependencies in debug mode, or to use different optimization levels for the main project and its dependencies, but I'm not sure.
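One such workaround is Cargo's profile overrides: keep your own crate unoptimized for fast compiles and good debuggability, but optimize all dependencies. A sketch (adjust the levels to taste):

```toml
# Cargo.toml: own code stays at opt-level 0, dependencies get full optimization.
[profile.dev]
opt-level = 0

[profile.dev.package."*"]
opt-level = 3
```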
I wasn't able to get perf working inside a QEMU'd Docker container (due to `PERF_FLAG_FD_CLOEXEC not implemented` errors), so unfortunately I wasn't able to profile the 207s case.
However, here is a flamegraph for a native ARM64 debug build (the 21.55s entry in the table above). (It has to be downloaded for the interactivity to work; that's disabled when hosted on GitHub.)
As can be seen, 77% of the profile is in `Adler32::compute()`:

https://github.com/jonas-schievink/adler/blob/a94f525f62698d699d1fb3cc9112db8c35662b16/src/algo.rs#L5-L107

With 60% of the total profile within the implementation of `AddAssign<Self> for U32X4` (used from `Adler32::compute()`):

https://github.com/jonas-schievink/adler/blob/a94f525f62698d699d1fb3cc9112db8c35662b16/src/algo.rs#L124-L130
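For context, Adler-32 itself is a very simple checksum; the `adler` crate's hot path is a SIMD-friendly `U32X4` formulation of it. A plain scalar sketch of the same checksum (not the crate's actual implementation) looks like this:

```rust
// Scalar Adler-32 sketch (as defined in RFC 1950): `a` is the running
// sum of bytes, `b` is the running sum of `a`; both modulo 65521.
const MOD_ADLER: u32 = 65521;

fn adler32(data: &[u8]) -> u32 {
    let (mut a, mut b) = (1u32, 0u32);
    for &byte in data {
        a = (a + byte as u32) % MOD_ADLER;
        b = (b + a) % MOD_ADLER;
    }
    (b << 16) | a
}

fn main() {
    // Well-known test vector: Adler-32 of "Wikipedia" is 0x11E60398.
    assert_eq!(adler32(b"Wikipedia"), 0x11E6_0398);
    println!("{:#010x}", adler32(b"Wikipedia"));
}
```

Per-byte work like this is exactly what stays un-inlined and un-vectorized at opt-level 0, which is why the debug-mode cost balloons so dramatically here.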
You can override `opt-level` for specific crates in debug mode — see https://doc.rust-lang.org/cargo/reference/profiles.html#overrides. Adding the following to `Cargo.toml` should make it faster:

```toml
[profile.dev.package.miniz_oxide]
opt-level = 3
```
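Given that the profile above shows most of the time inside the `adler` crate specifically, it may be worth overriding that dependency as well (an assumption; worth measuring to confirm):

```toml
[profile.dev.package.adler]
opt-level = 3
```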
Closing, as this is more of a Rust issue than a flate2-specific one.

---
Hi!

In a particular project, I use `flate2` to decompress a ~50MB gzipped tarfile. Whilst in production the project will be built in release mode, the integration tests are performed using debug builds, and I use debug builds when iterating locally during development too.
In addition, due to the nature of the project (a Cloud Native Buildpack that's targeting x86_64 Linux), these integration tests and any manual testing have to run inside an x86_64 Docker container. After recently obtaining a new MacBook Pro M1 Max (which has to use Docker's QEMU emulation for x86_64 Docker images), I was surprised to see the integration tests take considerably longer than they used to on my much older machine.

Investigating, it turns out that when using the default flate2 backend of `miniz_oxide` and the testcase below, debug builds are 30-60x slower than release builds. In contrast, when using the `zlib` or `zlib-ng-compat` backends, debug builds are only 2-4x slower than release builds. Whilst debug builds are expected to be slower than release builds, I was quite surprised they were 30-60x slower for this crate using the default backend.
I'm presuming there's not much that can be done to improve the performance of `miniz_oxide` in debug builds. However, I was wondering whether it would be worth mentioning the performance issue in this crate's docs, particularly given that: (a) switching backends makes such a difference here, and (b) the docs currently suggest that the default backend is mostly "good enough" (otherwise I would have tried another backend sooner):

(from https://docs.rs/flate2/latest/flate2/#implementation)
It was only later that I noticed this section in the README (which isn't on docs.rs), which seemed to imply the `zlib-ng` backend was actually faster: https://github.com/rust-lang/flate2-rs#backends

Testcase:
Results (timings for each backend, native and under QEMU):

- `miniz_oxide` (default)
- `zlib`
- `zlib-ng-compat`
(The missing timings for `zlib-ng-compat` under QEMU are due to cross-compilation of `zlib-ng` currently failing: https://github.com/rust-lang/libz-sys/issues/93)