That's not surprising. I expect it's the deflate that's slow: it's a naive pure-safe-Rust conversion of the C code, which wasn't the fastest to begin with.
You can replace the built-in deflate with a custom implementation (e.g. from flate2 or zlib-sys) by adding a callback in the settings object.
I'll probably drop the built-in gzip and switch to another crate.
@kornelski It is not the deflate that's slow; it's TryVec, due to #22. I would recommend either rolling your own variant of TryVec or hoping that fallible_collections fixes this.
As an example, https://github.com/kornelski/lodepng-rust/blob/bf5b0acd5c6b3ac48618c2121204404a25c96e3e/src/rustimpl.rs#L1560-L1568 allocates on every single loop iteration until the break, once the vector outpaces its capacity. An alternative would be to just increase the initial capacity, but that's a bit hackish.
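To illustrate why per-push growth hurts, here is a minimal sketch using only std's `Vec`. It is not the actual fallible_collections or lodepng code. The helpers `push_exact`, `push_amortized` and `count_growths` are hypothetical names made up for this comparison, and `try_reserve_exact(1)` only stands in for a fallible push that grows by one slot at a time.

```rust
use std::collections::TryReserveError;

/// Hypothetical fallible push that asks for exactly one more slot.
/// Once the buffer is full, this tends to reallocate on every call.
fn push_exact(v: &mut Vec<u8>, byte: u8) -> Result<(), TryReserveError> {
    v.try_reserve_exact(1)?;
    v.push(byte);
    Ok(())
}

/// Hypothetical fallible push that lets Vec over-allocate,
/// so reallocations are amortized across many pushes.
fn push_amortized(v: &mut Vec<u8>, byte: u8) -> Result<(), TryReserveError> {
    v.try_reserve(1)?;
    v.push(byte);
    Ok(())
}

/// Push `n` bytes with the given strategy and count how often the
/// capacity changes, i.e. how often the buffer had to grow.
/// `initial_capacity` models the "just reserve more up front" workaround.
fn count_growths(
    n: usize,
    initial_capacity: usize,
    push: fn(&mut Vec<u8>, u8) -> Result<(), TryReserveError>,
) -> Result<usize, TryReserveError> {
    let mut v: Vec<u8> = Vec::new();
    v.try_reserve_exact(initial_capacity)?;
    let mut last_cap = v.capacity();
    let mut growths = 0;
    for i in 0..n {
        // `as u8` truncates; the byte values do not matter here.
        push(&mut v, i as u8)?;
        if v.capacity() != last_cap {
            growths += 1;
            last_cap = v.capacity();
        }
    }
    Ok(growths)
}
```

Calling `count_growths(10_000, 0, push_exact)` and `count_growths(10_000, 0, push_amortized)` should show far more growth steps for the exact variant, while passing an `initial_capacity` of at least `n` avoids growth for both. That is the "increase the initial capacity" workaround, and it only helps when the final size can be estimated in advance.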
I've been using lodepng to generate some schematic images for a project. When I generate larger PNGs (2k x 1.5k), lodepng takes over 1s to generate the PNG in release mode (or over a minute when compiled in debug mode!). In comparison, the png crate takes 50ms to encode the same image:
(The resulting images look the same despite the size difference. pngcrush brings either image down to 20k).
The code is a bit of a mess - I haven't completely isolated the benchmark. But you can run it yourself here: https://github.com/josephg/bp-to-png/tree/a9fd048a67961b35e88308b907143fdacb1f6870