openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Investigate zlib-ng as an update to existing zlib code #13245

Open gdevenyi opened 2 years ago

gdevenyi commented 2 years ago

Describe the feature you would like to see added to OpenZFS

Switch to https://github.com/zlib-ng/zlib-ng codebase for zlib

How will this feature improve OpenZFS?

The existing zlib code is very old, with many workarounds for old systems that ZFS can't run on anyway. zlib-ng is a modernization effort that takes advantage of newer compiler and processor features while otherwise remaining API/ABI compatible with zlib.

rincebrain commented 2 years ago

I actually have a branch with this done already. I just didn't look at pushing it because, well, who uses gzip now that zstd is merged?

rincebrain commented 2 years ago

Here's what I had; it's built on top of another changeset I was working on, and I got distracted wondering how much better it could be. It built on Debian 11 and Ubuntu 20.04 when I tried it, but I make no bets about non-x86_64, for example.

gdevenyi commented 2 years ago

> who uses gzip now that zstd is merged.

Sure, I totally agree. I'm just thinking that keeping ZFS on a modern, maintained codebase has other, less visible benefits.

rincebrain commented 2 years ago

I mean, our gzip implementation is 105 lines of "either call QAT's implementation or call the OS's zlib implementation", so...
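The shape described here, trying an offload engine first and falling back to the stock zlib otherwise, can be sketched roughly as follows. This is an illustrative Python sketch, not the OpenZFS C code: `gzip_compress`, `hw_compress`, and the convention of returning `None` when the result does not fit are all hypothetical names and choices made for the example.

```python
import zlib
from typing import Callable, Optional

# Hypothetical signature for an offload engine (a stand-in for something
# like QAT): returns compressed bytes, or None if it declines the request.
Accelerator = Callable[[bytes, int], Optional[bytes]]


def gzip_compress(src: bytes, max_len: int, level: int,
                  hw_compress: Optional[Accelerator] = None) -> Optional[bytes]:
    """Compress src, preferring the accelerator and falling back to zlib.

    Returns None when the result would not fit in max_len bytes, which a
    caller could treat as "store this block uncompressed".
    """
    out = None
    if hw_compress is not None:
        try:
            out = hw_compress(src, level)
        except Exception:
            out = None  # any accelerator failure falls through to software
    if out is None:
        out = zlib.compress(src, level)
    return out if len(out) <= max_len else None


def gzip_decompress(src: bytes) -> bytes:
    """Software-only decompression for the sketch."""
    return zlib.decompress(src)
```

With a wrapper this thin, swapping the software path for another zlib-compatible library would only touch the two `zlib.*` calls, assuming the replacement really is a drop-in.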

rincebrain commented 2 years ago

Kind of a mixed bag.

[Graphs: zlibng WIP 1, zlibng WIP 3, zlibng WIP 2]

So for gzip-3 through gzip-7, compression of highly compressible data is faster, but gzip-3 through gzip-9 are as slow or noticeably slower on incompressible data.

gzip-1 takes up markedly more space with zlib-ng, and gzip-3 through gzip-9 vary, curiously sometimes by a lot (wtf is gzip-9 doing on the highly compressible data?).
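For anyone wanting to poke at the per-level behavior outside of ZFS, a minimal sketch of this kind of comparison is below. It uses whatever zlib Python's standard library is linked against, so it says nothing about zlib-ng or the in-kernel code paths; the input generators, buffer size, and function names are assumptions made for the example.

```python
import random
import time
import zlib


def make_inputs(size: int = 1 << 20, seed: int = 0) -> dict:
    """Build one highly compressible and one effectively incompressible buffer."""
    rng = random.Random(seed)
    pattern = b"OpenZFS gzip benchmark line\n"
    compressible = (pattern * (size // len(pattern) + 1))[:size]
    incompressible = rng.getrandbits(8 * size).to_bytes(size, "little")
    return {"compressible": compressible, "incompressible": incompressible}


def bench_level(data: bytes, level: int) -> dict:
    """Time one compress/decompress round trip at the given zlib level."""
    t0 = time.perf_counter()
    packed = zlib.compress(data, level)
    t1 = time.perf_counter()
    unpacked = zlib.decompress(packed)
    t2 = time.perf_counter()
    assert unpacked == data  # round trip must be lossless
    return {
        "level": level,
        "ratio": len(packed) / len(data),  # smaller is better
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
    }


def run(size: int = 1 << 20) -> list:
    """Benchmark levels 1 through 9 on both kinds of input."""
    rows = []
    for name, data in make_inputs(size).items():
        for level in range(1, 10):  # gzip-1 .. gzip-9
            row = bench_level(data, level)
            row["input"] = name
            rows.append(row)
    return rows


if __name__ == "__main__":
    for r in run():
        print(f'{r["input"]:>14} gzip-{r["level"]}: ratio={r["ratio"]:.3f} '
              f'comp={r["compress_s"] * 1e3:.1f}ms '
              f'decomp={r["decompress_s"] * 1e3:.1f}ms')
```

A single timed pass like this is noisy; repeating each level several times and taking the median would be a reasonable next step.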

(The last isn't the clearest graph I've ever produced, but I don't have a better idea for illustrating savings per unit time other than difference from uncompressed write time, and LZ4 screws that up by being faster than the baseline.)
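One possible reading of that metric is space saved divided by the extra write time relative to an uncompressed write. The sketch below assumes that reading and uses made-up numbers (not measurements from this thread) to show why a compressor that beats the baseline breaks it: the denominator goes negative. The function name and all values are hypothetical.

```python
def savings_per_extra_second(raw_bytes: int, stored_bytes: int,
                             write_s: float, baseline_write_s: float) -> float:
    """Bytes saved per second of extra write time versus an uncompressed write.

    Hypothetical metric for illustration; the sign flips (or the value blows
    up) when the compressed write is as fast as or faster than the baseline.
    """
    saved = raw_bytes - stored_bytes
    extra = write_s - baseline_write_s
    if extra == 0:
        return float("inf") if saved > 0 else 0.0
    return saved / extra


# Made-up numbers, purely to show the shape of the problem.
RAW = 1_000_000_000
BASELINE_S = 10.0
gzip_like = savings_per_extra_second(RAW, 400_000_000, 25.0, BASELINE_S)
lz4_like = savings_per_extra_second(RAW, 600_000_000, 8.0, BASELINE_S)
print(gzip_like)  # positive: space saved at the cost of extra time
print(lz4_like)   # negative: space saved while also beating the baseline
```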

That was all on my Ryzen 5900X; the graphs don't look much different for my 8700K, the scale of the times is just different.

I'll try giving it a run on my Pi 4 and M1, as well as on decompressing data produced by the baseline compressor (these were all compressing data and then testing decompressing what they just respectively produced).

Might go back to my prior experiment with the Chromium zlib fork, benchmark that and baseline zlib 1.2 (since IIRC Linux ships zlib pre-1.2's compressor because it originally had some minor perf regression on ARM, but they shipped the 1.2 decompressor...).