openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.68k stars 1.76k forks source link

Update LZ4 to 1.10.0 #16399

Open SeanReece opened 3 months ago

SeanReece commented 3 months ago

Describe the feature would like to see added to OpenZFS

Just stumbled across this substantial update to LZ4 https://github.com/lz4/lz4/releases/tag/v1.10.0

I'm fairly new here so not sure about the viability of bringing in this update but I see that LZ4 was updated to 1.9.3 in this PR: https://github.com/openzfs/zfs/pull/12805

How will this feature improve OpenZFS?

LZ4 1.10.0: https://github.com/lz4/lz4/releases/tag/v1.10.0

LZ4 1.9.4: https://github.com/lz4/lz4/releases/tag/v1.9.4

rincebrain commented 3 months ago

I know people love bigger version numbers, but that's not really going to buy us much, if I had to guess.

It's not nothing, but the multithreading change doesn't matter because it's not applicable here.

SeanReece commented 3 months ago

I know people love bigger version numbers

I have to admit I'm guilty of this far too often. I wouldn't normally have brought it up but they're touting this as a pretty substantial update, so I thought it may be of some benefit to ZFS systems using LZ4.

Like I said, I'm pretty new here so not exactly sure why multi-threading wouldn't be applicable here. My assumption was that ZFS is usually run on systems with many threads so we'd see a good performance uplift here. Is ZFS compression/decompression single threaded? I'm also not exactly sure why LZ4 (and other compression algorithms like ZSTD, etc) are brought into the openzfs code directly instead of as a dependency or using the LZ4 package directly. I haven't done any C development in years so my ignorance knows no bounds here.

Would be interesting to run see some benchmarks though.

rincebrain commented 3 months ago

Well, let's see.

Multi-threading only matters for multiple distinct decompression units of work, and ZFS treats it all as single-unit compression operations, so it wouldn't help us here.

ZFS also dispatches each as a separate kernel thread, so it's not like we were serializing all decompression calls before.

Userland threading and kernel threading don't necessarily have anything to do with each other in terms of complexity.

And ZFS really really wants Compress(Decompress(Compress(X))) == Compress(X), which is not guaranteed to be true across versions, so it bundles its own copies of them so that that doesn't suddenly change underneath it if the Linux or FreeBSD system has a different version. (That's also why the PR you linked only updates the decompressor code.)

gmelikov commented 3 months ago

So, on compression it may not have any gain to use LZ4 multithreading, on decompression we may get some latency speedup, but we'll eat much more CPU and memory bandwidth in this case.

Don't see any low hanging fruits for us here, IMHO.

SeanReece commented 3 months ago

Thanks for all the details here. I really appreciate it.

Ran a quick test lzbench with 1.9.4 vs r129 (~9 years old). r129 is when they introduced the "new" api so I couldn't quickly swap out an older version in lzbench. I believe OpenZFS is using r85 for compression so I'm not sure what performance changes (if any) were made before r129. I also tested 1.10.0, but got very similar results vs 1.9.4/1.9.3.

Tests run an a Mac M1 Pro. File used is a ~100MB text file.

128K block size: I saw about 18% increase in compression speed and 20% increase in decompression speed. 16K block size: Saw less improvement here. 5% faster compression/ 12% faster decompression

1.9.4

> ./lzbench -b128 -elz4 ./file.json
lzbench 1.8 (64-bit MacOS)  (null)

Compressor name         Compress. Decompress. Compr. size  Ratio Filename
memcpy                  47174 MB/s 47179 MB/s    18231014 100.00 ./file.json
lz4 r129                 759 MB/s  5250 MB/s     5483206  30.08 ./file.json
done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=128KB cSpeed=0MB)

r129

./lzbench -b128 -elz4 ./file.json
lzbench 1.8 (64-bit MacOS)  (null)
Assembled by P.Skibinski

Compressor name         Compress. Decompress. Compr. size  Ratio Filename
memcpy                  44223 MB/s 46081 MB/s    18231014 100.00 ./file.json
lz4 1.9.3                 642 MB/s  4348 MB/s     5483130  30.08 ./file.json
done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=128KB cSpeed=0MB)

So these are not earth shattering numbers here but I do think there's some performance to be gained by upgrading the compressor from r85. I'm not sure if Compress(Decompress(Compress(X))) == Compress(X) across these versions though.

rincebrain commented 3 months ago

It's not, no. Not even close.

You even knew that from the output of your own tests.