XZ compression support - Githubissues

torn5 commented 12 years ago

This is a feature request to add XZ compression to zfsonlinux. (I believe this would unfortunately require on-disk format change, so I am not sure it's acceptable)

XZ is much superior to bzip2 in every aspect, and even mostly obsoletes gzip.

XZ at the "-1" (very fast) compression preset, is barely 30% slower than gzip at default compression, but produces files as small as or even smaller than bzip2 at default compression (at which bzip is immensely slower).

baryluk commented 12 years ago

Really?

Can you please provide some references to the last statement? bzip2, lzma and lzo (very fast light compression/decompression, also of data already compressed) are implemented in the zfs-fuse. THey already assigned some identifiers for this 3 compression schemes (actually more as they also have things like bzip2-9, bzip2-8, ... bzip2-1, lzma-1, lzma-2, lzma-3, ..., etc), giving about 25 compression methods (compression attribute values), but still only 3 formats on disks, as they are all readable regardless of compression level by same code (or nacassary informations are in actually compressed stream/data).

LZO is very usefull for general usage (it is faster than gzip, and most produces smaller files, or only slightly bigger than gzip at standard level), and bzip2 is usefull for long term archiving and backups.

I think lzo, lzma, and xz, bzip2 all can be implemented quite easly in zfsonlinux, as all of them already have kernel implementations. Right?

dechamps commented 12 years ago

AFAIK, XZ needs substantial amounts of memory when compressing/decompressing. For this reason, I'm not convinced that using it inside ZFS is a very good idea.

dajhorn commented 12 years ago

Notwithstanding whether XZ is worthwhile, there are two reasons why it is unlikely that it will be added to ZoL:

It would break fs image compatibility with Solaris, which would violate a ZoL design goal.
It would need backporting from Linux 2.6.38 to the 2.6.26 kernel, or whatever is currently supported.

This request would be better sent to Oracle. Get them to approve it, and then get them to release the latest ZFS source code.

johnr14 commented 6 years ago

I was wondering if this could be worth it since this issue is still open, so I did a bit of research and try to provide an opinion.

LZMA or XZ gives 50% more compression than LZ4 on a linux kernel 3.3 tarball at the expense of 24X more compression time, 18X more decompression time, but less memory required for compression (1.4X) & decompression (13X). The kernel has 2.81X compression ratio with LZ4.

Tests on datablocks would be needed to know if those figures are still holding the road for a ZFS implantation.

It would be worthwhile to take a few pools with data, check LZ4 compression ratio and calculate saved space and multiply that by 2. But on most of my pools, I have compression ratio of 1.00 (Media) to 1.42 (rpool).

On the article on ServeTheHome, they get 1.93X compression on a VM. That means 14.3Gb compressed of 32Gb uncompressed VM. If we where to use XZ, we could estimate the best case scenario at twice better compression: dataset would drop to 7.15Gb.

Now the big questions : Would latency make it irrelevant to use it on a VM ? Responsiveness could be hit. Would you run a VM from a XZ compressed ZFS ? Probably not, but might be useful for backup/snapshot, but you could manually compress it with XZ in that case. In case of write once read many data, could the compressed datasets in the L2Arc give an advantage in certain case scenario ? Probably if data is on slow spinning disk or single raidz3, a smaller L2arc could hold the same amount of data and free more system memory.

Also L2Arc latency impacts needs to be taken in consideration.

Usefulness : archiving compressible data (root,VMs, logs), rarely used data or latency irrelevant data.

Priority : after thinking about it, I find that there are more urgent features to be added and LZ4 does a good job, perhaps it could be better but the efforts needed don't seem to be worth it for now. (in my view)

Sources : https://catchchallenger.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO https://www.servethehome.com/the-case-for-using-zfs-compression/ https://wiki.illumos.org/display/illumos/L2ARC+Compression

rmarder commented 5 years ago

ZFS would benefit greatly from the addition of the LZMA/LZMA2 compression format (however it doesn't make much sense to drag along the XZ container format too if that is easily avoided).

This is something that should be done upstream with all OpenZFS projects, not particularly in the ZFSonLinux project alone. We should not break compatibility with the other OpenZFS implementations.

I disagree that this is a feature that Oracle needs to have. OpenZFS pool version 5000 already diverges greatly from Oracle ZFS and it is already assumed by everyone that requires cross compatibility to use pool version 28 or older. There is no reason I can see why this couldn't be added as a feature flag for use by those that wanted it.

rincebrain commented 5 years ago

@rmarder I think the Oracle suggestion was made in a different time, when people were still hoping the source faucet would reopen. As it stands, the incompatibility ship has long sailed, with feature flags and Oracle incrementing their on-disk format.

As far as LZMA/LZMA2 support, I think you'd be welcome to submit it, but currently #3908 #5927 and #6247 , maybe #7560 , seem like where people are spending effort, since they're varying states of mergeable. (Also, #5846 means gzip isn't necessarily going to peg your CPU for people who have that use case.)

PrivatePuffin commented 4 years ago

Considering we just merged ZSTD which has about the same ratio's but significantly faster... Is this really something thats even up for consideration anymore?

gordan-bobic commented 3 years ago

xz still seems to compress smaller than zstd, so IMO there would still be benefit to having it.

PrivatePuffin commented 3 years ago

@gordan-bobic At the same speed it's the same or slightly worse in nearly all benchmarks.

jgottula commented 3 years ago

xz still seems to compress smaller than zstd, so IMO there would still be benefit to having it.

@gordan-bobic At the same speed it's the same or slightly worse in nearly all benchmarks.

@Ornias1993 I think the point being stated here is that, for infrequently accessed archival data where maximum possible compression ratio is desirable and compression/decompression speed is effectively immaterial, the highest end of xz still beats out what you can get from the highest end of zstd.

And so there are use cases where "worse at the same speed" isn't really meaningful, and the higher attainable compression ratio is meaningful; and so it still provides utility for those situations.

(Asterisk: This is based on generalized corpus benchmark data, and I don't know for certain that top end ratios of xz still beat those of zstd when we're talking about relatively small chunks limited to recordsize. But I don't have any particular reason to assume that it doesn't apply either. And not every dataset necessarily uses default 128K recordsize anyway.)

Too niche a use case to bother caring about? Maybe. I dunno. But there is a use case there nevertheless.

PrivatePuffin commented 3 years ago

Maximum levels? Any idea how many levels of ZSTD are possible? :P We artificially limited them, because it would've been too insane to support more than we currently do...

At the same speed XZ is at the very least comparable to ZSTD. That includes higher levels of ZSTD.

https://sysdfree.wordpress.com/2020/01/04/293/

PrivatePuffin commented 3 years ago

Even so, considering we spend years trying to even get a thorough review on zstd and no one with any knowhow in the field is even slightly interested in XZ support, I think i'll end this with:

Isn't going to be developed anyway for just a niche usecase that might or might not exist considering the effort required and general disinterest by the zfs maintainers for compression algorithms...

jgottula commented 3 years ago

Even so, considering we spend years trying to even get a thorough review on zstd and no one with any knowhow in the field is even slightly interested in XZ support, I think i'll end this with:

Isn't going to be developed anyway for just a niche usecase that might or might not exist considering the effort required and general disinterest by the zfs maintainers for compression algorithms...

@Ornias1993 Hey now, I was clear and upfront about it likely being a niche use case and so therefore likely not justifying development. So I don't think we're in disagreement on that.

I do still think that the technical point of xz having higher best-case ratios holds; yeah the difference may not be giant, but it's there.

And so it is potentially of utility, as a general statement, even if overall it's ultimately not worthwhile from a development work point of view. 🤷‍♂️

(Incidentally, the tables I'd referenced for ratio comparisons are the same ones you linked!)

gordan-bobic commented 3 years ago

According to those charts, at the high compression end, xz is both faster and compresses smaller.

ryao commented 2 years ago

According to those charts, at the high compression end, xz is both faster and compresses smaller.

The first charts show xz levels 4 through 9 beating zstd 19's compression ratio with only xz level 4 beating zstd's compression speed, but the margins of victory are fairly small. The second charts show xz levels 5 through 9 beating zstd 19's compression ratio with only xz level 5 beating its speed, again with small margins of victory.

When I first saw this, I was concerned about memory usage. Since we now support zstd, I suppose this should be reconsidered:

https://github.com/facebook/zstd/blob/ff6350c098300ee3050dc6a6cdc0f48032755e84/lib/compress/zstd_compress.c#L4081 https://github.com/facebook/zstd/blob/0f4fd28a64880bdd1c14847983d5a7561950d8d5/doc/zstd_manual.html#L979

At zstd -19, the window log is 8MB, the chain log is 16MB and the hashlog is 4MB, for a total of 28MB used for compression. Decompression uses less (although I do not know how much less offhand). On the other hand, xz's man page says that xz level 4 uses 48MB of RAM while xz level 9 uses 674MB of RAM. This is for compression.

My feeling is that this is excessive, even with more modern machines. ZFS should be able to work on lower memory systems (e.g. 256MB of RAM), but supporting the highest xz levels would cause us to run out of memory and hang on low memory systems.

My opinion is that the memory requirements of XZ are too high for use in a filesystem. If we were to limit it to xz levels 1 through 3 to keep it within the realm of zstd's memory usage, then its memory usage would be okay, but then xz loses its advantage over zstd, which defeats the purpose of implementing it.

ryao commented 2 years ago

@rmarder bridfly touched on this when he mentioned that the xz container format is unnecessary, but to expand on that, if we were to implement this, we would not want the lzma2 container format either. I feel that needs to be said since the xz container format encapsulates the lzma2 container format. Just dropping the xz container format is therefore not enough.

Another thing that occurs to me is that the additional container formats waste space not just from multiple layers of headers, but also from included padding and checksums. The compression and decompression is likely also somewhat slowed down by the checksums, which are unnecessary in a hypothetical ZFS implementation since ZFS has its own checksums. The lzip developer demonstrated the expense of the checksums when he addressed why busybox unxz is faster than his implementation:

https://www.nongnu.org/lzip/lzip_benchmark.html#busybox

That said, those checksums probably should be disabled for a “fair” evaluation of the merits of lzma against zstd. I still suspect that it would not be able to perform well enough to justify its inclusion if memory requirements were restricted to roughly what zstd uses.

rmarder commented 2 years ago

I don't believe we should consider compression memory usage to be a problem. There are already other optional feature flags in ZFS (ex: dedup) that suffer from a similar problem of heavy memory usage when enabled.

Furthermore, there are multiple ways to ensure there is sufficient memory on the system for the chosen compression level, if that is wanted. A lazy approach would be to simply skip compression if the system resources to do it aren't available (ZFS already silently skips compression in certain conditions).

Now, decompression memory usage is a serious concern. If there isn't enough system memory for decompression, there is very little we can do to work around that problem.

PrivatePuffin commented 2 years ago

We already have instances where high (9 and up) levels of zstd breaks down the complete zfs system, due to excessive memory consumption.

When developing zstd support everything above 9 was considered not-feasable and nothing more than a tech demo.

One also needs to take into account that small gains in a compression test will not reflect the same on zfs. The same way ratio’s and speeds of stock zstd are not the same as zstd-on-zfs.

So there might not even be any gain at all from these algos and its not even worthwhile discussing until someone proves(!) with a PoC these can actually reach beter speeds or ratio’s when integrated in the zfs stack.

Fir zstd we could guess this, because even 50% of the performance would outperform the other compression algos in zfs. But with these margins, this needs a PoC.

So yes: I basically call bullshit on the performance gain guesses.

ryao commented 2 years ago

I don't believe we should consider compression memory usage to be a problem. There are already other optional feature flags in ZFS (ex: dedup) that suffer from a similar problem of heavy memory usage when enabled.

The difference between the memory usage of deduplication and the memory usage of compression is that deduplication just becomes slower from ARC not being able to provide enough memory while compression will literally hang the system if it does not have enough memory. That is why we have been so conservative when adding new high compression ratio algorithms.

zstd was only added in part because it was just so good that others did not feel justified in saying no, but as @Ornias1993 pointed out, it was accepted in a way that allows it to deadlock certain system configurations from excessive memory use. Had I been active at the time, I probably would have requested that the higher levels remain unimplemented out of concern that they would cause deadlocks on low memory systems. The reports that zstd's higher memory usage configurations have caused deadlocks hurt the prospects of lzma, since those deadlocks are no longer merely a theoretical concern and lzma wants even more memory than zstd at the levels at which it has a slight edge.

Also, if we were to push zstd to those memory levels by tweaking the configuration rather than using the presets that the authors used, I suspect that it would outperform lzma.

We already have instances where high (9 and up) levels of zstd breaks down the complete zfs system, due to excessive memory consumption.

I had suspected this myself, but I was not involved with the development of it at the time, so I had assumed others had already considered this. On the bright side, a quick look suggests that this will not happen on most low memory systems since they also have low core counts, which limits the number of simultaneous threads. A good example would be the Raspberry Pi.

We probably could fix the deadlocks by limiting the number of IO threads that may simultaneously perform compression when the system memory is too low. The way it would work would be to keep track of how much memory each compression operation may use and set an upper limit. Then maintain a variable that is "memory available for compression" that will allow IO threads to grab chunks of it. If there is not enough memory available, the IO thread would then have to cv_wait() on more becoming available. An additional variable could be used to keep track of the number of threads that have memory allocations. If it is 0 at the time of the allocation, then we could allow compression to proceed to avoid deadlocks from someone accidentally setting that variable too low. A third variable could be added to allow this to be tunable to allow greater numberes of threads (at the user's risk). A fourth variable that is calculated at module initialization time could be added to disable this behavior entirely on large memory systems. One downside of this mechanism is that if a system enters a deadlock state, it does not provide a way for a system administrator to unstick the deadlock (unless we have some way to have the kernel module parameter change do a cv_broadcast()).

Is there an open issue for this?

One also needs to take into account that small gains in a compression test will not reflect the same on zfs. The same way ratio’s and speeds of stock zstd are not the same as zstd-on-zfs.

A good comparison for the sake of ZFS would involve doing compression in recordsize-sized blocks to to evaluate compression performance.

Fir zstd we could guess this, because even 50% of the performance would outperform the other compression algos in zfs. But with these margins, this needs a PoC.

Agreed.

So yes: I basically call bullshit on the performance gain guesses.

I am a little more optimistic than you, but my conclusion is that even if it does perform slightly better in both compression time and compression ratio, it is not enough to matter. The memory issue is just too big where it does better. Furthermore, decompression performance tends to matter more than compression performance and the decompression performance of lzma is terrible.

That said, I do not mean to berate lzma (as I have long been a fan of it), but I just feel that it is not suitable for use in a filesystem for the reasons I have stated.

ryao commented 2 years ago

One also needs to take into account that small gains in a compression test will not reflect the same on zfs. The same way ratio’s and speeds of stock zstd are not the same as zstd-on-zfs.

So there might not even be any gain at all from these algos and its not even worthwhile discussing until someone proves(!) with a PoC these can actually reach beter speeds or ratio’s when integrated in the zfs stack.

This is an excellent point. It turns out that both zstd and xz support doing compression in blocks. It is intended to be used in conjunction with multithreading, since breaking the input stream into blocks that are independently compressed is very amendable to multithreading. Coincidentally, doing multithreading across multiple blocks is similar to what happens inside ZFS due to the IO threads. The main dissimilarity would be that ZFS pads the compressed blocks to multiples of 4K (for ashift=12), while this test does not support that.

Anyway, I decided to use the Linux kernel to get some quick data on my Ryzen 7 5800X. I ran these commands to get some comparison data for our default recordsize=128K:

wget https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.19.4.tar.xz 2>/dev/null
unxz linux-5.19.4.tar.xz

for i in {1,6,9,9e}; do echo Testing level $i; time xz -$i --block-size=131072 -T16 -kf linux-5.19.4.tar; ls -l linux-5.19.4.tar.xz; done;
for i in {3,5,7,8,13,16,18,19}; do echo Testing level $i; time zstd -T16 -$i -B131072 -kf linux-5.19.4.tar; ls -l linux-5.19.4.tar.zst; done;

In summary, zstd outperformed xz in ways that public benchmark data would not predict. Here is the data showing compressed file size and real time:

zstd -3: 199152950 bytes at 0m0.964s
zstd -5: 182514093 bytes at 0m1.590s
zstd -7: 172636235 bytes at 0m2.324s
zstd -8: 168033531 bytes at 0m2.846s
zstd -13: 159830544 bytes at 0m13.138s
zstd -16: 146924095 bytes at 0m23.673s
zstd -18: 139856778 bytes at 0m37.729s
zstd -19: 135734231 bytes at 0m52.628s
xz -1: 201063104 bytes at 0m3.397s
xz -6: 182904512 bytes at 0m17.761s
xz -9: 182904512 bytes at 0m29.464s
xz -9e: 181613860 bytes at 0m37.067s

To summarize that data, zstd -5 has better compression than all levels of xz except xz -9e (which is only marginally better) while running substantially faster than all levels of xz. Switching to zstd -7 gives better performance and better compression than all levels of xz while still running substantially faster. The default levels are xz -6 and zstd -3. Also, I did not make a mistake in copying the data. xz -6 and xz -9 really did produce the same file size.

To be fairer to xz, I decided to retest with a 1M block size:

for i in {1,6,9,9e}; do echo Testing level $i; time xz -$i --block-size=1048576 -T16 -kf linux-5.19.4.tar; ls -l linux-5.19.4.tar.xz; done;
for i in {3,5,7,8,13,16,18,19}; do echo Testing level $i; time zstd -T16 -$i -B1048576 -kf linux-5.19.4.tar; ls -l linux-5.19.4.tar.zst; done;

This gave:

zstd -3: 197326251 bytes at 0m0.898s
zstd -5: 180663508 bytes at 0m1.358s
zstd -7: 170431983 bytes at 0m2.032s
zstd -8: 166525563 bytes at 0m2.447s
zstd -13: 159830544 bytes at 0m13.193s
zstd -16: 146924095 bytes at 0m24.073s
zstd -18: 139856778 bytes at 0m37.728s
zstd -19: 135734231 bytes at 0m51.081s
xz -1: 176606428 bytes at 0m3.301s
xz -6: 152771800 bytes at 0m18.304s
xz -9: 152771768 bytes at 0m19.742s
xz -9e: 150861732 bytes at 0m32.803s

xz did better here, but we have zstd -16 give better compression and a faster runtime than all but 1 of the tested levels of xz.

Since we support a maximum recordsize of 16M (that nobody likely uses), I decided to re-run the tests against that:

for i in {1,6,9,9e}; do echo Testing level $i; time xz -$i --block-size=16777216 -T16 -kf linux-5.19.4.tar; ls -l linux-5.19.4.tar.xz; done;
for i in {3,5,7,8,13,16,18,19}; do echo Testing level $i; time zstd -T16 -$i -B16777216 -kf linux-5.19.4.tar; ls -l linux-5.19.4.tar.zst; done;

zstd -3: 193474165 bytes at 0m1.012s
zstd -5: 176620540 bytes at 0m1.258s
zstd -7: 165250150 bytes at 0m1.811s
zstd -8: 162924163 bytes at 0m1.992s
zstd -13: 155930328 bytes at 0m7.829s
zstd -16: 145816148 bytes at 0m16.568s
zstd -18: 138828307 bytes at 0m29.190s
zstd -19: 135789235 bytes at 0m44.022s
xz -1: 169927252 bytes at 0m3.628s
xz -6: 134945448 bytes at 0m26.236s
xz -9: 134180196 bytes at 0m27.530s
xz -9e: 132004264 bytes at 0m43.944s

Interestingly, zstd becomes faster here while xz becomes slower. At the same time, compression ratios have improved for both, but much more for xz than zstd. ztd -19 and xz -9e are both close in time and compression ratio.

This is not the silensia corpus, but this data does not show xz as favorably as the public benchmark data does.

Also, I noticed that my previous remark turned out to be wrong:

Also, if we were to push zstd to those memory levels by tweaking the configuration rather than using the presets that the authors used, I suspect that it would outperform lzma.

When the recordsize is infinite, nothing I could do in terms of giving more memory to zstd was able to match xz -9e (although it came very close). However, that is a moot point since the test was conducted with an infinite recordsize. The results from tests where I set a compression block size to simulate realistic record sizes showed zstd as being overwhelmingly superior in all situations that matter.

Also, these tests have shown me that I should consider using higher levels of zstd compression at my home. I might repeat these tests on the silensia corpus later to get a more fair comparison, but I do not expect much to change in terms of the conclusions. xz is better than zstd at finding opportunities to do compression in large records, but those records just are not used in ZFS, and at the recordsizes that are used in ZFS, zstd is overwhelmingly better (although the ratio of the default compression level is not as good as xz).

rincebrain commented 2 years ago

Keep in mind, ZFS is shipping zstd 1.4.5, and 1.5.1 and up changed the settings for various recordsizes and compression levels in ways that can significantly affect the performance, so you might see very different outcomes on ZFS versus the CLI.

ryao commented 2 years ago

Keep in mind, ZFS is shipping zstd 1.4.5, and 1.5.1 and up changed the settings for various recordsizes and compression levels in ways that can significantly affect the performance, so you might see very different outcomes on ZFS versus the CLI.

That is a good point. That might partially explain why my estimate of the zstd-19 memory usage varies so much from what Allan Jude reported the ZFS implementation uses:

https://openzfs.org/w/images/b/b3/03-OpenZFS_2017_-_ZStandard_in_ZFS.pdf

It is possible to lookup the 1.4.5 settings and configure zstd 1.5.1+ to use them for a more fair comparison, but I do not expect things to become better for LZMA.

It also is probably worth noting that xz’s multiple headers give it a disadvantage at smaller record sizes in the comparison I did, although I do not expect the disadvantage to be big enough to bridge the gap with zstd.

That said, unless LZMA’s memory usage can be lowered to zstd levels while being non-negligibly better in at least some common use case that applies to ZFS users, I do not think any revision to testing methodology would make it compare favorably enough to zstd to merit inclusion (or even the effort to do a proof of concept).

openzfs / zfs

XZ compression support #406