openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Enable large record size compression for zfs receive #11313

Open stuartthebruce opened 3 years ago

stuartthebruce commented 3 years ago

Describe the feature you would like to see added to OpenZFS

I would like to use zfs send/receive to generate efficient backups with a larger recordsize target than the original data. This would allow more data to be backed up on the same amount of physical storage due to an improved compressratio. In particular, I would like to back up compression=zstd data from a pool with the default recordsize=128K to a pool with recordsize=1M.
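A minimal sketch of the intended workflow, assuming hypothetical pools `tank` (source) and `backup` (target):

```sh
# Target side: parent dataset configured for large records and zstd.
zfs set recordsize=1M backup
zfs set compression=zstd backup

# Replicate a snapshot; the received dataset inherits recordsize=1M,
# but today the blocks arrive and stay at their original 128K size,
# so the hoped-for compression win never materializes.
zfs snapshot tank/data@backup1
zfs send tank/data@backup1 | zfs receive backup/data
```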

How will this feature improve OpenZFS?

This will allow more data to be backed up on the same amount of physical storage with compression algorithms like zstd, which are more effective when compressing larger blocks.

Additional context

I ran a quick test with ZFS 2.0.0 sending from recordsize=128K to recordsize=1M with compression=zstd, and it significantly underperforms rsync (used is 1.45x higher). It appears that zfs receive ignores the target recordsize and keeps the original stream block size. Note that I am not using any zfs send options (such as -c), so I was hoping zfs receive would attempt to recompress full-recordsize blocks.
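For anyone reproducing this, a sketch of how to compare the two sides and confirm the received block size (dataset names and the object number are illustrative):

```sh
# Compare space accounting and compression on both sides.
zfs get used,compressratio,recordsize tank/data backup/data

# Find the object number of a received file (inode == ZFS object).
ls -i /backup/data/somefile

# Dump that object's metadata; the dblk column shows the data
# block size actually on disk (128K here, not 1M).
zdb -dddd backup/data 12345
```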

A discussion of this was started on zfs-discuss at https://zfsonlinux.topicbox.com/groups/zfs-discuss/T01d33a344d5059cb-M9020e3d5463952d9ba30edd4

stuartthebruce commented 3 years ago

From zfs-discuss

I think there is some merit to this use case; however, I would like to request that if any work is done to accommodate it, it should be configurable and not the default mode of operation. Often when synchronizing two zpools, it is expected that their utilization matches, as a rough user-level indication that data is being replicated. Having the destination show smaller utilization (e.g. a pure backup target) raises red flags for users and SAs who then suspect the zpool is not fully replicating.

Good point. One possible way to cover that would be to add a --large-block option to zfs receive that, if enabled, would aggregate received blocks that are smaller than the target recordsize. TBD whether this should imply decompressing and recompressing streams with compressed WRITE records (or throw a warning, or ...).
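Usage under that proposal might look like the following; the receive-side --large-block flag is purely hypothetical and does not exist in any release:

```sh
# Hypothetical flag: coalesce sub-recordsize blocks from the
# stream into full 1M records, recompressing them on the way in.
zfs send tank/data@backup1 | \
    zfs receive --large-block -o recordsize=1M backup/data
```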

secabeen commented 3 years ago

One possible way to cover that would be to add a --large-block option to zfs receive that, if enabled, would aggregate received blocks that are smaller than the target recordsize.

You'd also need to use -x or -o with zfs receive to force the larger recordsize, or the receive would just apply the recordsize coming in with the send.
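Concretely, with the existing receive options that would look like this (dataset names assumed; note these only set the dataset property, they do not rewrite the blocks carried in the stream):

```sh
# Set recordsize explicitly on the received dataset:
zfs send tank/data@backup1 | zfs receive -o recordsize=1M backup/data

# Or strip any recordsize carried by the stream so the destination
# inherits it from its parent:
zfs send tank/data@backup1 | zfs receive -x recordsize backup/data
```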

IvanVolosyuk commented 3 years ago

You'd also need to use -x or -o with zfs receive to force the larger recordsize, or the receive would just apply the recordsize coming in with the send.

Ideally, for incremental receives the flags should not be required, and the effective recordsize at the destination should be used instead.

PrivatePuffin commented 3 years ago

This is actually an awesomely useful feature to have! :)

skrenes commented 1 year ago

So I did a migration that required using a temporary pool, where my datasets inherited the 128K recordsize. When I moved them to their final pool with a 1M recordsize, I learned that zfs send/receive can shrink but not expand record sizes in a copy.

So in addition to the original poster's compression use case, this is a really bad one-way mutation that users aren't warned about. I have a backup server that still has the true 1M records, and I can no longer send incremental snapshots to it because it complains with this misleading error: `cannot receive incremental stream: incremental send stream requires -L (--large-block), to match previous receive.` Needless to say, I've now included -L as a send option.
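For anyone hitting the same error: once a large-block stream has been received, every later incremental must also be sent with -L (a sketch, snapshot and dataset names illustrative):

```sh
# The previous receive contained blocks larger than 128K, so each
# subsequent incremental must also be sent with -L/--large-block:
zfs send -L -i tank/data@snap1 tank/data@snap2 | \
    zfs receive backup/data
```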

jjaakkol commented 1 year ago

I'm in the process of backing up a zfs file system with the default recordsize of 128K to a zfs file system on network block devices with good throughput but high write latency. Merging the records on receive (up to 16M) would speed up the backup significantly.
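For reference, records above 1M are possible today but gated; a sketch for Linux (the tunable path is real, though the value may already default to 16M on recent releases):

```sh
# Raise the cap on recordsize (16M = 16777216).
echo 16777216 > /sys/module/zfs/parameters/zfs_max_recordsize

# Records larger than 128K also require the large_blocks feature.
zpool set feature@large_blocks=enabled backup

zfs set recordsize=16M backup/data
```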