stuartthebruce opened this issue 3 years ago
From zfs-discuss
I think there is some merit to this use case; however, I would like to request that if any work is done to accommodate it, it should be configurable and not the default mode of operation. When synchronizing two zpools, users often expect their utilization to match, as a rough "data is being replicated" sanity check. Having the destination (i.e. a pure backup target) show lower utilization raises red flags for users and SAs, who may conclude that the zpool is not fully replicating.
Good point. One possible way to cover that would be to add a --large-block option to zfs receive that, if enabled, would aggregate received blocks when they are smaller than the target recordsize. TBD whether this should imply decompressing and recompressing streams with compressed WRITE blocks (or throw a warning, or ...).
You'd also need to use -x or -o with zfs receive to force the larger recordsize, or the receive would just apply the recordsize coming in with the send.
Ideally, for incremental receives, the flags should not be required; the effective recordsize at the destination should be used instead.
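For reference, today's closest workaround is forcing properties at receive time with -o, as mentioned above. A minimal sketch with hypothetical pool/dataset names (note that this sets the property on the received dataset but does not merge the blocks already encoded in the send stream):

```shell
# Hypothetical names "src/data" and "backup/data"; requires existing pools.
# -o overrides the property on the received dataset, but the blocks arrive
# at the size they were sent with, so data is NOT rewritten at 1M.
zfs send src/data@snap1 | zfs receive -o recordsize=1M -o compression=zstd backup/data
```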
This is actually an awesomely useful feature to have! :)
So I did a migration that required using a temporary pool, where my datasets inherited the 128K recordsize. When I moved them to their final pool with a 1M recordsize, I learned that zfs send/receive can shrink but not expand recordsize.
So in addition to the OP's compression use case, this is a really bad one-way mutation that users aren't warned about. I have a backup server that still has the true 1M recordsize, and I can no longer send incremental snapshots to it because it fails with this misleading error: `cannot receive incremental stream: incremental send stream requires -L (--large-block), to match previous receive.` Needless to say, I've included -L as the send option.
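For context, the incremental send that triggers (and then satisfies) that requirement looks roughly like this, with hypothetical dataset names:

```shell
# Hypothetical names. Once a dataset has been received from a stream sent
# with -L (--large-block), subsequent incremental streams must also use -L,
# or the receive fails with the error quoted above.
zfs send -L -i tank/data@snap1 tank/data@snap2 | zfs receive backup/data
```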
I'm in the process of backing up a zfs file system with the default 128K recordsize to a zfs file system backed by network block devices with good throughput but large write latencies. Merging the records on receive (up to 16M) would speed up the backup significantly.
Describe the feature you would like to see added to OpenZFS
I would like to use zfs send/receive to generate efficient backups with a larger target recordsize than the original data. This would allow more data to be backed up on the same amount of physical storage due to an improved compressratio. In particular, I would like to back up compression=zstd data from a pool with the default recordsize=128K to a pool with recordsize=1M.
How will this feature improve OpenZFS?
This will allow more data to be backed up to the same amount of physical storage for compression algorithms like zstd that are more effective when compressing larger blocks.
Additional context
I ran a quick test with ZFS 2.0.0 sending from recordsize=128K to recordsize=1M with compression=zstd, and it significantly underperforms rsync (used is 1.45× higher). It appears that zfs receive ignores the target recordsize and keeps the original stream value. Note, I am not using any zfs send options (such as -c), so I was hoping zfs receive would attempt to recompress full-recordsize blocks.
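The comparison above can be reproduced roughly as follows; all pool and dataset names here are hypothetical, and both targets are created with identical properties so only the copy method differs:

```shell
# Hypothetical pools "pool1" (source, recordsize=128K) and "pool2" (target).
# Target datasets created with the desired large recordsize and zstd.
zfs create -o recordsize=1M -o compression=zstd pool2/recv
zfs create -o recordsize=1M -o compression=zstd pool2/rsync

# Method 1: send/receive. Blocks arrive as 128K records, so zstd
# compresses 128K at a time despite the 1M recordsize property.
zfs send pool1/data@snap | zfs receive pool2/recv/data

# Method 2: rsync. Files are rewritten through the POSIX layer,
# so they are stored as 1M records and compress better.
rsync -a /pool1/data/ /pool2/rsync/

# Compare space usage and compression on the two copies.
zfs get used,compressratio pool2/recv/data pool2/rsync
```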
A discussion of this was started on zfs-discuss at https://zfsonlinux.topicbox.com/groups/zfs-discuss/T01d33a344d5059cb-M9020e3d5463952d9ba30edd4