Open zviratko opened 5 years ago
@ptx 1) My pools are heavily fragmented (even though they are on SSDs, it still makes a difference). 2) Those storages are under heavy production load (and this seems to be about latency somewhere).
Recently someone wrote on the mailing list: "I created a new 2TB ZVOL using 4K block size and NTFS 32K allocation unit size. Write performance went south. Even using sync=disabled. From the 750MB/s I should be seeing to 48MB/s."
That's for writes, but I see this kind of deterioration everywhere; high throughput with a low volblocksize is simply impossible - or maybe it just doesn't manifest on a lab machine with empty drives and fresh pools/zvols...
Hello, I also vote for this feature, if I can. I'm using an NTFS volume with the (default) 8k volblocksize and it would be great to be able to convert it to, for example, 128k. Thanks.
A little bump on this...
I'm filing this enhancement request because I believe the ability to change a ZVOL's volblocksize via zfs send/receive would be quite useful.
Implementing it in zfs receive would be a good start; implementing it in both send and receive would be even better, if compatibility with older ZFS versions can be maintained that way.
It could work like this:
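For example (hypothetical syntax, just to illustrate the idea; dataset names are made up, and a receive-side volblocksize override that actually re-blocks the data does not exist today, since a received zvol currently keeps the sender's block layout):

```sh
# Hypothetical: zfs receive re-chunks the incoming stream into the requested
# volblocksize instead of keeping the block size recorded in the send stream.
zfs send tank/vm-disk@migrate | zfs receive -o volblocksize=128k tank/vm-disk-new
```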
Thank you for your consideration.
Why?
We run several private clouds that use ZVOLs as backing storage for VM volumes. We offer our customers several volume types with properties like replication factor and compression, but also block size (volblocksize). We also set a "default" blocksize (that one evolved over time; I know one cloud has 8k, another has 32k, the rest probably have 128k). In the past, some customers complained about unsatisfactory performance, which usually turned out to be caused by the blocksize being too small (the apparent slowness shows up while copying/backing up data etc.). It is very noticeable during both synthetic benchmarks and actual workloads and backups. The same storage can zfs send a 128k volblocksize volume at 800MB/s but an 8k volblocksize volume at only 50MB/s (and sadly we're talking SSDs in this storage).
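For what it's worth, the throughput gap is easy to reproduce by timing the raw send stream; a quick sketch with made-up pool/volume names (pv is the usual pipe-throughput tool):

```sh
# Compare raw zfs send throughput of two zvols that differ only in volblocksize.
zfs snapshot tank/vol-8k@bench tank/vol-128k@bench
zfs send tank/vol-8k@bench   | pv > /dev/null
zfs send tank/vol-128k@bench | pv > /dev/null
```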
Just recently a customer deployed a rather large (100TB) volume for their Veeam backups and it is just unusably slow - it turns out it used the default 8k volblocksize.
The easiest way to correct those mistakes with minimal downtime would be to:
1) snapshot those volumes
2) replicate them from that snapshot using zfs send/recv (which for such large volumes can literally take days or even weeks!)
3) create another snapshot
4) incrementally sync it (repeat steps 3 & 4 until caught up to the original volume)
5) stop the VM
6) swap the volumes
The last two steps only take a few minutes, as the snapshot difference will be small, causing minimal downtime for the VM. A shell sketch of this workflow follows below.
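A sketch of that workflow, assuming the requested receive-side conversion existed (all names are made up; the incremental sends are standard zfs usage, only the volblocksize override on receive is the new part):

```sh
# 1-2) initial snapshot + full copy into a volume with the desired volblocksize
zfs snapshot tank/vm-disk@sync1
zfs send tank/vm-disk@sync1 | zfs receive -o volblocksize=128k tank/vm-disk-new

# 3-4) repeat incremental syncs until the remaining delta is small
zfs snapshot tank/vm-disk@sync2
zfs send -i @sync1 tank/vm-disk@sync2 | zfs receive tank/vm-disk-new

# 5-6) stop the VM, do one last incremental sync, then swap the volumes
#      (e.g. via zfs rename) and start the VM again
zfs rename tank/vm-disk tank/vm-disk-old
zfs rename tank/vm-disk-new tank/vm-disk
```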
While it might be possible to use app-level replication in some cases or to archive the old volume and start using a brand new one, sometimes the customers expect us to deal with it or are just unable to do it.
Another use case would be backup volumes. Having the live ZVOLs use a small blocksize can have benefits, but on a backup server it would almost always be better to have a much larger blocksize, for both space-overhead and performance reasons - after all, when I have to recover such a volume I hit the same performance difference (even worse, since my backup storage uses spinning rust). In this case I could store backups at a 1M volblocksize and convert them to whatever is needed on recovery. And it's not unusual to have an "ASAP!!!" target for recovering a volume, so restoring it with a larger blocksize might be a better immediate solution when it takes a fraction of the time.
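To make that concrete (same hypothetical receive-side override as above, made-up names, backup host reached over ssh):

```sh
# Store the backup copy at a large volblocksize for better space efficiency
# and sequential throughput on spinning rust:
zfs send tank/vm-disk@daily | ssh backuphost zfs receive -o volblocksize=1M backup/vm-disk

# On recovery, convert back to whatever blocksize the workload needs:
ssh backuphost zfs send backup/vm-disk@daily | zfs receive -o volblocksize=32k tank/vm-disk-restored
```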