Open Rudd-O opened 7 months ago
FWIW, setting a very low volblocksize is often an exceedingly poor idea, which is why the default was 8K and is now normally 16K (and higher if you're using draid).
Separate from that remark, at least in recent memory, the closest analogue I can think of is how special_small_blocks used to not do anything on volume datasets even though it was inherited and settable.
But the two important differences here are: A) volblocksize isn't mutable after creation, so a naive implementation would need to walk children for an explicit zfs set if you modified it, and B) it would potentially break people whose volumes predate volblocksize being a valid property there and so have no explicit property set, leading to...
C) I think this is probably doable if you just make the volume creation step trigger an explicit set if the property would have been inherited, and at that point, we get to
D) I think you would probably want something like default_volblocksize as a property, because I would expect setting volblocksize on filesystems to also break at least DEBUG builds of older code. And at the point where it's not inheritable by the volumes themselves, using a new property to affect defaults seems more reasonable than making the existing property "inheritable" when you can't change it; inheritance isn't really the right mental model for it.
Does that make sense to you/fit your goal here?
I could also see an argument for a pool-wide property, but that seems janky for a number of reasons, like wanting different ones for different datasets, not being preserved in send-recv, and so on.
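To make points C and D concrete, here is a minimal Python sketch of how a separate default property could be resolved at volume creation time. It is a toy in-memory model, not OpenZFS code: the `Dataset` class and both functions are made up for illustration, and `default_volblocksize` is only the property name floated above, not an existing property.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Fallback when no ancestor sets a default (16K, the normal default mentioned above).
FALLBACK_VOLBLOCKSIZE = 16 * 1024


@dataclass
class Dataset:
    """Toy dataset node; this is NOT how ZFS stores properties."""
    name: str
    parent: Optional["Dataset"] = None
    props: Dict[str, int] = field(default_factory=dict)


def resolve_default_volblocksize(ds: Optional[Dataset]) -> int:
    """Walk up the ancestors looking for the hypothetical default_volblocksize."""
    while ds is not None:
        if "default_volblocksize" in ds.props:
            return ds.props["default_volblocksize"]
        ds = ds.parent
    return FALLBACK_VOLBLOCKSIZE


def create_volume(name: str, parent: Dataset,
                  volblocksize: Optional[int] = None) -> Dataset:
    """Create a volume, always recording volblocksize explicitly (point C)."""
    if volblocksize is None:
        volblocksize = resolve_default_volblocksize(parent)
    return Dataset(name=name, parent=parent, props={"volblocksize": volblocksize})


pool = Dataset("tank")
vms = Dataset("tank/vms", parent=pool, props={"default_volblocksize": 8192})

first = create_volume("tank/vms/disk0", vms)
vms.props["default_volblocksize"] = 65536  # admin changes the default later
second = create_volume("tank/vms/disk1", vms)
plain = create_volume("tank/disk2", pool)  # no default set anywhere above it
```

Because the value is copied onto the volume at creation, changing the default later only affects new volumes, which is why "default" fits better than "inherited" as a mental model here.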
As for an additional listed property breaking old code: that would happen with any new property. So the options are to either fix the old code or never ever introduce anything new... my vote is for the former.
While volblocksize is currently immutable, is there a really good reason why it needs to stay that way?
Yes, because it would be quite complicated to implement changing volblocksize on a volume after creation, for the same reasons changing recordsize on a dataset doesn't affect files larger than one record already. If you'd like to go implement it in a performant way and open a PR, by all means, but it's quite involved.
Nobody was suggesting doing nothing, or that the options were do nothing or break things. And entirely new properties get ignored if they're not recognized, generally, so it's not the same at all.
I would expect parsers to only look at properties they expect for the dataset type queried, hence I would classify a parser that fails when encountering volblocksize being returned by zfs get all on a filesystem as defective.
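As a rough illustration of what such a tolerant parser might look like, here is a Python sketch that reads tab-separated rows in the shape produced by `zfs get -H` (name, property, value, source) and keeps only the properties it expects for each dataset type. The `EXPECTED` table and the sample rows are assumptions made up for this example, including the `volblocksize` row on a filesystem.

```python
from typing import Dict, Iterable

# Properties this hypothetical tool cares about, per dataset type.
EXPECTED = {
    "filesystem": {"type", "recordsize", "mountpoint"},
    "volume": {"type", "volblocksize", "volsize"},
}


def parse_zfs_get(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Parse `zfs get -H` style rows and drop properties we do not expect."""
    raw: Dict[str, Dict[str, str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Fields are separated by single tabs: name, property, value, source.
        name, prop, value, _source = line.split("\t", 3)
        raw.setdefault(name, {})[prop] = value

    result: Dict[str, Dict[str, str]] = {}
    for name, props in raw.items():
        wanted = EXPECTED.get(props.get("type", ""), set())
        # Unknown or unexpected properties are ignored rather than treated as errors.
        result[name] = {k: v for k, v in props.items() if k in wanted}
    return result


sample = [
    "tank/data\ttype\tfilesystem\t-",
    "tank/data\trecordsize\t131072\tdefault",
    "tank/data\tvolblocksize\t16384\tlocal",  # hypothetical row on a filesystem
    "tank/data\tmountpoint\t/tank/data\tdefault",
]
parsed = parse_zfs_get(sample)
```

A parser written this way would not care whether a filesystem starts reporting an extra property.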
Having looked a bit into options to make volblocksize and (while at it) recordsize mutable for existing files:
ZFS locates the DVA for a requested offset in the block pointer tree of a file/volume by shifting the offset by the block size set for that file/volume. The best idea I have come up with so far is to introduce an indirection for the metadata DVA pointers that allows them to point toward the old block pointer tree (still using the prior block size) for not-yet-rewritten blocks. On a record size change, a whole new block pointer tree would be written for the file/volume that fully indirects back toward the old metadata; writes would (if needed) do R/M/W to pull in data from the old block size, and the old (meta +) data would be freed once no longer referenced by snapshots.
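A tiny Python sketch of the shift-based lookup shows why the block size cannot simply be changed in place. It is heavily simplified (a flat block index, no indirect levels), and the function name is made up for illustration.

```python
def block_index(offset: int, block_shift: int) -> int:
    """Index of the block containing `offset` for a block size of 2**block_shift."""
    return offset >> block_shift


SHIFT_16K = 14  # 2**14 == 16384
SHIFT_4K = 12   # 2**12 == 4096

offset = 40960  # byte offset 40K into the file/volume

idx_16k = block_index(offset, SHIFT_16K)  # block number under a 16K block size
idx_4k = block_index(offset, SHIFT_4K)    # block number under a 4K block size
```

The same byte offset maps to block 2 with 16K blocks and block 10 with 4K blocks, so a tree built for one shift gives wrong answers when read with another unless every block is rewritten or some indirection is added.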
Downsides I see with this would be the need for a backwards incompatible read on-disk format change, having to take the indirection for all non-rewritten blocks on reads and the added code complexity (and the ability for new and exciting bugs to creep in) caused by the indirection.
Given all that... I lean towards a solution that would enable zfs recv to change the recordsize / volblocksize on freshly received files/volumes, which should be way easier to implement as the zfs streamdump format already delivers the data in an on-disk block-size agnostic format (offset+length within the file/volume).
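The idea of re-blocking on receive can be sketched in Python as follows. This is a toy model that assumes write records are plain (offset, data) pairs; it does not parse the real send stream format, and `rechunk` is a hypothetical helper, not anything in OpenZFS.

```python
from typing import Dict, Iterable, Tuple


def rechunk(records: Iterable[Tuple[int, bytes]],
            new_blocksize: int) -> Dict[int, bytearray]:
    """Lay out (offset, data) write records into blocks of new_blocksize bytes."""
    blocks: Dict[int, bytearray] = {}
    for offset, data in records:
        pos = 0
        while pos < len(data):
            abs_off = offset + pos
            idx = abs_off // new_blocksize    # destination block number
            within = abs_off % new_blocksize  # position inside that block
            n = min(new_blocksize - within, len(data) - pos)
            # Blocks are zero-filled until data arrives for them.
            block = blocks.setdefault(idx, bytearray(new_blocksize))
            block[within:within + n] = data[pos:pos + n]
            pos += n
    return blocks


# Two 16K records, as a sender using 16K blocks might emit them.
records = [(0, b"A" * 16384), (32768, b"B" * 16384)]
blocks_4k = rechunk(records, 4096)
```

Since each record only carries an offset and a length, the receiver in this model is free to pick any block size; the hole between the two records simply produces no blocks.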
I don't want to change volblocksize on existing volumes. I want a default volblocksize property for newly-created volumes.
Currently there appears to be no way to set up a default volblocksize for a pool, or a dataset, such that any volume created within the container has a designated volblocksize.
What this means is that any software which creates volumes (I'm thinking the storage driver in Qubes OS as an example) must manually specify a hard-coded volblocksize, with no input from the administrator. This isn't always possible.
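A back-of-the-envelope estimate of what a mismatched block size costs, as a Python sketch. It assumes every guest write forces whole volume blocks to be rewritten, and it ignores compression, metadata, and any coalescing of neighbouring writes, so treat the result as a rough bound rather than a measurement. The function name is made up for illustration.

```python
def rmw_amplification(write_size: int, volblocksize: int, offset: int = 0) -> float:
    """Bytes rewritten on the host per byte written by the guest."""
    first = offset // volblocksize
    last = (offset + write_size - 1) // volblocksize
    bytes_rewritten = (last - first + 1) * volblocksize
    return bytes_rewritten / write_size


# A 4K guest write (a common ext4 block size) on volumes of various block sizes.
amp_16k = rmw_amplification(4096, 16 * 1024)
amp_8k = rmw_amplification(4096, 8 * 1024)
amp_4k = rmw_amplification(4096, 4 * 1024)
```

Under these assumptions a 4K write to a 16K-block volume rewrites four times the data, while a matching 4K block size gives no amplification from this effect.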
(For context: the default volblocksize of 16K, combined with ext4 file systems atop the volumes created that way, is resulting in a guest->host write amplification of 3-4X, which is ridiculous.)
Describe the feature would like to see added to OpenZFS
A way to do a set volblocksize on a dataset which in turn will cause any volumes created within to use that volblocksize, just like recordsize is inherited today.
How will this feature improve OpenZFS?
It will be possible to have inheritable policy for creation of volumes according to performance requirements, or global per-pool defaults.