problame opened 2 years ago
wouldn't this just hang forever if the system is on a ZFS root?
Secretly, `zpool freeze` is already a command, albeit with big flashing "don't DO that" notes:
https://github.com/openzfs/zfs/blob/f04b97620059d08b37d9e80ada397e742bb2f311/cmd/zpool/zpool_main.c#L10958-L10974
I'm aware of `zpool freeze`. As stated in the comment, that's for debugging. We should probably rename it to something like `zpool slog-test-freeze` or whatever to prevent misremembered commands if we add `zpool suspend|resume`.
> I'm aware of `zpool freeze`. As stated in the comment, that's for debugging. We should probably rename it to something like `zpool slog-test-freeze` or whatever to prevent misremembered commands if we add `zpool suspend|resume`.
Sure, sorry, I wasn't trying to suggest it would serve here, merely remarking that the name is used, and given ZFS's strong disinterest in breaking prior expectations for things, it might be uphill to convince people to rename even such an internal thing. (Also I don't assume anyone knows it exists, after I was quite surprised when I found it in the test suite one day.)
Then again, zdb has had options renamed around a few times, so maybe nobody will blink.
The UX proposal has some typos at the end; it should read `freezes` instead of `freeze`:
`zpool freezes [-p] POOL [TAG]`

`zpool **freezes** POOL` lists all active `freeze` TAGs on POOL. Each row contains the freeze TAG along with the first line of the `--description`, if one was provided.

`zpool **freezes** POOL TAG`
- fails with an error to stderr if a freeze with name TAG does not exist on the POOL, or
- succeeds and prints the `--description` provided on freeze to stdout.
Also: I would prefer `zpool freezes` to only list the freezes, with a `-v` to also get the full descriptions.
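A toy sketch of the listing semantics proposed above. Note that the `zpool freezes` subcommand, its `TAG` argument, and the `-v` flag are all proposals in this thread, not existing OpenZFS commands, and the pool contents below are invented example data:

```python
import sys

# Invented example data: TAG -> full --description text.
FREEZES = {
    "pre-upgrade": "Freeze before kernel upgrade.\nTaken by admin.",
    "nightly": "Nightly block-level backup window.",
}

def zpool_freezes(pool, tag=None, verbose=False):
    """Model of the proposed `zpool freezes [-v] POOL [TAG]` behavior."""
    if tag is not None:
        if tag not in FREEZES:
            # Proposed behavior: error to stderr, non-zero exit status.
            print(f"cannot open freeze '{tag}': no such freeze on pool '{pool}'",
                  file=sys.stderr)
            return 1
        # Proposed behavior: full --description goes to stdout.
        print(FREEZES[tag])
        return 0
    for name, desc in FREEZES.items():
        # Default listing: TAG plus first line of the description;
        # -v prints the full description instead.
        shown = desc if verbose else desc.splitlines()[0]
        print(f"{name}\t{shown}")
    return 0
```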
I don't remember my use-case right now, but can this be used like the suspension that happens when a storage device is physically yanked off the bus? (Without actually physically yanking it off the bus.)
So what should be done to allow the PC to be suspended? Every time I have suspended, I end up with a failed pool with unrecoverable errors.

With raidz3, thankfully, so far at least enough data has still been OK that it would resilver. I still do not want to end up in the situation I had with raidz2, where I lost everything.

All the disks show read/write/checksum errors when it resumes. If I leave the PC on and thrash the drives for a month I have 0 errors, so it's not a hardware issue, but every time I try suspend-to-RAM / hibernate, the pool falls apart.

And considering the HDDs are chewing up 1.2 kW and I only use the stored data 1% of the time, it would be good to be able to hibernate the NAS. Trying to unmount/export is a pain; sometimes even a forced export refuses to work, and I have no choice but to change fstab options, reboot, and manually turn off the power to the disks, then turn the disks back on and mount when I do need them.

Not ZFS on root, it's just a storage pool.

5.18.19-051819-generic zfs-2.1.99-1389_g48cf170d5 zfs-kmod-2.1.99-1389_g48cf170d5
> all the disks show read/write/checksum errors when it resumes; if I leave the PC on and thrash the drives for a month I have 0 errors, so it's not a hardware issue, but every time I try suspend-to-RAM / hibernate, the pool falls apart
My suspicion is that the suspend/hibernation kicks in in the middle of a TXG, with some data already written to disk but the referencing metadata (including metaslabs and uberblock) not yet persisted. An import would then pull in the (from the perspective of the frozen state) out-of-date metaslabs, pick free space that already contains new data, and 'repurpose' it... when the hibernated state is restored, the TXG will continue to commit to disk, writing metadata that references data it believes is stable on disk but that was overwritten by the import (which is invisible to the hibernated state).

Check the logic inside your initramfs: most likely your distribution first imports the pool r/w and only afterwards checks for a hibernation state - which loads an in-memory state - triggering the scenario described above. See https://github.com/openzfs/zfs/issues/14118#issuecomment-1303563790
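The race described above can be illustrated with a toy model (plain Python, no ZFS involved; the block numbers and contents are invented for illustration):

```python
# Persisted state at hibernation time: TXG N already wrote its data block,
# but the updated spacemap/uberblock never made it to disk.
disk = {0: "txg-N data"}           # block number -> contents
persisted_spacemap = {0, 1, 2, 3}  # stale: still lists block 0 as free

# In-memory state saved in the hibernation image: TXG N's pending metadata
# references block 0 and expects its spacemap update to land shortly.
hibernated = {"pending_metadata_points_at": 0}

# initramfs bug: the pool is imported read-write *before* resume is attempted.
# The import only sees the stale on-disk spacemap, so block 0 looks free
# and the allocator happily reuses it.
victim = min(persisted_spacemap)
disk[victim] = "import-time data"

# Resume then restores the hibernated kernel; TXG N finishes committing
# metadata that still points at block 0 - whose contents are now gone.
block = hibernated["pending_metadata_points_at"]
corrupted = disk[block] != "txg-N data"
print(f"metadata references block {block}, which now contains: {disk[block]!r}")
```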
> so what should be done to allow PC to be suspended?
@MasterCATZ Can you clarify, you mean "hibernated", not "suspended" as in suspend-to-RAM, right?
(This is a feature request extracted from https://github.com/openzfs/zfs/issues/260#issuecomment-982124508 )
## Background
Linux supports freezing (and thawing) a mounted filesystem through an ioctl. The use case for this is to suspend all IO requests to the underlying block device. The use case for that is to enable block-device level snapshots, e.g., if the filesystem is deployed on top of a snapshot-capable volume manager. Note that freeze is not used during hibernation, contrary to what's stated in the opening comment of ZFS issue https://github.com/openzfs/zfs/issues/260. As far as I'm aware, the above is an exhaustive description of the use case for freeze & thaw.
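For reference, the ioctls in question are `FIFREEZE`/`FITHAW` from `<linux/fs.h>` (what `fsfreeze(8)` uses under the hood). A minimal sketch, assuming a Linux host and a mountpoint the caller is allowed to freeze; the `/mnt/data` path is a placeholder, and the request numbers follow the kernel's `_IOWR('X', 119/120, int)` encoding:

```python
import fcntl
import os

# Reconstruct the Linux _IOWR() ioctl encoding:
# dir (2 bits) | size (14 bits) | type (8 bits) | nr (8 bits)
_IOC_WRITE, _IOC_READ = 1, 2

def _IOWR(type_char, nr, size):
    return ((_IOC_READ | _IOC_WRITE) << 30) | (size << 16) | (ord(type_char) << 8) | nr

FIFREEZE = _IOWR('X', 119, 4)  # sizeof(int) == 4; encodes to 0xC0045877
FITHAW   = _IOWR('X', 120, 4)  # encodes to 0xC0045878

def freeze_fs(mountpoint):
    """Suspend all new IO to the filesystem at mountpoint (needs CAP_SYS_ADMIN)."""
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FIFREEZE, 0)
    finally:
        os.close(fd)

def thaw_fs(mountpoint):
    """Resume IO on a previously frozen filesystem."""
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FITHAW, 0)
    finally:
        os.close(fd)

# Typical use, per the snapshot use case above:
#   freeze_fs("/mnt/data")   # quiesce the filesystem
#   ... take a block-device-level snapshot (e.g. LVM) ...
#   thaw_fs("/mnt/data")     # resume IO
```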
The Linux VFS provides two mechanisms for filesystems to support freeze & thaw. The first is to implement the `freeze_fs`/`unfreeze_fs` super block operations. The second is to implement the `freeze_super`/`thaw_super` super block operations.

If a filesystem implements the `_fs` type of operations, the VFS takes care of locking out all VFS operations by means of a set of rwlocks. Here's the kernel function `freeze_super` that is invoked from the freeze ioctl in that case. (Don't confuse the kernel function `freeze_super` with the `->freeze_super` super block operation.)

If a filesystem implements the `_super` type of operations, the ioctls map more or less directly to these callbacks.

However, neither of the hooks above is suitable for ZFS. The reason is that the Linux concept of freeze & thaw expects that one super block has exclusive control of N block devices, whereas with ZFS, M super blocks (= ZPL datasets) share the storage of N block devices. And then there are also management operations such as `zpool scrub` and `zfs recv` that perform IO and are not represented by super blocks at all.

Of course, looking at how btrfs does it makes sense in this case. It's the mainline filesystem most similar to ZFS with regard to pooled storage (multiple blockdevs!) and multiple super blocks on top. Btrfs implements `freeze_fs`/`unfreeze_fs`. But the btrfs secret sauce is that a single btrfs filesystem (= pool in ZFS terms) only has a single `struct super_block` - the subvolumes (= ZPL datasets in ZFS terms) are implemented through `mount_subtree`.

## UX Proposal
Instead of implementing the `{freeze,unfreeze}_*` ioctls, I propose to implement two new `zpool` subcommands. Here's the `man` page that describes how the feature behaves towards the end user.

Notes:
- The `zpool import --discard-hibernation-state-and-fail-resume` operation that I proposed there would map to `zpool import --unfreeze $MAGIC_TAG`.