lxc / incus

Powerful system container and virtual machine manager
https://linuxcontainers.org/incus
Apache License 2.0
2.8k stars 224 forks source link

zfs: load keys for encrypted datasets during pool import #1384

Closed cyphar closed 4 days ago

cyphar commented 1 week ago

If a user has set up their own zpools and given them to us to manage, it's possible they've configured ZFS-native encryption. For the most part, this works completely transparently to us. However, because we manually do zpool-import and zpool-export during startup and shutdown of Incus, ZFS datasets with keys will have their keys unloaded during shutdown and then the keys are not automatically loaded on startup. This results in containers being unable to start on startup because all IOs are blocked indefinitely until the dataset keys are loaded manually by the admin -- even if the admin has configured automatic key loading on their system!

The simplest solution would be to pass -l to zfs-import (which causes ZFS to auto-import all keys for all datasets in the pool). However, it is slightly nicer to do a separate zfs-load-key so that we can unmount the pool if the key import fails (zfs-import will leave the pool imported but without keys loaded).

If the user has configured keylocation=prompt (or otherwise misconfigured the encryption settings for their pool), the command will fail and the pool import will fail loudly (rather than silently failing in the form of an imported pool that is not usable as a filesystem).

Signed-off-by: Aleksa Sarai cyphar@cyphar.com

cyphar commented 1 week ago

I haven't added tests yet because I'm not sure how you would like me to do that -- should we have just one specific test for this behaviour or run all of the ZFS-related tests on both an encrypted and unencrypted dataset?

Also let me know if you think just doing zpool import -l is good enough. Every time I update Incus (or restart my server) I seem to hit this particular issue. I haven't tested this on my machine yet, just posting this to get a review of the general idea and whether you'd like a different solution.

stgraber commented 1 week ago

zpool -l may be easier for this, assuming it behaves correctly with /dev/null as stdin so it doesn't actually end up blocking and instead fails the import.

cyphar commented 1 week ago

@stgraber Ah, so you would prefer if the import failed? I assumed you'd prefer a warning (hence all of the logic for figuring out which datasets have non-file:// keylocations). That would make this patch much simpler. Let me test it out...

stgraber commented 1 week ago

A storage pool failing to start should have it go into a failed state that gets retried so we should be okay with it failing to mount.

cyphar commented 5 days ago

@stgraber Okay, zpool import -l will return an error if stdin is /dev/null and we have keylocation=prompt. However, the pool will still be imported (just without the keys loaded), so I guess we should export the pool in that case? Or should we leave it in the half-imported state?

EDIT: I think doing load-key separately (but without any of the extra handling) would be nicer because we could do d.Unmount in the error case more cleanly.