adwarin opened this issue 1 month ago
I don't think it is a compatibility issue with the kernel. You did exactly what you were supposed to do by checking that website before updating the kernel. The problem is more likely something to do with a script or a configuration file somewhere.
To troubleshoot this sort of issue, I would try adding `rd.break` to the list of kernel parameters (I usually remove `quiet` and `rhgb` while I'm at it, but that is not required). When the root filesystem fails to mount, it should drop you to the dracut emergency shell, and from there you should be able to run commands like `zpool list` or `zfs list` to inspect the storage. `systemctl --failed` might tell you which startup unit has failed, and then `systemctl status <some-service-name>` might give you further details about why it failed.
The "dataset does not exist" error sounds like maybe you renamed the root filesystem? If so, that is OK, you probably just neglected to update /etc/kernel/cmdline to match the new filesystem name before you updated to a new kernel.
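If the root dataset was renamed, the fix is just to make /etc/kernel/cmdline match the new name before regenerating the initramfs. A minimal sketch, assuming the pool is named `root` and the root dataset is `root/0` (substitute your own names):

```
# /etc/kernel/cmdline (example; the pool/dataset name is an assumption)
root=ZFS=root/0 ro
```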
I'm happy to help you troubleshoot this issue, just let me know what you find. 🙂
Edit:

> My machine boots successfully if I choose an earlier version of the kernel so I was thinking that this likely means that the latest version of ZFS isn't compatible with the chosen kernel. Do you think that error is likely indicative of a ZFS/kernel compatibility issue?
I just updated my PC to kernel 6.10.9-200.fc40.x86_64 and it has booted successfully. However, I have a few non-standard startup scripts configured on my PC, so just because my system boots doesn't necessarily mean a more standard configuration would. I'll share the additional configuration that I use if it turns out that it is needed.
Edit2:
I've updated another system that I use for testing to 6.10.9-200.fc40.x86_64 and it also updated without a hitch. My test system is running a completely standard Fedora minimal installation.
I don't think I renamed my boot pool, but I might be reading this wrong:

I checked `systemctl`, but unfortunately it doesn't seem to have much additional data:

Interestingly, an unrelated zpool shows up when I check `zpool list`, and I can manually import the root pool from the emergency shell.

Here is a more detailed log from the rdsosreport. I'm not super knowledgeable on ZFS, but I don't see anything major failing before the point where the root pool isn't located:
ZFS has a couple of services that attempt to find and import pools on system startup: `zfs-import-cache.service` and `zfs-import-scan.service`. The former keeps track of which pools were imported at some earlier point in time and re-imports those same pools on system startup. The latter imports all pools it can find, as a fallback for when /etc/zfs/zpool.cache doesn't exist or is empty. Personally, I don't like either of the default ZFS pool import strategies.

A problem with the zfs-import-cache service is that the zpool.cache file it uses can be out-of-date in a root-on-ZFS configuration. When running root on ZFS, the scripts have to use the zpool.cache file that is inside the initramfs archive, and when that was last updated depends on when the user last updated their kernel (or ran `dracut -f`).

A problem with the zfs-import-scan service is that it can import the wrong pool (e.g. if the user has other partitions on their system containing valid ZFS pools that they use, for example, to run other OS instances in virtual machines).
For these reasons, I prefer to disable (mask) `zfs-import-cache.service` and override `zfs-import-scan.service` with a custom import command that explicitly imports only the pool that contains my system's root filesystem, identified by its partition UUIDs.
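For background on what masking does: `systemctl mask` points the unit at /dev/null under /etc/systemd/system, so both the running system and dracut's initramfs build treat the unit as undefined. A rough sketch of the effect (illustrative only; `systemctl mask` manages the real symlink for you):

```shell
# Illustrative only: masking a unit is essentially a symlink to /dev/null.
# systemctl would create this under /etc/systemd/system; here we use a
# temporary directory so the sketch is safe to run anywhere.
unitdir=$(mktemp -d)
ln -s /dev/null "$unitdir/zfs-import-cache.service"

# A masked unit resolves to /dev/null, so systemd loads nothing for it.
readlink "$unitdir/zfs-import-cache.service"
```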
It looks like you might be hitting one of these problems with the default ZFS import scripts. If you want to try the import strategy I prefer, here are some instructions.
1. Mask `zfs-import-cache.service` by running `systemctl mask zfs-import-cache.service`.
2. Create a `/etc/systemd/system/zfs-import-scan.service.d/override.conf` file with contents similar to the following.

```
# cat /etc/systemd/system/zfs-import-scan.service.d/override.conf
[Unit]
ConditionFileNotEmpty=

[Service]
ExecStart=
ExecStart=/sbin/zpool import -f -N -o cachefile=none -d /dev/disk/by-partuuid/a3c52c80-93b0-41cc-85c9-3ea0cb013503 -d /dev/disk/by-partuuid/b1426d54-b728-440b-9651-5c83f13c48e6 root $ZPOOL_IMPORT_OPTS
```
You will have to change the partition UUIDs in the override.conf file to match the ones on your PC. One way to find the correct UUIDs is by running `zpool list -v root`.
```
# zpool list -v root
NAME                                      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
root                                      110G  66.5G  43.5G        -         -    22%    60%  1.00x  ONLINE  -
  mirror-0                                110G  66.5G  43.5G        -         -    22%  60.5%      -  ONLINE
    b1426d54-b728-440b-9651-5c83f13c48e6  111G      -      -        -         -      -      -      -  ONLINE
    a3c52c80-93b0-41cc-85c9-3ea0cb013503  111G      -      -        -         -      -      -      -  ONLINE
```
The partuuids can also be found with commands like `lsblk --filter 'TYPE == "part" && FSTYPE =~ "zfs*"' -o label,name,partuuid` or `ls -al /dev/disk/by-partuuid`.
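To reduce the chance of a typo when editing override.conf, the `-d` options can also be generated from the UUID list. A small sketch (the UUIDs shown are the example ones from above; substitute your own):

```shell
# Sketch: build the zpool import line for override.conf from a list of
# partition UUIDs (these example UUIDs are placeholders).
uuids="a3c52c80-93b0-41cc-85c9-3ea0cb013503 b1426d54-b728-440b-9651-5c83f13c48e6"

opts=""
for u in $uuids; do
  opts="$opts -d /dev/disk/by-partuuid/$u"
done

# The line to paste after the empty "ExecStart=" in override.conf:
echo "ExecStart=/sbin/zpool import -f -N -o cachefile=none$opts root \$ZPOOL_IMPORT_OPTS"
```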
Once you've masked `zfs-import-cache.service` and overridden `zfs-import-scan.service`, you will need to regenerate one of your initramfs images with dracut to test it. Do not use dracut's `--regenerate-all` option! If you do, and there is an error in the configuration of the zfs-import-scan service, none of your boot menu options will work anymore and you will be locked out of your system. Instead, use a command like the following to regenerate only one specific initramfs image and leave the others untouched so you can fall back to them if you need to.
```
# dracut -f /boot/$(</etc/machine-id)/6.10.9-200.fc40.x86_64/initrd 6.10.9-200.fc40.x86_64
```
You can use a command like `lsinitrd /boot/$(</etc/machine-id)/6.10.9-200.fc40.x86_64/initrd | grep zfs-import-scan.service.d` to verify that the initramfs image was successfully regenerated with your override script.
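For clarity, the initramfs path in the dracut command above follows the /boot/&lt;machine-id&gt;/&lt;kernel-version&gt;/initrd layout this installation uses. A sketch of how the path is assembled (the machine-id here is a placeholder; on a real system it comes from /etc/machine-id):

```shell
# Sketch: assemble the initramfs path for one specific kernel version.
machine_id="0123456789abcdef0123456789abcdef"  # placeholder; normally $(</etc/machine-id)
kver="6.10.9-200.fc40.x86_64"

initrd="/boot/${machine_id}/${kver}/initrd"
echo "$initrd"
```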
Let me know if this workaround resolves your problem or if you would prefer to try something else.
Okay, I ran the above and feel like we're getting closer.
I masked my `zfs-import-cache.service`. Then, I got my root pool partition UUIDs:

After that, I created an override file with the UUIDs from above:
```
# cat /etc/systemd/system/zfs-import-scan.service.d/override.conf
[Unit]
ConditionFileNotEmpty=

[Service]
ExecStart=
ExecStart=/sbin/zpool import -f -N -o cachefile=none -d /dev/disk/by-partuuid/4e48f362-2047-4d64-86ac-ff91bc295940 -d /dev/disk/by-partuuid/aff95d6c-f3d4-4ed2-a23c-71e5873a1f70 root $ZPOOL_IMPORT_OPTS
```
I ran the command to regenerate the relevant initramfs:

```
# dracut -f /boot/$(</etc/machine-id)/6.10.9-200.fc40.x86_64/initrd 6.10.9-200.fc40.x86_64
```
Unfortunately, the boot doesn't succeed but I do see the zfs-import-scan running:
I appreciate your patience and the detailed write-up!
Do you think the dracut command you supplied would work with an unmasked zfs-import-cache.service if I ran it manually after each kernel update?
> Do you think the dracut command you supplied would work with an unmasked zfs-import-cache.service if I ran it manually after each kernel update?
Running the dracut command manually after kernel updates is unlikely to help. You don't want both the customized zfs-import-scan.service and an unmasked zfs-import-cache.service in your initramfs, because they would both attempt to run, and that is not how the system is designed to work (it should only attempt to import the root pool once).
> Unfortunately, the boot doesn't succeed but I do see the zfs-import-scan running:
Actually, the screenshot you provided appears to show that zfs-import-scan finished, as did sysroot.mount. The last message I see is "Warning: Break before switch_root". Did you leave `rd.break` set on the kernel command line? If so, remove it and I think everything should be working now.
That did it, thank you!
If you don't mind, I'd like to leave this issue open for a while in case it turns out to be a "common issue". I might need to add a little code to the installation scripts to configure the system this way by default if it turns out that people are hitting this problem.
Fedora on ZFS users: Let me know with comments to this issue report if you are hitting this problem and I need to revise the installation script to have the installed system force-import the root pool by its constituent partition UUIDs.
Hello,
First of all, I want to thank you for the work you've done here. Your script has made installing Fedora in a root-on-ZFS configuration super easy!
I'm curious about what I should look for to know if ZFS isn't compatible with a particular kernel version.
For example:
After checking here to ensure that kernel version 6.10.9-200 would be supported by ZFS version 2.2.6 (6.10 kernels are listed as supported), I used the kernel-update script and installed kernel version 6.10.9-200. After rebooting to test the new kernel, I'm getting error messages to the effect of "cannot open 'root/0': dataset does not exist".
My machine boots successfully if I choose an earlier version of the kernel so I was thinking that this likely means that the latest version of ZFS isn't compatible with the chosen kernel. Do you think that error is likely indicative of a ZFS/kernel compatibility issue?
Alternatively, do you know if there is an error I should be looking for which would indicate an incompatibility?
Thanks!