Closed: DrArclight closed this issue 1 year ago.
Author here. Thanks for the detailed feedback! I can confirm every item you mentioned in the post. Those redundant commands are the result of a Fedora/RHEL merge-update, which was a bit rushed. I was working on the Boot Environment appendix. I apologize for any confusion caused by the current version.
I have submitted a PR which should clean up the redundant commands and fix the problems you mentioned.
For what it's worth, even with the corrections, I wasn't able to get my system to boot using the latest Alma live image. I ran into a race condition where the systemd zfs-import units would try to run before the zfs module nodes were fully populated in /sys/module/zfs. It always drops to an emergency shell on boot, where I have to do a "systemctl restart zfs-import-scan.service", then exit, and it will boot.
This problem might then be specific to your computer hardware, such as a slow hard drive or SAS expansion cards.
During my testing on a laptop and inside a virtual machine -- admittedly a small sample -- this issue did not occur. Maybe you can try creating a systemd unit override file with
systemctl edit zfs-import-scan.service
systemctl edit --full zfs-import-scan.service
to override some parameters in zfs-import-scan.service, such as adding a delay of 5 seconds.
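For example, the drop-in created by the first command could contain something like the following (a minimal sketch; the 5-second value is only an illustration and may need tuning):

[Service]
ExecStartPre=/bin/sleep 5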
Did some more digging, and I was wrong (I think) about it being a race condition. When it drops to the emergency shell, running ls /sys/module/zfs returns a "file or folder not found" error. I have to do a modprobe zfs, then systemctl restart zfs-import-scan.service, then exit to get the system to boot. I also see the following lines in rdsosreport.txt:
[ 3.671197] localhost systemd[1]: Reached target Basic System.
[ 3.674637] localhost systemd[1]: Import ZFS pools by cache file was skipped because of a failed condition check (ConditionPathIsDirectory=/sys/module/zfs).
[ 3.674686] localhost systemd[1]: Import ZFS pools by device scanning was skipped because of a failed condition check (ConditionPathIsDirectory=/sys/module/zfs).
[ 3.675597] localhost systemd[1]: Starting Set BOOTFS environment for dracut...
[ 3.680096] localhost sh[806]: The ZFS modules are not loaded.
[ 3.680096] localhost sh[806]: Try running '/sbin/modprobe zfs' as root to load them.
[ 3.686286] localhost systemd[1]: zfs-env-bootfs.service: Deactivated successfully.
[ 3.686416] localhost systemd[1]: Finished Set BOOTFS environment for dracut.
[ 3.689865] localhost systemd[1]: Reached target ZFS pool import target.
[ 3.693875] localhost systemd[1]: Starting dracut pre-mount hook...
[ 3.717708] localhost dracut-pre-mount[827]: The ZFS modules are not loaded.
[ 3.717708] localhost dracut-pre-mount[827]: Try running '/sbin/modprobe zfs' as root to load them.
[ 3.728865] localhost systemd[1]: Finished dracut pre-mount hook.
[ 3.731285] localhost systemd[1]: Mounting /sysroot...
[ 3.731324] localhost mount[829]: The ZFS modules are not loaded.
[ 3.731324] localhost mount[829]: Try running '/sbin/modprobe zfs' as root to load them.
[ 3.731433] localhost systemd[1]: Snapshot bootfs just before it is mounted was skipped because of a failed condition check (ConditionKernelCommandLine=bootfs.snapshot).
[ 3.731471] localhost systemd[1]: Rollback bootfs just before it is mounted was skipped because of a failed condition check (ConditionKernelCommandLine=bootfs.rollback).
So it looks like the zfs modules are not loading at all before it drops to the emergency shell.
As configured right now, I'm just trying this with a pair of Samsung 980 1 TB NVMe drives, so there shouldn't be anything delaying initialization. Adding a delay to the unit files through systemctl edit doesn't seem to help, because the zfs modules never load before it drops to the emergency shell, so the units fail to even execute the pre-start delay.
Therefore there seems to be an issue with the zfs module not loading.
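One way to narrow this down (an assumption on my part, not something verified in this thread) is to inspect the initramfs and see whether the zfs module, and anything that would load it, were actually included:

lsinitrd /boot/initramfs-$(uname -r).img | grep -i zfs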
On an installed system, you can try this:
echo 'force_drivers+=" zfs "' >> /etc/dracut.conf.d/zfs.conf
Or, during installation:
echo 'force_drivers+=" zfs "' >> /mnt/etc/dracut.conf.d/zfs.conf
See if this fixes the issue.
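Note that dracut.conf.d is only read when the initramfs is built, so the image would presumably need to be regenerated afterwards; on an installed system, something like:

dracut -vf --kver "$(uname -r)"

(During installation, the equivalent step happens inside the chroot, using the kernel version installed to /mnt.)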
Ok, so I finally found a way to get my system to boot reliably. I had to do two things. The first was to add rd.driver.pre=zfs to the kernel parameters with

grubby --update-kernel=ALL --args="rd.driver.pre=zfs"

The second was to add an

ExecStartPre=/bin/sleep 20

to /usr/lib/systemd/system/zfs-import-scan.service, then have dracut rebuild the initramfs.
Without having dracut rebuild the initramfs the system was not using the version of the systemd unit with the delay during boot, so even though the zfs modules were being loaded, I was hitting the udev race condition.
You most likely should not use grubby to add kernel command line options: it is not aware of the mirrored EFI system partitions we set up in the guide, and the GRUB setup on your computer is likely no longer redundant.
Do it the right way:
echo 'GRUB_CMDLINE_LINUX_DEFAULT="rd.driver.pre=zfs"' >> /etc/default/grub
Then follow the guide to reinstall GRUB on both disks, generate the GRUB menu, and mirror the boot files to both disks.
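For reference, the menu-generation step would look roughly like this (the output path is an assumption and depends on how the guide lays out /boot; the guide's own commands should take precedence):

grub2-mkconfig -o /boot/grub2/grub.cfg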
Noted. Thanks for the help.
The latest version of the instructions for RHEL Root on ZFS has a few errors, and there are several redundant commands. The issues are noted below:
"Preparation" Section 11: the command
dnf install -y gdisk dosfstools arch-install-scripts
fails because arch-install-scripts has missing dependencies. The next two commands download and install arch-install-scripts anyway. I would suggest changing arch-install-scripts in the dnf command to cryptsetup instead, as that will also solve the next issue.

"System Installation" Section 1: This section fails to complete because cryptsetup is not installed by default in the latest Alma Linux GNOME Mini Live boot image.
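As a sketch, the "Preparation" command with that substitution would be (assuming cryptsetup is available in the live image's repositories):

dnf install -y gdisk dosfstools cryptsetup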
"System Installation" Section 4: The following lines will install the wrong kernel version in the new /mnt partition when using the newest Alma Linux live Image due to the repositories having a newer kernel available. You have to install the same kernel as is running on the live image or there will be issues when dracut builds initramfs.
dnf --installroot=/mnt \
    --releasever=$VERSION_ID -y install \
    @core grub2-efi-x64 \
    grub2-pc-modules grub2-efi-x64-modules \
    shim-x64 efibootmgr \
    kernel
To solve this, change it to:
dnf --installroot=/mnt \
    --releasever=$VERSION_ID -y install \
    @core grub2-efi-x64 \
    grub2-pc-modules grub2-efi-x64-modules \
    shim-x64 efibootmgr \
    kernel-$(uname -r)
"System Installation" Section 4: missing commands for mounting the /var, /var/log, and /var/lib partitions
Also, at the end of Section 4, the command

zfs create -o mountpoint=legacy rpool/alma/root

should be changed to

zfs create -o mountpoint=/ -o canmount=noauto rpool/alma/root

as the next line,

mount -t zfs -o zfsutil rpool/alma/root /mnt

will fail with the current version. This is also redundant: further down, the root dataset is unmounted and altered to match the recommended creation syntax, and then instructions are given to issue

mount -t zfs rpool/alma/root /mnt

which, again, fails because it only works with mountpoint=legacy. I have not made note of the redundant commands that do not affect functionality, but those should probably be cleaned up as well to make the procedure more concise.