openzfs / openzfs-docs

OpenZFS Documentation
https://openzfs.github.io/openzfs-docs/

RHEL Root Documentation errors #397

Closed DrArclight closed 1 year ago

DrArclight commented 1 year ago

The latest version of the instructions for RHEL Root on ZFS has a few errors and several redundant commands. The issues are noted below:

"Preparation" Section 11: the command dnf install -y gdisk dosfstools arch-install-scripts fails as the arch-install scripts is missing dependencies. The next two commands download and install the arch-install-scripts anyway. I would suggest changing the arch-install-scripts in the dnf command to cryptsetup instead as that will solve the next issue.

"System Installation" Section 1: This section fails to complete because cryptsetup is not installed by default in the latest AlmaLinux GNOME Mini live boot image.

"System Installation" Section 4: The following lines will install the wrong kernel version in the new /mnt partition when using the newest Alma Linux live Image due to the repositories having a newer kernel available. You have to install the same kernel as is running on the live image or there will be issues when dracut builds initramfs. dnf --installroot=/mnt \ --releasever=$VERSION_ID -y install \ @core grub2-efi-x64 \ grub2-pc-modules grub2-efi-x64-modules \ shim-x64 efibootmgr \ kernel

To solve this, change it to:

    dnf --installroot=/mnt \
        --releasever=$VERSION_ID -y install \
        @core grub2-efi-x64 \
        grub2-pc-modules grub2-efi-x64-modules \
        shim-x64 efibootmgr \
        kernel-$(uname -r)

"System Installation" Section 4: missing commands for mounting the /var, /var/log, and /var/lib partitions

Also, at the end of Section 4, the command zfs create -o mountpoint=legacy rpool/alma/root should be changed to zfs create -o mountpoint=/ -o canmount=noauto rpool/alma/root, because the next line, mount -t zfs -o zfsutil rpool/alma/root /mnt, fails with the current version. The current version is also redundant: further down, the root dataset is unmounted and altered to match the recommended creation syntax, and then instructions are given to issue mount -t zfs rpool/alma/root /mnt, which again fails because it only works with mountpoint=legacy.
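Put together, the corrected sequence suggested above would read:

    zfs create -o mountpoint=/ -o canmount=noauto rpool/alma/root
    mount -t zfs -o zfsutil rpool/alma/root /mnt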

I have not made note of the redundant commands that do not affect functionality, but those should probably be cleaned up as well to make the procedure more concise.

ghost commented 1 year ago

Author here. Thanks for the detailed feedback! I can confirm every item you mentioned in the post. Those redundant commands are the result of a Fedora/RHEL merge-update, which was a bit rushed while I was working on the Boot Environment appendix. I apologize for any confusion caused by the current version.

I have submitted a PR which should clean up the redundant commands and fix the problems you mentioned.

DrArclight commented 1 year ago

For what it's worth, even with the corrections, I wasn't able to get my system to boot using the latest Alma live image. I ran into a race condition where the zfs-import systemd units would try to run before the zfs nodes were fully populated in /sys/module/zfs. It always drops to an emergency shell on boot, where I have to do a "systemctl restart zfs-import-scan.service", then exit, and it will boot.

ghost commented 1 year ago

Then this problem might be specific to your computer hardware, such as a slow hard drive or SAS expansion cards.

During my testing on a laptop and inside a virtual machine -- admittedly a small sample -- this issue did not occur. Maybe you can try creating a systemd unit override file with

    systemctl edit zfs-import-scan.service
    systemctl edit --full zfs-import-scan.service

to override some parameters in zfs-import-scan.service, such as adding a delay of 5 seconds.
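For example, the first command opens a drop-in override file (typically /etc/systemd/system/zfs-import-scan.service.d/override.conf); a 5-second delay could be sketched there as:

    [Service]
    ExecStartPre=/bin/sleep 5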

DrArclight commented 1 year ago

Did some more digging, and I was wrong (I think) about it being a race condition. When it drops to the emergency shell, doing an ls /sys/module/zfs returns a "file or folder not found" error. I have to do a modprobe zfs, then systemctl restart zfs-import-scan, then exit to get the system to boot. I also see the following lines in rdsosreport.txt:

    [ 3.671197] localhost systemd[1]: Reached target Basic System.
    [ 3.674637] localhost systemd[1]: Import ZFS pools by cache file was skipped because of a failed condition check (ConditionPathIsDirectory=/sys/module/zfs).
    [ 3.674686] localhost systemd[1]: Import ZFS pools by device scanning was skipped because of a failed condition check (ConditionPathIsDirectory=/sys/module/zfs).
    [ 3.675597] localhost systemd[1]: Starting Set BOOTFS environment for dracut...
    [ 3.680096] localhost sh[806]: The ZFS modules are not loaded.
    [ 3.680096] localhost sh[806]: Try running '/sbin/modprobe zfs' as root to load them.
    [ 3.686286] localhost systemd[1]: zfs-env-bootfs.service: Deactivated successfully.
    [ 3.686416] localhost systemd[1]: Finished Set BOOTFS environment for dracut.
    [ 3.689865] localhost systemd[1]: Reached target ZFS pool import target.
    [ 3.693875] localhost systemd[1]: Starting dracut pre-mount hook...
    [ 3.717708] localhost dracut-pre-mount[827]: The ZFS modules are not loaded.
    [ 3.717708] localhost dracut-pre-mount[827]: Try running '/sbin/modprobe zfs' as root to load them.
    [ 3.728865] localhost systemd[1]: Finished dracut pre-mount hook.
    [ 3.731285] localhost systemd[1]: Mounting /sysroot...
    [ 3.731324] localhost mount[829]: The ZFS modules are not loaded.
    [ 3.731324] localhost mount[829]: Try running '/sbin/modprobe zfs' as root to load them.
    [ 3.731433] localhost systemd[1]: Snapshot bootfs just before it is mounted was skipped because of a failed condition check (ConditionKernelCommandLine=bootfs.snapshot).
    [ 3.731471] localhost systemd[1]: Rollback bootfs just before it is mounted was skipped because of a failed condition check (ConditionKernelCommandLine=bootfs.rollback).

So it looks like the zfs modules are not loading at all before it drops to the emergency shell.

As configured right now, I'm just trying this with a pair of Samsung 980 1 TB NVMe drives, so there shouldn't be anything delaying the initialization. Adding a delay to the unit files through systemctl edit doesn't seem to help, because the zfs modules never load before it drops to the emergency shell, so the units fail to even execute the pre-start delay.

ghost commented 1 year ago

Therefore there seems to be an issue with the zfs module not loading.

On an installed system, you can try this:

echo 'force_drivers+=" zfs "' >> /etc/dracut.conf.d/zfs.conf

Or, during installation:

echo 'force_drivers+=" zfs "' >> /mnt/etc/dracut.conf.d/zfs.conf

See if this fixes the issue.

ghost commented 1 year ago

I forgot to mention: then rebuild the initrd as described in the guide.
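For reference, on RHEL-family systems that usually amounts to something along these lines, run on the installed system or inside the chroot (a sketch; the guide's exact command may differ):

    dracut --force --regenerate-all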

DrArclight commented 1 year ago

OK, so I finally found a way to get my system to boot reliably. I had to do two things. The first was to add rd.driver.pre=zfs to the kernel parameters with grubby --update-kernel=ALL --args="rd.driver.pre=zfs". The second was to add an ExecStartPre=/bin/sleep 20 to /usr/lib/systemd/system/zfs-import-scan.service and then have dracut rebuild the initramfs.

Without having dracut rebuild the initramfs the system was not using the version of the systemd unit with the delay during boot, so even though the zfs modules were being loaded, I was hitting the udev race condition.
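In other words, the sequence that worked for me was roughly the following (the 20-second sleep and editing the unit file directly are just what I happened to try):

    grubby --update-kernel=ALL --args="rd.driver.pre=zfs"
    # add under [Service] in zfs-import-scan.service:
    #   ExecStartPre=/bin/sleep 20
    dracut --force --regenerate-all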

ghost commented 1 year ago

You most likely should not use grubby to add kernel command line options, because it is not aware of the mirrored EFI system partitions we set up in the guide, and it is likely that the GRUB setup on your computer is now no longer redundant.

Do it the right way:

echo 'GRUB_CMDLINE_LINUX_DEFAULT="rd.driver.pre=zfs"' >> /etc/default/grub

Then follow the guide to reinstall GRUB on both disks, generate the GRUB menu, and mirror the boot files to both disks.
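Concretely, that boils down to something like the following (the grub.cfg path here is an assumption for an EFI AlmaLinux install; use whatever path the guide specifies):

    echo 'GRUB_CMDLINE_LINUX_DEFAULT="rd.driver.pre=zfs"' >> /etc/default/grub
    grub2-mkconfig -o /boot/efi/EFI/almalinux/grub.cfg
    # then reinstall GRUB on both disks and mirror the boot files as the guide describes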

DrArclight commented 1 year ago

Noted. Thanks for the help.