openzfs / openzfs-docs

OpenZFS Documentation
https://openzfs.github.io/openzfs-docs/

RHEL 8.5 Root on ZFS unable to boot #282

Closed mdiepart closed 2 years ago

mdiepart commented 2 years ago

Hello. Hoping that nothing is wrong specifically on my side, I think there is an issue with Root on ZFS for EL 8.5. I followed the guide to the letter. I did it several times and even tried tweaking the installation procedure; however, when I finish the installation and try to boot the new system, it hangs at the boot screen.

My setup:

* Server: Dell PowerEdge T630 (all firmware up to date)
* CPU: 2x Intel Xeon E5-2630 v3
* Storage controller: Dell HBA330 (flashed with latest firmware)
* Disks: 2x Crucial MX500 SSD set in mirror

Installation:

I tried to install Rocky Linux 8.5 using the Rocky-8.5-Workstation-20211114 ISO flashed to a 64GB USB key. The machine is configured to boot in UEFI mode. I boot from the USB key. The media passes the integrity test. I open a terminal window and configure the keyboard for my locale (BE). I set the root password and restart sshd, then follow the manual from an SSH session with PuTTY.

I follow the installation instructions as listed (I do not use any kind of encryption). I tried the installation with the kABI modules, both with the stable and testing branches. I also tried rebuilding the modules (as listed in the kabi-tracking-kmod section). I did that after installing the zfs packages on the live USB media and before loading the module (with modprobe). I also did that after entering the chroot of the new install and before generating the initrd.

Each and every one of those trials ended up with what seems to be the same error. The boot hangs.

Possible errors during the installation procedure:

When following the installation procedure, I do encounter three errors that are not mentioned in the guide.

In Part "System installation", step 9.

Error in POSTIN scriptlet in rpm package zfs-release

* It also complains somewhere else (same command as the previous complaints):

Running scriptlet: kernel-core-4.18.0-348.20.1.el8_5.x86_64 441/441
/usr/lib/kernel/install.d/50-dracut.install: line 39: /proc/cmdline: No such file or directory
dracut: No '/dev/log' or 'logger' included for syslog logging
libkmod: kmod_module_new_from_loaded: could not open /proc/modules: No such file or directory
dracut-install: Could not get list of loaded modules: Unknown error -2. Switching to non-hostonly mode.
grep: /proc/cpuinfo: No such file or directory
grep: /proc/cpuinfo: No such file or directory
grep: /proc/cpuinfo: No such file or directory
grep: /proc/cpuinfo: No such file or directory
dracut: Turning off host-only mode: '/sys' is not mounted!
dracut: Turning off host-only mode: '/proc' is not mounted!
dracut: Turning off host-only mode: '/run' is not mounted!
dracut: Turning off host-only mode: '/dev' is not mounted!
/bin/sed: can't read /proc/cpuinfo: No such file or directory (18 times in total)
/usr/lib/kernel/install.d/51-dracut-rescue.install: line 51: /proc/cmdline: No such file or directory
dracut: No '/dev/log' or 'logger' included for syslog logging
/bin/sed: can't read /proc/cpuinfo: No such file or directory (18 times in total)


### In Part "Bootloader" section "Install GRUB" step 2 (install initrd)
I have `dracut: /lib/modules/4.18.0-348.2.1.el8_5.x86_64//modules.dep is missing. Did you run depmod?`

## Debug information
When the boot hangs, this is what the screen displays: 
![image](https://user-images.githubusercontent.com/3754615/158352049-03697989-cf79-433f-9e18-5ff7a773e436.png)

I then edited the boot command in grub before booting to display more debug information by appending `nosplash debug --verbose` to the `linux rhel/BOOT/default@[...]` line.
The following screenshot shows the messages that flood the screen until I reboot the machine:
![image](https://user-images.githubusercontent.com/3754615/158352968-419ad515-fb86-4e31-b9ea-1f27b521ff29.png)

At this point I am not sure if there is any additional information that I could add. But feel free to ask if you need something else or if I forgot to mention something.

Thank you very much,
Morgan

ghost commented 2 years ago

Thanks for reporting this issue. I'm the author of the guide and have experience with both a Dell T20 server and a Dell R720xd rack-mount server; the latter is comparable to your T630 (RAID controller, iDRAC, etc.).

Do you mind posting the value of the DISK variable used during installation? It should look something like this: DISK='/dev/disk/by-id/ata-FOO /dev/disk/by-id/nvme-BAR'.

The problems mostly arise from the RAID controller:

As for the other errors that occurred during installation, before the reboot: they are harmless and may be safely ignored.

ghost commented 2 years ago

Sorry, it appears that the HBA330 is in IT mode already. Just posting the DISK line should be fine, I think.

ghost commented 2 years ago

The error also appears to occur when mounting the root file system. You can try to mount it yourself and see the exact failure point with the following kernel cmdline:

rd.break=pre-mount rd.debug

The man page has some other relevant information and options.

mdiepart commented 2 years ago

Hi! Thank you for your reply. The disk variable was assigned as such: DISK='/dev/disk/by-id/ata-CT500MX500SSD1_2141E5D968BD /dev/disk/by-id/ata-CT500MX500SSD1_2141E5D96BFF'.

I have since then tried the installation with Debian Bullseye and everything worked as expected on the first try, I do not know if it does bring anything interesting to the report.

I will try your suggestion soon, however I am not sure of how I should follow your instructions. How can I mount it myself? Via live usb?

ghost commented 2 years ago

> I have since then tried the installation with Debian Bullseye and everything worked as expected on the first try.

Thanks for the report re: Debian. This is very important because the possibility of a misconfigured RAID card is eliminated. We can then put our focus on the guide itself.

The problem is specific to your hardware, though: I did an installation just now, copying the Rocky guide verbatim, and everything is working as expected. This can also be demonstrated by the fact that this is the first error report I received so far, out of ~90 attempted installations. This is also good so that we can focus exclusively on the hardware-related parts in the guide.

ghost commented 2 years ago

We now need to know exactly what went wrong during boot on your machine. To do this, we can append the following two parameters in GRUB menu, at the first reboot:

rd.debug loglevel=999

The result would look like this:

root=ZFS=rpool_eh87xt/rhel/ROOT/default ro rd.debug loglevel=999

And I obtained the following output on my fresh installation during boot:

[    7.746184] ata6: SATA link down (SStatus 0 SControl 300)
[    7.748350] ata5: SATA link down (SStatus 0 SControl 300)
[    7.750682] ata4: SATA link down (SStatus 0 SControl 300)
[    7.752895] ata3: SATA link down (SStatus 0 SControl 300)
[    7.754986] ata2: SATA link down (SStatus 0 SControl 300)
[    7.757157] ata1: SATA link down (SStatus 0 SControl 300)
[    7.887355] ZFS: Loaded module v2.0.7-1, ZFS pool version 5000, ZFS filesystem version 5
#milestone: ZFS kernel module loaded
[  OK  ] Started udev Wait for Complete Device Initialization.
         Starting Import ZFS pools by device scanning...
[  OK  ] Reached target System Initialization.
[  OK  ] Reached target Basic System.
[  OK  ] Started Import ZFS pools by device scanning.
         Starting Set BOOTFS environment for dracut...
[  OK  ] Started Set BOOTFS environment for dracut.
[  OK  ] Reached target ZFS pool import target.
         Starting dracut pre-mount hook...
[   18.618476] dracut-pre-mount[745]: //lib/dracut-lib.sh@434(source): hookdir=/lib/dracut/hooks
[   18.618616] dracut-pre-mount[745]: //lib/dracut-lib.sh@435(source): export hookdir
[   18.623935] dracut-pre-mount[745]: //lib/dracut-lib.sh@564(source): command -v findmnt
[   18.627099] dracut-pre-mount[745]: //lib/dracut-lib.sh@1050(source): command -v pidof
[   18.641142] dracut-pre-mount[745]: //lib/dracut-lib.sh@1227(source): setmemdebug
[   18.646691] dracut-pre-mount[745]: //lib/dracut-lib.sh@1222(setmemdebug): '[' -z 0 ']'
[   18.651585] dracut-pre-mount[745]: /bin/dracut-pre-mount@9(main): source_conf /etc/conf.d
[   18.651840] dracut-pre-mount[745]: /lib/dracut-lib.sh@453(source_conf): local f
[   18.651916] dracut-pre-mount[745]: /lib/dracut-lib.sh@454(source_conf): '[' /etc/conf.d ']'
[   18.651983] dracut-pre-mount[745]: /lib/dracut-lib.sh@454(source_conf): '[' -d //etc/conf.d ']'
[   18.652089] dracut-pre-mount[745]: /lib/dracut-lib.sh@455(source_conf): for f in "/$1"/*.conf
[   18.652160] dracut-pre-mount[745]: /lib/dracut-lib.sh@455(source_conf): '[' -e //etc/conf.d/systemd.conf ']'
[   18.652229] dracut-pre-mount[745]: /lib/dracut-lib.sh@455(source_conf): . //etc/conf.d/systemd.conf
[   18.652293] dracut-pre-mount[745]: ///etc/conf.d/systemd.conf@1(source): systemdutildir=/usr/lib/systemd
[   18.652362] dracut-pre-mount[745]: ///etc/conf.d/systemd.conf@2(source): systemdsystemunitdir=/usr/lib/systemd/system
[   18.652426] dracut-pre-mount[745]: ///etc/conf.d/systemd.conf@3(source): systemdsystemconfdir=/etc/systemd/system
[   18.652506] dracut-pre-mount[745]: /bin/dracut-pre-mount@11(main): make_trace_mem 'hook pre-mount' 1:shortmem 2+:mem 3+:slab
[   18.652569] dracut-pre-mount[745]: /lib/dracut-lib.sh@1232(make_trace_mem): local log_level prefix msg msg_printed
[   18.652645] dracut-pre-mount[745]: /lib/dracut-lib.sh@1233(make_trace_mem): local trace trace_level trace_in_higher_levels insert_trace
[   18.652717] dracut-pre-mount[745]: /lib/dracut-lib.sh@1235(make_trace_mem): msg='hook pre-mount'
[   18.652798] dracut-pre-mount[745]: /lib/dracut-lib.sh@1236(make_trace_mem): shift
[   18.652873] dracut-pre-mount[745]: /lib/dracut-lib.sh@1238(make_trace_mem): prefix='[debug_mem]'
[   18.652940] dracut-pre-mount[745]: /lib/dracut-lib.sh@1239(make_trace_mem): log_level=0
[   18.653066] dracut-pre-mount[745]: /lib/dracut-lib.sh@1241(make_trace_mem): '[' -z 0 ']'
[   18.653139] dracut-pre-mount[745]: /lib/dracut-lib.sh@1241(make_trace_mem): '[' 0 -le 0 ']'
[   18.653205] dracut-pre-mount[745]: /lib/dracut-lib.sh@1242(make_trace_mem): return
[   18.653272] dracut-pre-mount[745]: /bin/dracut-pre-mount@14(main): getarg rd.break=pre-mount rdbreak=pre-mount
[   18.653339] dracut-pre-mount[745]: /lib/dracut-lib.sh@202(getarg): debug_off
[   18.653405] dracut-pre-mount[745]: /lib/dracut-lib.sh@18(debug_off): set +x
[   18.694867] dracut-pre-mount[745]: /lib/dracut-lib.sh@242(getarg): return 1
[   18.701741] dracut-pre-mount[745]: /bin/dracut-pre-mount@15(main): source_hook pre-mount
[   18.704440] dracut-pre-mount[745]: /lib/dracut-lib.sh@438(source_hook): local _dir
[   18.705782] dracut-pre-mount[745]: /lib/dracut-lib.sh@439(source_hook): _dir=pre-mount
[   18.705875] dracut-pre-mount[745]: /lib/dracut-lib.sh@439(source_hook): shift
[   18.705942] dracut-pre-mount[745]: /lib/dracut-lib.sh@440(source_hook): source_all /lib/dracut/hooks/pre-mount
[   18.706025] dracut-pre-mount[745]: /lib/dracut-lib.sh@427(source_all): local f
[   18.706092] dracut-pre-mount[745]: /lib/dracut-lib.sh@428(source_all): local _dir
[   18.706154] dracut-pre-mount[745]: /lib/dracut-lib.sh@429(source_all): _dir=/lib/dracut/hooks/pre-mount
[   18.779554] hrtimer: interrupt took 3266066 ns-lib.sh@429(source_all): shift

[   18.706287] dracut-pre-mount[745]: /lib/dracut-lib.sh@430(source_all): '[' /lib/dracut/hooks/pre-mount ']'
[   18.706351] dracut-pre-mount[745]: /lib/dracut-lib.sh@430(source_all): '[' -d //lib/dracut/hooks/pre-mount ']'
[   18.706432] dracut-pre-mount[745]: /lib/dracut-lib.sh@431(source_all): for f in "/$_dir"/*.sh
[   18.706501] dracut-pre-mount[745]: /lib/dracut-lib.sh@431(source_all): '[' -e //lib/dracut/hooks/pre-mount/90-zfs-load-key.sh ']'
[   18.706565] dracut-pre-mount[745]: /lib/dracut-lib.sh@431(source_all): . //lib/dracut/hooks/pre-mount/90-zfs-load-key.sh
[   18.706639] dracut-pre-mount[745]: ///lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@4(source): '[' -e /bin/systemctl ']'
[   18.706705] dracut-pre-mount[745]: ///lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@9(source): '[' -f /lib/dracut-lib.sh ']'
[   18.706768] dracut-pre-mount[745]: ///lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@9(source): dracutlib=/lib/dracut-lib.sh
[   18.706831] dracut-pre-mount[745]: ///lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@10(source): '[' -f /usr/lib/dracut/modules.d/99base/dracut-lib.sh ']'
[   18.706893] dracut-pre-mount[745]: ///lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@12(source): . /lib/dracut-lib.sh
[   18.706963] dracut-pre-mount[745]: ///lib/dracut-lib.sh@3(source): export DRACUT_SYSTEMD
[   18.707045] dracut-pre-mount[745]: ///lib/dracut-lib.sh@4(source): export NEWROOT
[   18.707110] dracut-pre-mount[745]: ///lib/dracut-lib.sh@5(source): '[' -n /sysroot ']'
[   18.707173] dracut-pre-mount[745]: ///lib/dracut-lib.sh@6(source): '[' -d /sysroot ']'
[   18.707234] dracut-pre-mount[745]: ///lib/dracut-lib.sh@9(source): '[' -d /run/initramfs ']'
[   18.707296] dracut-pre-mount[745]: ///lib/dracut-lib.sh@14(source): '[' -d /run/lock ']'
[   18.707357] dracut-pre-mount[745]: ///lib/dracut-lib.sh@15(source): '[' -d /run/log ']'
[   18.707424] dracut-pre-mount[745]: ///lib/dracut-lib.sh@61(source): '[' -z 1 ']'
[   18.707507] dracut-pre-mount[745]: ///lib/dracut-lib.sh@424(source): setdebug
[   18.707598] dracut-pre-mount[745]: ///lib/dracut-lib.sh@409(setdebug): '[' -f /usr/lib/initrd-release ']'
[   18.707701] dracut-pre-mount[745]: ///lib/dracut-lib.sh@410(setdebug): '[' -z yes ']'
[   18.707766] dracut-pre-mount[745]: ///lib/dracut-lib.sh@421(setdebug): debug_on
[   18.707831] dracut-pre-mount[745]: ///lib/dracut-lib.sh@22(debug_on): '[' yes = yes ']'
[   18.707936] dracut-pre-mount[745]: ///lib/dracut-lib.sh@22(debug_on): set -x
[   18.708041] dracut-pre-mount[745]: ///lib/dracut-lib.sh@434(source): hookdir=/lib/dracut/hooks
[   18.708114] dracut-pre-mount[745]: ///lib/dracut-lib.sh@435(source): export hookdir
[   18.708834] dracut-pre-mount[745]: ///lib/dracut-lib.sh@564(source): command -v findmnt
[   18.715390] dracut-pre-mount[745]: ///lib/dracut-lib.sh@1050(source): command -v pidof
[   18.732932] dracut-pre-mount[745]: ///lib/dracut-lib.sh@1227(source): setmemdebug
[   18.733342] dracut-pre-mount[745]: ///lib/dracut-lib.sh@1222(setmemdebug): '[' -z 0 ']'
[   18.734237] dracut-pre-mount[745]: ///lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@15(source): '[' -z zfs:rpool_eh87xt/rhel/ROOT/default ']'
[   18.738497] dracut-pre-mount[745]: ///lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@17(source): '[' rpool_eh87xt/rhel/ROOT/default = zfs:rpool_eh87xt/rhel/ROOT/default ']'
[   18.741946] dracut-pre-mount[745]: ///lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@20(source): true
[   18.742985] dracut-pre-mount[745]: ///lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@21(source): grep -q -v '^$'
[   18.746969] dracut-pre-mount[745]: ///lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@21(source): zpool list -H
[   18.799306] dracut-pre-mount[745]: ///lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@21(source): break
[   18.800725] dracut-pre-mount[745]: ///lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@29(source): '[' zfs:rpool_eh87xt/rhel/ROOT/default = zfs:AUTO ']'
[   18.804400] dracut-pre-mount[745]: ///lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@32(source): BOOTFS=rpool_[   19.176741] zfs-generator: starting
eh87xt/rhel/ROOT/default
[   18.804970] dracut-pre-mount[[   19.191848] zfs-generator: loading Dracut library from /lib/dracut-lib.sh
745]: ///lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@33(source): BOOTFS=rpool_eh87xt/rhel/ROOT/default
[   18.814049] dracut-pre-mount[745]: /////lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@37(source): awk -F/ '{print $1}'
[   18.836401] dracut-pre-mount[745]: /////lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@37(source): echo rpool_eh87xt/rhel/ROOT/default
[   18.840668] dracut-pre-mount[745]: ////lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@37(source): zpool list -H -o feature@encryption rpool_eh87xt
[   18.852193] [  OK  ] Started dracut pre-mount hook.
dracut-pre-mount[745]: ///lib/dracut/hooks/pre-mount/90-zfs-load-key.sh@37(source): '[' enabled [   19.267426] zfs-generator: writing extension for sysroot.mount to /run/systemd/generator/sysroot.mount.d/zfs-enhancement.conf
= active ']'
[   18.882995] dracut-pre-mount[745]: /bin/dracut-pre-mount@17(main): export -p
[   18.893135] dracut-pre-mount[745]: /bin/dracut-pre-mount@19(main): exit 0
         Mounting /sysroot...
[  OK  ] Mounted /sysroot.
[  OK  ] Reached target Initrd Root File System.
         Starting Reload Configuration from the [   19.303270] zfs-generator: finished
Real Root...
[  OK  ] Started Reload Configuration from the Real Root.
         Starting dracut mount hook...
[  OK  ] Reached target Initrd File Systems.
[  OK  ] Reached target Initrd Default Target.
#dracut-pre-mount ends here

ghost commented 2 years ago

So my suggestion is to perform a vanilla installation as described in the guide, with no modifications, and add rd.debug loglevel=999 in GRUB on first reboot, then see why dracut-pre-mount goes into a loop.

mdiepart commented 2 years ago

Here are the results from the boot with the rd.debug loglevel=999 options. As there is a lot going on, I chose to record my screen. I uploaded the file to YouTube; I hope that is not an issue for you, otherwise I can try something different.

https://youtu.be/5AfRVcEcNSQ

ghost commented 2 years ago

Thanks. It is now clear that the loop is caused not by ZFS itself, but by a buggy dracut script shipped in the zfs-dracut package in version 2.0.7, namely this while loop. Curiously, this script has been updated in newer releases, presumably with a fix for this.

mdiepart commented 2 years ago

I also tried to boot to an initramfs shell. I have never done such a thing, but as I understood it I only need to add break=premount (or modules or top) at the same place where I inserted rd.debug; however, the shell never appeared.

Given that it seems to be a broken zfs-dracut, is there anything else that can be done?

ghost commented 2 years ago

My previous comment is wrong, apologies. The real question is why zpool list -H produces no output: if there is any output, the loop will break and booting continues normally.
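The wait described here can be sketched as follows. This is a reconstruction from the rd.debug trace earlier in the thread (`true`, `zpool list -H`, `grep -q -v '^$'`, `break`), not a copy of the shipped 90-zfs-load-key.sh; `fake_zpool_list` and the state file are hypothetical stand-ins so the loop can be run without ZFS.

```shell
#!/bin/sh
# Hypothetical stand-in for `zpool list -H`: prints nothing on the first two
# polls, then prints one pool line, as if the import had just happened.
state=$(mktemp)
echo 0 > "$state"
fake_zpool_list() {
    n=$(cat "$state")
    echo $((n + 1)) > "$state"
    if [ "$n" -ge 2 ]; then
        printf 'rpool\tONLINE\n'
    fi
}

polls=0
while true; do
    polls=$((polls + 1))
    # Same test as in the trace: any non-empty line of output ends the wait.
    fake_zpool_list | grep -q -v '^$' && break
    # The real script may pause between polls (assumption); omitted here.
done

echo "pool visible after $polls polls"
rm -f "$state"
```

If the listing never prints a pool, nothing triggers the `break`, which would be consistent with the endless messages reported above.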

To diagnose this issue, append rd.break=pre-mount to the kernel command line at the GRUB menu, and you will be dropped to a shell. Check the following:

modprobe zfs

zpool list #any pool listed?

If bpool and rpool are listed, you can type exit to continue normal booting.

If modprobe zfs succeeds, but zpool import returns nothing, then see if your mirrored disks are in /dev/disk/by-id.

ghost commented 2 years ago

> I only need to add break=premount

It must be rd.break=pre-mount. This is dracut-specific and nothing else will work.

mdiepart commented 2 years ago

I was now able to get an initramfs shell. zpool list returned nothing; however, I was able to import both pools. After exiting the shell, booting continued correctly. The system then rebooted (which I guess was because the filesystem was relabeled). I had to do the exact same procedure a second time, and I am now presented with the login prompt for the system.

ghost commented 2 years ago

Congratulations! You got into the system at last.

We now need to find out why the pools were not imported automatically. The reason should be visible in journalctl, around the time when the zfs kernel module was loaded.

ghost commented 2 years ago

Hint: in your video, the service responsible for the import, zfs-import-scan.service, has a status of "inactive". On my machine it has a status of "active". You might want to enable it with systemctl enable zfs-import-scan.service and check its log with journalctl -u zfs-import-scan.service.

mdiepart commented 2 years ago

I had a look at journalctl but was not able to make much sense of it, so here is the full journal (https://paste.centos.org/view/37464338) from the first successful boot. journalctl -u zfs-import-scan.service returned:

mar 17 18:01:36 websdr systemd[1]: Starting Import ZFS pools by device scanning...
mar 17 18:01:36 websdr zpool[2457]: no pools available to import
mar 17 18:01:36 websdr systemd[1]: Started Import ZFS pools by device scanning.


EDIT: I manually imported the pool around 18:00:43, near line 1453.

mdiepart commented 2 years ago

The system still will not boot on its own. Here are some logs taken from inside the initramfs shell that may help explain why the system does not import the disks: (image)

mdiepart commented 2 years ago

It complains that /sys/module/zfs is not a directory, but I checked manually (still in the initramfs shell) and it is indeed a directory with files inside. I do not understand why it complains. Starting the service manually succeeded and the boot resumed.

ghost commented 2 years ago

The latest screenshot says that initrd (RedHat's way of saying initramfs, initial RAM disk) did not even attempt to import the pools because the zfs module was not loaded.

There is an option in dracut to force load the kernel module: force_drivers+=" zfs ". This option "ensured that the drivers are tried to be loaded early via modprobe" (man page).

You can try this with echo 'force_drivers+=" zfs "' >>/etc/dracut.conf.d/zfs.conf then rebuild initrd with

for directory in /lib/modules/*; do
  kernel_version=$(basename $directory)
  dracut --force --kver $kernel_version
done

Then reboot.

ghost commented 2 years ago

> It complains that /sys/module/zfs is not a directory, but I checked manually (still in the initramfs shell) and it is indeed a directory with files inside.

What happened is this: the zfs module was not loaded (hence "not a directory") when the import service was first run automatically by the system.

However, for some reason (most likely related to the HBA, see here), the zfs module was finally loaded, with a delay, while you were typing in the shell. With the module loaded, the directory is now populated and the import service now runs as intended.

mdiepart commented 2 years ago

In the meantime, I tried using rd.break=pre-trigger. At that point, the folder did not exist yet, and running modprobe zfs created it. However, once I quit the initramfs shell, the boot hangs as before. I will skip your suggestion above and directly try the solution suggested in the link you provided.

ghost commented 2 years ago

> solution suggested in the link

I would have suggested that, too, if it worked. :-)

Unfortunately that option ZFS_INITRD_PRE_MOUNTROOT_SLEEP is for "initramfs", the tool used by Debian/Ubuntu systems to generate initrd.

The dracut module does not care about this option at all.

All hope is not lost, however. Since dracut is just systemd in initrd and sources files from the normal system, you can just add something like ExecStartPre=/bin/sleep 30 to /etc/systemd/system/zfs-import-scan.service and rebuild initrd instead.

ghost commented 2 years ago

There are drawbacks, however.

Firstly, zfs-import-scan.service is a file provided by upstream and should not be edited in principle; it will also be overwritten on update.

Secondly, the wait time is fundamentally non-deterministic and undesirable. This also applies to Debian.

So ideally the module should be loaded precisely after HBA initialization and precisely before pool import. This did not happen and is outside my ability to fix.

mdiepart commented 2 years ago

I tried to add the ExecStartPre=/bin/sleep 30 to the file /lib/systemd/system/zfs-import-scan.service (it does not live in /etc/... ) however this did not solve the problem. My guess is that ExecStartPre is only executed if the different Conditions... are satisfied, which is not the case.

mdiepart commented 2 years ago

> The latest screenshot says that initrd (RedHat's way of saying initramfs, initial RAM disk) did not even attempt to import the pools because the zfs module was not loaded.
>
> There is an option in dracut to force load the kernel module: force_drivers+=" zfs ". This option "ensured that the drivers are tried to be loaded early via modprobe" (man page).
>
> You can try this with echo 'force_drivers+=" zfs "' >>/etc/dracut.conf.d/zfs.conf then rebuild initrd with
>
> for directory in /lib/modules/*; do
>   kernel_version=$(basename $directory)
>   dracut --force --kver $kernel_version
> done
>
> Then reboot.

This did not work either... I do believe however that your suggestion is right and that the HBA adapter is taking its time and that we need to insert a delay somewhere.

mskarbek commented 2 years ago

Instead of directly editing /lib/systemd/system/zfs-import-scan.service, add a new file /etc/systemd/system/zfs-import-scan.service.d/override.conf:

[Service]
ExecStartPre=/bin/sleep 30

Systemd will combine both, so there will be no conflicts during upgrades and the change will persist.

ghost commented 2 years ago

Yep, you are right. And 30 seconds is also much longer than necessary: 10 seconds would be sufficient, I think.

Before we continue tinkering with services, I think we should give forced drivers a try.

ghost commented 2 years ago

> override.conf ...

Edit: Excellent! So we only need to add the delay to one of the dependencies of zpool import instead.

ghost commented 2 years ago

Hmm, seems that the override.conf needs to be explicitly declared in dracut config, or else it will not be included in initrd. 'zfs-import-service' itself is declared in this file.

mdiepart commented 2 years ago

For some reason, since I added the force_drivers line to the zfs dracut conf file, zfs-import-scan does not work anymore. I have to manually enter zpool import -a in order to import the pools and continue with the boot sequence.

ghost commented 2 years ago

So that didn't work out, sorry... force_drivers should be removed then.

On the other hand you can install the override file with install_items+=" /etc/file "
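A sketch of how the drop-in suggested earlier and the install_items hint could be combined. The `stage_import_delay` helper and its staging-root argument are hypothetical; the paths are the ones named in this thread. Writing under a staging root lets the result be inspected before anything on the live system changes.

```shell
#!/bin/sh
# Hypothetical helper: write the systemd drop-in and the matching dracut
# install_items line under a staging root ($1), with a delay in seconds ($2).
stage_import_delay() {
    root=$1
    delay=$2
    dropin=/etc/systemd/system/zfs-import-scan.service.d/override.conf

    mkdir -p "$root${dropin%/*}" "$root/etc/dracut.conf.d"

    # Drop-in: systemd merges this with the packaged unit file.
    printf '[Service]\nExecStartPre=/bin/sleep %s\n' "$delay" > "$root$dropin"

    # Ask dracut to copy the drop-in into the initrd as well.
    printf 'install_items+=" %s "\n' "$dropin" >> "$root/etc/dracut.conf.d/zfs.conf"
}

staging=$(mktemp -d)
stage_import_delay "$staging" 10
cat "$staging/etc/systemd/system/zfs-import-scan.service.d/override.conf"
cat "$staging/etc/dracut.conf.d/zfs.conf"
```

After copying the two files into place, the initrd would still have to be rebuilt with the dracut loop shown earlier in the thread.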

mdiepart commented 2 years ago

Adding the sleep delay did not have any effect. When I tried it previously it had no effect either, even though the delay seemed to be taken into account (at least, the call to systemctl start zfs-import-scan.service had the delay and produced the same results).

ghost commented 2 years ago

We can maybe instead write a standalone service that has the following properties:

#draft unit file, verify options
[Unit]
Before=zfs-import-scan.service
[Service]
Type=oneshot
ExecStart=/bin/sleep 5
[Install]
WantedBy=zfs-import.target

ghost commented 2 years ago

Also other options from systemctl cat zfs-import-scan may be useful, such as

After=systemd-udev-settle.service
Requires=systemd-udev-settle.service

[Service]
Type=oneshot
...

I'm no expert in writing unit files, so please take this advice with a ton of salt.

mdiepart commented 2 years ago

I am no expert either in unit files however I just learned about timers. I did not know what to put in the [Install] section but your comment above just answered that.

I created a file /lib/systemd/system/zfs-import-scan.timer with content

[Unit]
Description=Adds 10 seconds of delay before importing the zfs pools so that the HBA330 controller has time to initialize and import disks

[Timer]
OnActiveSec=10
Unit=zfs-import-scan.service

[Install]
WantedBy=zfs-import.target


I then did

systemctl disable zfs-import-scan.service
systemctl enable zfs-import-scan.timer


This sadly had no effect...

EDIT: I also tried by adding install_files+=" /etc/systemd/system/zfs-import.target/zfs-import-scan.timer " to /etc/dracut-conf.d/zfs.conf but it did not help.

ghost commented 2 years ago

Systemd timers are, I think, more like cron jobs and therefore not suited for this purpose. (Red Hat has pushed to deprecate cron and promote systemd for years.)

A timer is by default associated with the service of the same name, for example (NixOS configuration):

  # define the frequency of DDNS
  systemd.timers.v6ddns = {
    enable = true;
    timerConfig = {
      # re-run the service after 240s
      OnUnitActiveSec = "240s";
    };
    wantedBy = [ "timers.target" ];
  };

  # define the service itself
  systemd.services.v6ddns = {
    enable = true;
    after = [ "network-online.target" ];
    description = "Update IPv6 DDNS";
    wantedBy = [ "multi-user.target" ];
    path = [ pkgs.iproute2 pkgs.jq pkgs.gnugrep pkgs.curl pkgs.coreutils pkgs.lxd ];
    serviceConfig = {
      ExecStart = ''/state/bin/ddns.sh'';
      User = "root";
      Type = "oneshot";
      PrivateTmp = "true";
      ProtectSystem = "full";
      WorkingDirectory = "/tmp";
    };
  };

mdiepart commented 2 years ago

Never underestimate a good night of sleep, I finally got it to work!

> However, for some reason (most likely related to the HBA, see here), the zfs module was finally loaded, with a delay, while you were typing in the shell. With the module loaded, the directory is now populated and the import service now runs as intended.

This is exactly right. The driver used is mpt3sas but the problem is the same.

> Before we continue tinkering with services, I think we should give forced drivers a try.

This is also true.

I thus added the line forced_drivers+=" mpt3sas " to the file /etc/dracut.conf.d/zfs.conf. The system booted on the first try. However, I do not know whether it only works on my machine or whether this is the fix we were looking for.
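A minimal sketch of how this fix could be applied conditionally, only when the HBA driver is actually loaded. `hba_force_line` is a hypothetical helper, and its second argument exists only so the check can be run against a sample file instead of /proc/modules. The option is spelled `force_drivers` here, as in the man page text quoted earlier in the thread, whereas this comment writes `forced_drivers`; which spelling your dracut version accepts is worth checking in `man dracut.conf`.

```shell
#!/bin/sh
# Hypothetical helper: print a dracut.conf line for a driver ($1) if that
# driver appears in the module list ($2, defaulting to /proc/modules).
hba_force_line() {
    driver=$1
    modules_file=${2:-/proc/modules}
    # Each line of /proc/modules starts with the module name and a space.
    if grep -q "^${driver} " "$modules_file"; then
        printf 'force_drivers+=" %s "\n' "$driver"
    fi
}

# Sample module list standing in for /proc/modules on the affected server.
sample=$(mktemp)
printf 'mpt3sas 303104 4 - Live 0x0000000000000000\n' > "$sample"
printf 'zfs 3887104 6 - Live 0x0000000000000000\n' >> "$sample"

line=$(hba_force_line mpt3sas "$sample")
echo "$line"
```

On a real system the printed line would be appended to /etc/dracut.conf.d/zfs.conf, followed by the initrd rebuild shown earlier.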

ghost commented 2 years ago

Great. The final step is to confirm that forced_drivers+=" mpt3sas " is the only modification needed, and not any of the intermediate attempts with various options.

If you followed the guide closely, there is a step that creates a pre-boot snapshot. We can revert your system to the pre-relabel state and apply the modification.

#let bieaz manage the initial snapshot
eval $(bieaz info)
zfs set org.bieaz:be.src=default ${BIEAZ_ROOT_POOL}${BIEAZ_ROOT_CONTAINER}default@install

# create a new boot environment from the initial snapshot
bieaz create default@install vanilla0
bieaz label vanilla0 "test if mpt3sas is fixed"

#set vanilla0 as default boot option
bieaz set-default vanilla0

#don't reboot yet, apply `forced_drivers+=" mpt3sas "` fix
chroot $(bieaz mount vanilla0)

#now we are inside the initial system
echo 'forced_drivers+=" mpt3sas "' >>/etc/dracut.conf.d/zfs.conf

#rebuild initrd (see the dracut loop above)

#exit chroot
exit

#reboot
reboot

ghost commented 2 years ago

Edit: we should unmount the BE before reboot:

bieaz umount vanilla0

mdiepart commented 2 years ago

I did a clean re-install following the guide. I did not install the bieaz modules (I have no idea what they do, and they were said to be optional). It booted correctly on the first try.

The only problem I have is not related to this issue: if I reboot without removing my USB key from the system, it cannot boot, because for grub (hd0,gpt2) is then the USB key and not the system drives. I am looking into ways to fix that.

ghost commented 2 years ago

bieaz is a boot environment manager, written by me. It's just a single bash script managing system datasets.

I understand your concerns w.r.t. 3rd party packages, I would be pretty suspicious too if I were you.

But in this case it is strongly recommended, not because I wrote it, but because there's no guarantee that your Root on ZFS system will survive the very next ZFS/kernel update. bieaz protects you from a totally borked system: you always have the option to just roll back the update.

mdiepart commented 2 years ago

Ok thank you. Maybe you could add some details about that in the manual? It would also be a shame if someone installed it but does not use it because he doesn't know the role of the package.

And as a (last?) sidenote, I believe there is a small mistake in the section System Configuration part 8 (chroot).

The second-to-last line (DISK=$DISK" > /mnt/root/chroot) should be changed to DISK=\"$DISK\"" > /mnt/root/chroot; otherwise, when you source the file back and have several disks in your installation, only the first one will be read back in the chroot, the last one being interpreted as a command.
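The effect can be reproduced without touching any disks. In this sketch the device paths are placeholders and the temporary files stand in for /mnt/root/chroot; it writes the assignment both ways and sources each file in a subshell.

```shell
#!/bin/sh
# Placeholder device paths, not real disks.
DISK='/dev/disk/by-id/ata-FOO /dev/disk/by-id/ata-BAR'
tmpdir=$(mktemp -d)

# Without inner quotes the file contains:  DISK=/dev/... /dev/...
echo "DISK=$DISK" > "$tmpdir/unquoted"
# With escaped quotes the file contains:   DISK="/dev/... /dev/..."
echo "DISK=\"$DISK\"" > "$tmpdir/quoted"

# Source each file in a subshell and capture what DISK holds afterwards.
# Error output from the unquoted file (the second path is run as a command)
# is discarded.
unquoted_result=$( { unset DISK; . "$tmpdir/unquoted"; } 2>/dev/null; printf '%s' "$DISK" )
quoted_result=$( unset DISK; . "$tmpdir/quoted"; printf '%s' "$DISK" )

echo "unquoted: '$unquoted_result'"
echo "quoted:   '$quoted_result'"
rm -rf "$tmpdir"
```

Only the quoted form brings the full two-disk list back after sourcing.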

ghost commented 2 years ago

> grub (hd0,gpt2)

There are two options. One is to remove /etc/grub.d/09_fix_root_on_zfs file and regenerate grub menu. (Again I recommend you to install bieaz so that you can use bieaz menu -g instead of grub2-mkconfig -o /boot/efi/EFI/rocky/grub.cfg then mirror EFI content. bieaz has built-in support for mirrored EFI partitions.) This has its own set of drawbacks including, if unlucky, the possibility of rendering your system unbootable.

The second one is to just unplug the USB drive before reboot.

ghost commented 2 years ago

> someone installed it but does not use it because he doesn't know the role of the package

There's no concern about that because I hooked bieaz into the DNF package manager: when you have it installed, it takes snapshots automatically whenever you use DNF to install, remove, or update packages.

You can still install it with:

dnf copr enable -y m0p/bieaz
dnf install -y bieaz python3-dnf-plugin-rozb3

ghost commented 2 years ago

> last one being interpreted as a command.

Thanks for the tip. I will open a pull request once we have sorted out the current issue.

mdiepart commented 2 years ago

> > someone installed it but does not use it because he doesn't know the role of the package
>
> There's no concern about that because I hooked bieaz into the DNF package manager: when you have it installed, it takes snapshots automatically whenever you use DNF to install, remove, or update packages.
>
> You can still install it with:
>
> dnf copr enable -y m0p/bieaz
> dnf install -y bieaz python3-dnf-plugin-rozb3

What I mean is that if it were me I wouldn't have thought about restoring the system using bieaz.

ghost commented 2 years ago

We are a special case because the installation was broken from the start, so assistance from me was needed (maybe I should add a note somewhere).

Normally I would expect the system to be broken by package updates; in that case one can easily recover from the GRUB menu.

mdiepart commented 2 years ago

Yeah okay I see what you mean.

I tried several things to fix the boot issue but nothing worked. The UUID grub uses appears nowhere in my system except in grub (it is not listed by blkid), and when I change the set root=hd0,gpt2 for set root= search -u --no-floppy <grub-uuid> manually for the UUID grub uses, it detects hd1,gpt1 instead of the expected hd1,gpt2. I tried by searching by label too (bpool_$INST_UUID) but it had the same problem. Maybe the search needs to be done later? I will let go of that as I do not have time to work on that now.

Thank you very much for your help. I do not think there is any issue to discuss here any more.

ghost commented 2 years ago

GRUB is a complex piece of software: it's essentially a miniature (stage1 = 1MB) operating system in disguise, with support for almost every architecture (i386, x86_64, mips, ppc, arm32/64).

I tinkered with it a bit when I was developing GRUB menu integration for bieaz.

About the set root bit: if you have the time and patience (I spent about 3 months on writing bieaz and the guides), you will need to read the source code for grub, specifically grub-mkconfig and 10_linux.

A tip for bieaz: multi-disk support (mirror EFI content) is not enabled by default. You need to enable it inside /etc/bieaz.cfg.