zbm-dev / zfsbootmenu

ZFS Bootloader for root-on-ZFS systems with support for snapshots and native full disk encryption
https://zfsbootmenu.org
MIT License

No keyboard input after building with dracut and systemd 256.1 (at least on Arch Linux) #648

Open samy-mahmoudi opened 2 weeks ago

samy-mahmoudi commented 2 weeks ago

ZFSBootMenu build source

Local build, dracut

ZFSBootMenu version

2.3.0

Boot environment distribution

Arch Linux

Problem description

No keyboard input when being prompted for my passphrase.

With my key included (install_items+=" PATH_TO_MY_KEY ") for debugging purposes and zbm.show set, there is no keyboard input in the menu either.

Possible, related discussion: https://github.com/zbm-dev/zfsbootmenu/discussions/619

Steps to reproduce

  1. Roll back to a ZFS snapshot where the version of systemd, systemd-libs, systemd-sysvcompat and lib32-systemd is 255.7-1.
  2. Upgrade all the packages but the four aforementioned packages.
  3. Generate EFI bundle or components.
  4. Reboot.
  5. Observe working keyboard inputs.
  6. Upgrade the four aforementioned packages (to date, to version 256.1-1) and only those packages.
  7. Generate EFI bundle or components.
  8. Reboot.
  9. Observe broken keyboard inputs.

Absent a suitable ZFS snapshot, I believe the four packages can instead be downgraded to 255.7-1 at the OS level.
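The steps above could be scripted roughly as follows. This is a sketch, not a tested procedure: the `run` wrapper, the `DRY_RUN` variable and the function names are made up for illustration, and `generate-zbm` is assumed to be the command used for the local build. `--ignore` is pacman's option for holding packages back during an upgrade. Nothing is executed by default; the commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Packages held back in step 2 and upgraded in step 6 (names from the report).
HELD="systemd systemd-libs systemd-sysvcompat lib32-systemd"

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Step 2: upgrade everything except the held packages.
upgrade_all_but_held() {
    run pacman -Syu --ignore "$(printf '%s' "$HELD" | tr ' ' ',')"
}

# Step 6: upgrade only the held packages (word splitting of $HELD is intended).
upgrade_held_only() {
    run pacman -S $HELD
}

# Steps 3 and 7: rebuild the ZFSBootMenu image (assumed build command).
rebuild_zbm() {
    run generate-zbm
}
```

Calling `upgrade_all_but_held; rebuild_zbm`, rebooting, then `upgrade_held_only; rebuild_zbm` and rebooting again would follow the order of the numbered steps. The real commands need root.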

Note

Building with mkinitcpio and systemd 256.1-1 produces working EFI bundles and components.

zdykstra commented 2 weeks ago

This is not likely a systemd issue, since that Dracut module is explicitly blacklisted in ZFSBootMenu. When you roll back to a previous snapshot, does your kernel also change? If so, what is the kernel in the working snapshot and what is the kernel in the non-working snapshot?

samy-mahmoudi commented 2 weeks ago

Here are the snapshots I have around the breaking point:

Date: 2024-06-15 Working state: OK Kernel (LTS): 6.6.32-1 Systemd: 255.7-1 ZFS: 2.2.4-1 Dracut: 101

Date: 2024-06-16 Working state: OK Kernel (LTS): 6.6.33-1 Systemd: 255.7-1 ZFS: 2.2.4-1 Dracut: 102-1

Date: 2024-06-20 Working state: Broken (No keyboard input) Kernel (LTS): 6.6.34-1 Systemd: 256.1-1 ZFS: 2.2.4-1 Dracut: 102-1

I determined the working states by cloning the three snapshots, booting each environment and then building ZFSBootMenu. In doing so, I realized that between the last working state and the first broken state, both the kernel and systemd had changed version. To exclude a kernel issue, I first upgraded only the kernel and its headers, which led me to the following states:

Date: 2024-06-29 Working state: OK Kernel (LTS): 6.6.36-1 Systemd: 255.7-1 ZFS: 2.2.4-1 Dracut: 102-1

Date: 2024-06-29 Working state: Broken (No keyboard input) Kernel (LTS): 6.6.36-1 Systemd: 256.1-1 ZFS: 2.2.4-1 Dracut: 102-1

classabbyamp commented 2 weeks ago

can you diff the contents of the EFIs? you can extract with lsinitrd

also, what's the hardware like? I suspect a udev change

samy-mahmoudi commented 4 days ago

Hi again all,

> can you diff the contents of the EFIs? you can extract with lsinitrd

I certainly can.

I used a couple of sed commands to filter out the noise induced by the differences in date and time:

[0]samy@x270:/home/samy$ lsinitrd /efi/EFI/arch/zfsbootmenu/initramfs-bootmenu.img-Jun16-kern6.6.38 \
  | sed 's/^.*root.*root.*..:.. //' \
  >Jun16-kern6.6.38
[0]samy@x270:/home/samy$ lsinitrd /efi/EFI/arch/zfsbootmenu/initramfs-bootmenu.img-Jun20-kern6.6.38 \
  | sed 's/^.*root.*root.*..:.. //' \
  >Jun20-kern6.6.38
[0]samy@x270:/home/samy$ diff -u Jun16-kern6.6.38 Jun20-kern6.6.38
--- Jun16-kern6.6.38        2024-07-11 01:02:12.737687529 -0400
+++ Jun20-kern6.6.38        2024-07-11 01:01:51.254354706 -0400
@@ -1,4 +1,4 @@
-Image: /efi/EFI/arch/zfsbootmenu/initramfs-bootmenu.img-Jun16-kern6.6.38: 81M
+Image: /efi/EFI/arch/zfsbootmenu/initramfs-bootmenu.img-Jun20-kern6.6.38: 81M
 ========================================================================
 Early CPIO image
 ========================================================================
@@ -504,18 +504,12 @@
 usr/lib/libkeyutils.so -> libkeyutils.so.1
 usr/lib/libkeyutils.so.1 -> libkeyutils.so.1.10
 -rwxr-xr-x   1 root     root        22400 Apr 27  2023 usr/lib/libkeyutils.so.1.10
-usr/lib/libkmod.so -> libkmod.so.2.4.2
-usr/lib/libkmod.so.2 -> libkmod.so.2.4.2
-usr/lib/libkmod.so.2.4.2
 usr/lib/libkrb5.so -> libkrb5.so.3.3
 usr/lib/libkrb5.so.3 -> libkrb5.so.3.3
 -rwxr-xr-x   1 root     root       882664 Dec 24  2023 usr/lib/libkrb5.so.3.3
 usr/lib/libkrb5support.so -> libkrb5support.so.0.1
 usr/lib/libkrb5support.so.0 -> libkrb5support.so.0.1
 -rwxr-xr-x   1 root     root        55472 Dec 24  2023 usr/lib/libkrb5support.so.0.1
-usr/lib/liblz4.so -> liblz4.so.1
-usr/lib/liblz4.so.1 -> liblz4.so.1.9.4
-usr/lib/liblz4.so.1.9.4
 usr/lib/liblzma.so -> liblzma.so.5.6.2
 usr/lib/liblzma.so.5 -> liblzma.so.5.6.2
 usr/lib/liblzma.so.5.6.2
@@ -577,14 +571,14 @@
 usr/lib/libsmartcols.so.1 -> libsmartcols.so.1.1.0
 usr/lib/libsmartcols.so.1.1.0
 usr/lib/libsystemd.so -> libsystemd.so.0
-usr/lib/libsystemd.so.0 -> libsystemd.so.0.38.0
-usr/lib/libsystemd.so.0.38.0
+usr/lib/libsystemd.so.0 -> libsystemd.so.0.39.0
+usr/lib/libsystemd.so.0.39.0
 usr/lib/libtirpc.so -> libtirpc.so.3.0.0
 usr/lib/libtirpc.so.3 -> libtirpc.so.3.0.0
 -rwxr-xr-x   1 root     root       186800 Oct  7  2023 usr/lib/libtirpc.so.3.0.0
 usr/lib/libudev.so -> libudev.so.1
-usr/lib/libudev.so.1 -> libudev.so.1.7.8
-usr/lib/libudev.so.1.7.8
+usr/lib/libudev.so.1 -> libudev.so.1.7.9
+usr/lib/libudev.so.1.7.9
 usr/lib/libuuid.so -> libuuid.so.1
 usr/lib/libuuid.so.1 -> libuuid.so.1.3.0
 usr/lib/libuuid.so.1.3.0
@@ -2287,7 +2281,7 @@
 usr/lib/ossl-modules/legacy.so
 usr/lib/profiling-lib.sh
 usr/lib/systemd
-usr/lib/systemd/libsystemd-shared-255.7-1.so
+usr/lib/systemd/libsystemd-shared-256.1-1.so
 usr/lib/systemd/network
 usr/lib/systemd/network/80-6rd-tunnel.network
 usr/lib/systemd/network/80-container-host0.network
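The sed filter above only strips metadata from entries whose timestamp is shown as HH:MM, which is why entries dated with a year still carry their full `ls -l` style prefix in the diff. A slightly more uniform filter, as a sketch: `list_paths` is a made-up name, and the code assumes every regular file, directory and symlink line has exactly eight metadata columns before the path, as in the listing above.

```shell
# Reduce an lsinitrd listing (read from stdin) to a sorted list of paths.
# Only regular files, directories and symlinks are kept; header lines and
# device nodes (which print "major, minor" instead of a size) are skipped.
list_paths() {
    awk '/^[-dl][-rwxsStT]+ / {
        # Drop mode, links, owner, group, size, month, day, time-or-year.
        for (i = 1; i <= 8; i++) sub(/^[^ ]+ +/, "")
        print
    }' | LC_ALL=C sort
}

# Possible usage (old.img and new.img are placeholder names):
#   lsinitrd old.img | list_paths > old.txt
#   lsinitrd new.img | list_paths > new.txt
#   diff -u old.txt new.txt
```

Symlink targets are kept as part of the entry, so a library whose soname changes still shows up in the diff.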

> also, what's the hardware like? I suspect a udev change

The machine is a Lenovo ThinkPad x270 used with its integrated keyboard, which does not have any LED on the CapsLock key (see below).

I have tried to reproduce the issue with an external USB keyboard plugged in. The result is much the same (no input), with one notable difference: CapsLock keystrokes do not toggle the CapsLock LED that is present on the external USB keyboard.

samy-mahmoudi commented 4 days ago

Some testing suggests this bug *doesn't* prevent boot, at least of a simple default Fedora install in a VM; patching ProtectSystem to default to off for initramfses doesn't get the system booting again, but patching dracut to pull in the kmod library does get it booting again.

Surely the inability to write hooks must have some fairly significant consequences in some cases, though.

Originally posted by @AdamWill in https://github.com/systemd/systemd/issues/32511#issuecomment-2080144756

samy-mahmoudi commented 4 days ago

Note:

Although [dracut-ng/dracut-ng@a45048b] has landed on Arch, I have nonetheless tried adding install_items+=" /etc/systemd/system.conf " with the default:

#ProtectSystem=auto

changed to:

ProtectSystem=no

I could then see /etc/systemd/system.conf in the relevant output of lsinitrd, but booting the ZFSBootMenu RAM disk was no more successful.

samy-mahmoudi commented 4 days ago

A static sequence of lines:

install_items+=" /usr/lib/libkmod.so "
install_items+=" /usr/lib/libkmod.so.2 "
install_items+=" /usr/lib/libkmod.so.2.4.2 "

in /etc/zfsbootmenu/dracut.conf.d/workaround_lack_of_libkmod.conf works around the issue.
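The static lines above hard-code the library version (2.4.2), so they would need editing after a kmod upgrade. One way around that, as a sketch: generate the lines from whatever libkmod files are currently installed. `emit_libkmod_items` is a made-up helper name, and /usr/lib is assumed as the default library directory because that is where the files live in the listing above.

```shell
# Print one install_items line per libkmod file or symlink found in a
# library directory (default: /usr/lib).
emit_libkmod_items() {
    libdir=${1:-/usr/lib}
    for f in "$libdir"/libkmod.so*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || [ -L "$f" ] || continue
        printf 'install_items+=" %s "\n' "$f"
    done
}

# Possible usage (as root), followed by a rebuild of the image:
#   emit_libkmod_items > /etc/zfsbootmenu/dracut.conf.d/workaround_lack_of_libkmod.conf
```

The generated file has the same shape as the hand-written one, so it remains a workaround rather than a fix.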

I will now try to figure out how and where (ZFSBootMenu, dracut, systemd) to fix the issue properly. Based on the annotated references below, my guess is that a full fix lies somewhere between ZFSBootMenu and dracut-ng (in the broad sense).

References:

classabbyamp commented 4 days ago

this is probably dracut assuming there will be a full set of systemd stuff in the initramfs. zfsbootmenu blocklists everything systemd it can because it's incompatible with how ZBM works. thus, when systemd-udev is added, it brings in libsystemd (unavoidable, and it does not cause compatibility issues), but libkmod is missing: whatever adds udev expected it to be added by the thing that adds the rest of systemd.
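a quick way to spot that situation in a built image, as a rough sketch: scan an lsinitrd listing and complain if libudev is there but libkmod is not. `check_udev_has_libkmod` is a made-up name, and the match on `lib/libudev.so` and `lib/libkmod.so` assumes the library paths look like the ones in the diff above.

```shell
# Read an lsinitrd listing on stdin. Fail (exit status 1) if the image
# ships libudev without libkmod; succeed otherwise.
check_udev_has_libkmod() {
    listing=$(cat)
    if printf '%s\n' "$listing" | grep -q 'lib/libudev\.so'; then
        if printf '%s\n' "$listing" | grep -q 'lib/libkmod\.so'; then
            return 0
        fi
        echo "libudev present but libkmod missing" >&2
        return 1
    fi
    return 0
}

# Possible usage (the image path is a placeholder):
#   lsinitrd /path/to/initramfs-bootmenu.img | check_udev_has_libkmod
```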

LaszloGombos commented 1 day ago

There is some related work upstream (https://github.com/dracut-ng/dracut-ng/pull/507) and some related Gentoo discussion (https://bugs.gentoo.org/935548).