Open stuartthebruce opened 1 year ago
Switching to zfs-dkms-2.1.6-1.el8 avoids this problem.
Note, this prevents a system from booting (even to single user mode) if kmod-zfs is installed, whether or not there is a zpool.
We saw this too. A case of weak-modules' method not detecting some incompatibility, which made me less than hopeful that recompiling would work, so it's good to know that the DKMS package does. I would think the boot hang would be happening when udev sees a disk with a ZFS label and therefore tries to load the module, so
> Note, this prevents a system from booting (even to single user mode) if kmod-zfs is installed whether or not there is a zpool.
surprises me, unless you have it in modules-load.d (or in your initrd's) or something like that, but I never tried it (by then I was behind on maintenance, and whether one could boot without a resource the host is supposed to serve was sort of immaterial anyway).
kmods have been added to the repositories for the Alma/Rocky/RHEL 8.7 kernel. I know it's a bother but it'd be great if you could verify they work correctly on Rocky Linux. They're built on AlmaLinux 8.7 so there is a small chance there's some subtle kernel difference which caused this. cc @tonyhutter
That works on the same system that failed above. Thanks for the quick fix.
I know very little about Linux kernel modules, but is there a general solution to automate compatibility checks to catch this earlier and throw an error rather than creating an un-bootable system?
Please also consider updating the package name when rebuilding, e.g., increase the build number from -1 to -2, or follow what some other packages do and include the OS point release, e.g., sssd-2.7.3-4.el8_7.1.x86_64. I think it would be useful for future bug reports to have some dnf, rpm, or yum command that can unambiguously tell you which kmod-zfs-2.1.6-1.el8.x86_64 is installed.
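Until the release number distinguishes the two builds, one possible workaround is to query the build time and build host alongside the name-version-release. This is only a sketch: `kmod_build_id` is a hypothetical helper name, and it assumes the two builds differ in the standard rpm `BUILDTIME` and `BUILDHOST` tags.

```shell
# Print name-version-release.arch plus build time and build host.
# kmod_build_id is a hypothetical helper; BUILDTIME and BUILDHOST are
# standard rpm query tags. The package defaults to kmod-zfs.
kmod_build_id() {
    rpm -q --qf '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH} built %{BUILDTIME:date} on %{BUILDHOST}\n' "${1:-kmod-zfs}"
}

# Usage (on a system with the package installed):
#   kmod_build_id
```

Two packages with identical names would then still differ in the reported build time.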
In theory, Linux doesn't promise any binary compatibility between kernel versions.
RH decided they wanted to promise some, and thus the whole weak-modules thing exists, but as you see, it's not remotely perfect.
I suppose there could be a fundamentally different kind of test bot that runs nightly or so and just tries loading the latest module packages against updated RHEL/Alma/what-have-you, since RH are the only ones where a premade binary is provided. @behlendorf does that sound like something the project would be interested in doing, or too niche a problem to solve?
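A rough sketch of what such a load test might look like on a throwaway test machine. The function names and the 60-second limit are illustrative assumptions, not anything the project uses, and a real soft lockup may hang the host hard enough that `timeout` never gets to fire, so a test bot would likely also need an external watchdog.

```shell
# Succeeds if the log text on stdin contains a soft lockup report.
has_soft_lockup() {
    grep -q 'soft lockup'
}

# Hypothetical smoke test: try to load a module, then scan the kernel log.
# $1: module name (defaults to zfs). Needs root for modprobe and dmesg.
smoke_test_module() {
    timeout 60 modprobe "${1:-zfs}" || return 1
    if dmesg | has_soft_lockup; then
        return 1
    fi
    return 0
}

# Usage (as root, on a disposable machine):
#   smoke_test_module zfs && echo "module loaded cleanly"
```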
Bumping the 8.7 rebuild from kmod-zfs-2.1.6-1.el8.x86_64 to kmod-zfs-2.1.6-2.el8.x86_64 would also have the advantage that users upgrading EL8.6 systems to EL8.7 could be blissfully unaware of this problem and receive an automatic update (assuming they properly update /etc/yum.repos.d/zfs.repo). However, as it stands an update of an EL8.6 system that already has a working kmod-zfs-2.1.6-1.el8.x86_64 installed will not automatically reinstall the working 8.7 kmod and a reboot will hang.
Red Hat does promise kernel binary compatibility within a minor release as long as the kmod uses only whitelisted symbols. Unfortunately, ZFS needs symbols beyond those on the whitelist, and at least in this case there was a minor version bump, so no guarantees.
@rincebrain my feeling is this is a little too much of a niche problem. When a new RHEL/Alma/Rocky release is made we build against that kernel and verify the build with a full test suite run. Only if it passes do we post packages with kmods. Using binary kmods built against kernels from other minor releases isn't recommended. They may work, but they won't have been tested.
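One cheap way to notice that a binary kmod was built against a different kernel is to compare the module's vermagic string with the running kernel release. The helper below is a hypothetical sketch. A mismatch does not prove the module will fail (weak-modules exists precisely to allow it), only that the combination falls into the untested category described above.

```shell
# Succeeds when the first field of a vermagic string equals a kernel release.
# vermagic_matches is a hypothetical helper name.
# $1: vermagic string, e.g. the output of: modinfo -F vermagic zfs
# $2: kernel release, e.g. the output of: uname -r
vermagic_matches() {
    [ "${1%% *}" = "$2" ]
}

# Usage:
#   vermagic_matches "$(modinfo -F vermagic zfs)" "$(uname -r)" ||
#       echo "zfs module was built for a different kernel" >&2
```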
> would also have the advantage that users upgrading EL8.6 systems to EL8.7 could be blissfully unaware of this problem and receive an automatic update
That's a good point and something we should consider doing in the future.
Can we set the dist part of the RPMs to something like el8_<min_version> instead of el8 (e.g., el8_6, el8_7)? This is a fairly common practice that we can see in Red Hat packages. It would allow us to upgrade from 8.6 to 8.7 by upgrading packages from different repositories. Right now, the packages in both the 8.6 and 8.7 repositories have the same version and release.
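As an illustration of the naming scheme being proposed, a minor-release dist tag can be derived from a version string such as the `VERSION_ID` in /etc/os-release. `dist_tag_for` is a hypothetical helper, and passing the result to `rpmbuild --define` is an assumption about how a build might consume it, not a description of how the ZFS packages are actually built.

```shell
# Turn a version such as "8.7" into a dist tag such as ".el8_7".
# A version without a minor part ("8") yields ".el8".
dist_tag_for() {
    dt_major=${1%%.*}
    dt_minor=${1#*.}
    if [ "$dt_minor" = "$1" ]; then
        printf '.el%s\n' "$dt_major"
    else
        printf '.el%s_%s\n' "$dt_major" "$dt_minor"
    fi
}

# Usage (assumed build invocation, spec file name is a placeholder):
#   rpmbuild --define "dist $(dist_tag_for 8.7)" -ba some-package.spec
```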
@behlendorf, is it now safe to upgrade Rocky from 8.6 to 8.7? Because here, I don't see any new version and my update notification doesn't list any new zfs update.
@jb-alvarado I am successfully using the new 8.7 repository, but note that even if you have that repository configured, if you already installed kmod-zfs-2.1.6 from the 8.6 repository, then because the Release number was not bumped, it will not show as an update, and you need to dnf reinstall it.
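A minimal sketch of that reinstall step, assuming the repository configuration already points at the 8.7 packages and that it is run before rebooting into the 8.7 kernel. `reinstall_kmod_zfs` is a hypothetical wrapper around two standard dnf subcommands.

```shell
# Refresh repository metadata, then reinstall the same name-version-release
# so the rebuilt package replaces the installed one.
# $1: package name (defaults to kmod-zfs). Needs root.
reinstall_kmod_zfs() {
    dnf clean expire-cache || return 1
    dnf reinstall -y "${1:-kmod-zfs}"
}

# Usage (as root):
#   reinstall_kmod_zfs
```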
Thank you @quartsize for the help!
It was a bit complicated: I thought I could run reinstall after the reboot, but this did not work. So I had to boot into the older kernel, remove zfs, boot into the new kernel, run sed -i "s/8.6/8.7/g" /etc/yum.repos.d/zfs.repo and install zfs again.
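The sed step above can be wrapped in a slightly more defensive form. In the original expression the unescaped dot matches any character, so this sketch escapes it first. `bump_repo_minor` is a hypothetical helper and it relies on GNU sed's `-i` option, as shipped on EL8.

```shell
# Replace one release string with another in a repo file, in place.
# $1: repo file, $2: old release (e.g. 8.6), $3: new release (e.g. 8.7)
bump_repo_minor() {
    # Escape dots so "8.6" matches only a literal "8.6".
    old_re=$(printf '%s' "$2" | sed 's/\./\\./g')
    sed -i "s/$old_re/$3/g" "$1"
}

# Usage (as root):
#   bump_repo_minor /etc/yum.repos.d/zfs.repo 8.6 8.7
```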
If you look in the https://zfsonlinux.org/epel/zfs-release-2-2$(rpm --eval "%{dist}").noarch.rpm package given on the page you linked, you'll see that the repofile provided therein uses the releasever variable. On my Rocky 8 systems that's simply 8, and so I get http://download.zfsonlinux.org/epel/8/x86_64/, whose repomd.xml is the same as for 8.7, so using that version of the release package and/or repofile might save you needing to edit it to have the most recent repository available for any dnf reinstalls.
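To make the substitution concrete, the sketch below expands a base URL of the shape quoted above for a given releasever and architecture. The template is inferred from that one URL and is an assumption about the repofile, and `zfs_repo_url` is a hypothetical helper.

```shell
# Build the repository URL for a releasever and architecture.
# $1: releasever (e.g. 8 or 8.7), $2: basearch (e.g. x86_64)
zfs_repo_url() {
    printf 'http://download.zfsonlinux.org/epel/%s/%s/\n' "$1" "$2"
}

# Usage:
#   zfs_repo_url 8 x86_64
```

With a releasever of 8 the result is the URL mentioned above, so no edit is needed when moving between point releases.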
System information
Describe the problem you're observing
kmod-zfs-2.1.6-1.el8.x86_64 fails to load and generates a CPU soft lockup
Describe how to reproduce the problem
Boot without ZFS installed
Include any warning/errors/backtraces from the system logs