draeath opened this issue 1 year ago
Note: the 4.18.0-372.26.1.el8_6 kernel boots fine with this module version.
I would guess this is another kABI issue, but I don't have a RHEL 8 system readily available to check. If you're using DKMS, try making it delete and rebuild the modules; if not, try using DKMS. (I know you said you forced it, but I've seen DKMS decide the modules were fine and skip the rebuild even when asked, so please remove them outright, make sure they're gone from every kernel, then install again.)
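Concretely, that removal and rebuild could look like the following. This is a hedged sketch, not an official procedure: the module name and version (zfs/2.1.11) are taken from this thread, it needs root on a real system, and the guard makes it a no-op where dkms is not installed.

```shell
# Fully remove the zfs DKMS module from every kernel, verify nothing is
# left behind under /lib/modules, then rebuild for the running kernel.
if command -v dkms >/dev/null 2>&1; then
    dkms remove zfs/2.1.11 --all              # drop it from every kernel
    find /lib/modules/ -name 'zfs.ko*'        # should print nothing now
    dkms install zfs/2.1.11 -k "$(uname -r)"  # clean rebuild + install
else
    echo "dkms not available; nothing to do"
fi
```

The `find` in the middle is the important step: it catches leftover copies (including weak-updates symlinks) that `dkms remove` does not own.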
I am using DKMS, and I did that already :P
I cleaned up the modules manually after uninstall, just didn't include that detail - sorry.
@draeath you need to update your kernel. You're running:
4.18.0-425.13.1
The kmods are built for 4.18.0-425.19.2
If these are built with DKMS, and it's indeed loading the DKMS versions, then it shouldn't matter how old the kernel is, as long as the modules are built against the right version, no?
(I suppose the question also becomes, do the prebuilt kmods break the same way if you swap to those?)
Indeed, I'm avoiding the prebuilt kABI-tracking modules intentionally, trying to avoid this very sort of problem.
(If I did use the prebuilt, would the module loader even find them?)
I'm hesitant to try as this came up on our log aggregation host - something I would like to avoid downtime on.
(complicating factor: we use a satellite instance that tends to be a month or two behind the upstream cdn and I do not have access to poke it for fresh kernels)
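On the parenthetical question of whether the loader would even find the prebuilt modules: there is a safe way to check which file would actually be resolved, without loading anything, using modinfo's -n/--filename flag. A sketch, assuming the module name zfs:

```shell
# Print the path of the module file the loader would resolve for "zfs"
# under the running kernel; harmless, nothing is actually loaded.
if command -v modinfo >/dev/null 2>&1; then
    modinfo -n zfs 2>/dev/null || echo "no zfs module found for $(uname -r)"
else
    echo "modinfo not available"
fi
```

If that path pointed into weak-updates/ rather than extra/, it would mean a kABI-tracking copy was shadowing the DKMS build.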
The kmods are built for 4.18.0-425.19.2
@draeath sorry, I totally glossed over that you were using the dkms modules, not kmods. I tried installing the 2.1.11 DKMS modules on EL8 (AlmaLinux 8) with the 4.18.0-425.19.2.el8_7.x86_64 kernel, and I was able to import a pool without issue.
Nothing up my sleeve...
[root@rhel8 ~]# dkms status
zfs/2.1.11, 4.18.0-425.13.1.el8_7.x86_64, x86_64: installed
[root@rhel8 ~]# uname -a
Linux rhel8 4.18.0-425.13.1.el8_7.x86_64 #1 SMP Thu Feb 2 13:01:45 EST 2023 x86_64 x86_64 x86_64 GNU/Linux
[root@rhel8 ~]# zpool status
  pool: mypool
 state: ONLINE
config:

        NAME           STATE     READ WRITE CKSUM
        mypool         ONLINE       0     0     0
          /root/bees1  ONLINE       0     0     0

errors: No known data errors
[root@rhel8 ~]# zpool version
zfs-2.1.11-1
zfs-kmod-2.1.11-1
Just using the dkms packages from the repo.
Could you possibly share, in full, the output of rpm -qa | egrep '(kernel|zfs|zpool)' | sort and of dkms status?
(For reference, on my freshly installed system, after manually installing the older kernel and rebooting, I get:
abrt-addon-kerneloops-2.10.9-21.el8.x86_64
kernel-4.18.0-425.13.1.el8_7.x86_64
kernel-4.18.0-425.19.2.el8_7.x86_64
kernel-core-4.18.0-425.13.1.el8_7.x86_64
kernel-core-4.18.0-425.19.2.el8_7.x86_64
kernel-devel-4.18.0-425.13.1.el8_7.x86_64
kernel-devel-4.18.0-425.19.2.el8_7.x86_64
kernel-headers-4.18.0-425.19.2.el8_7.x86_64
kernel-modules-4.18.0-425.13.1.el8_7.x86_64
kernel-modules-4.18.0-425.19.2.el8_7.x86_64
kernel-tools-4.18.0-425.19.2.el8_7.x86_64
kernel-tools-libs-4.18.0-425.19.2.el8_7.x86_64
libzfs5-2.1.11-1.el8.x86_64
libzpool5-2.1.11-1.el8.x86_64
zfs-2.1.11-1.el8.x86_64
zfs-dkms-2.1.11-1.el8.noarch
zfs-release-2-2.el8.noarch
)
e: that's interesting, I explicitly told it to install kernel, -devel, and -headers of 13.1, and it only did the first two. Time to try rebuilding with the older headers in place...)
e2: nope, doesn't change, it doesn't panic for me.
# dkms status
zfs/2.1.11, 4.18.0-372.26.1.el8_6.x86_64, x86_64: installed
zfs/2.1.11, 4.18.0-372.9.1.el8.x86_64, x86_64: installed
zfs/2.1.11, 4.18.0-425.13.1.el8_7.x86_64, x86_64: installed
# rpm -qa | egrep '(kernel|zfs|zpool)' | sort
kernel-4.18.0-372.26.1.el8_6.x86_64
kernel-4.18.0-372.9.1.el8.x86_64
kernel-4.18.0-425.13.1.el8_7.x86_64
kernel-core-4.18.0-372.26.1.el8_6.x86_64
kernel-core-4.18.0-372.9.1.el8.x86_64
kernel-core-4.18.0-425.13.1.el8_7.x86_64
kernel-devel-4.18.0-372.26.1.el8_6.x86_64
kernel-devel-4.18.0-372.9.1.el8.x86_64
kernel-devel-4.18.0-425.13.1.el8_7.x86_64
kernel-headers-4.18.0-425.13.1.el8_7.x86_64
kernel-modules-4.18.0-372.26.1.el8_6.x86_64
kernel-modules-4.18.0-372.9.1.el8.x86_64
kernel-modules-4.18.0-425.13.1.el8_7.x86_64
kernel-tools-4.18.0-425.13.1.el8_7.x86_64
kernel-tools-libs-4.18.0-425.13.1.el8_7.x86_64
libzfs5-2.1.11-1.el8.x86_64
libzpool5-2.1.11-1.el8.x86_64
zfs-2.1.11-1.el8.x86_64
zfs-dkms-2.1.11-1.el8.noarch
zfs-dracut-2.1.11-1.el8.noarch
zfs-release-el-2-1.noarch
Here's that information. Also, just to make sure, this is the only enabled ZFS repo:
[zfs]
name=ZFS on Linux for EL$releasever - dkms
baseurl=http://download.zfsonlinux.org/epel/$releasever/$basearch/
enabled=1
metadata_expire=7d
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux
The next question, I suppose, becomes: if you run find /lib/modules/ -iname zfs.ko.*, what does it say? In particular, I'm wondering if it's "helpfully" linking the module versions from the older kernel into the newer one's directory, such that they're earlier in the module search path.
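A toy illustration of that failure mode (temporary directory and made-up kernel paths, not the real module tree): a stale weak-updates symlink sits alongside the freshly built module, and both match the find pattern.

```shell
set -eu
root=$(mktemp -d)
# Fake "new kernel" module directory: a freshly built module in extra/
# plus a leftover weak-updates symlink to an older kernel's build.
mkdir -p "$root/4.18.0-425.13.1/extra" "$root/4.18.0-425.13.1/weak-updates"
: > "$root/4.18.0-425.13.1/extra/zfs.ko.xz"
ln -s "../../4.18.0-372.26.1/extra/zfs.ko.xz" \
      "$root/4.18.0-425.13.1/weak-updates/zfs.ko.xz"
find "$root" -iname 'zfs.ko.*' | sort   # both copies show up
rm -rf "$root"
```

If the real tree looks like this, the kernel can end up loading the older kernel's build instead of the one DKMS just made.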
Oh, weak-updates? I could see that.
I'll have a look. I could also outright disable them for these modules in the DKMS configuration; I've done that for nvidia modules in the past.
The next question, I suppose, becomes: if you run find /lib/modules/ -iname zfs.ko.*, what does it say? In particular, I'm wondering if it's "helpfully" linking the module versions from the older kernel into the newer one's directory, such that they're earlier in the module search path.
# for i in $(find /lib/modules/ -iname zfs.ko.*); do stat "$i" | grep 'File: '; done
File: /lib/modules/4.18.0-372.9.1.el8.x86_64/extra/zfs.ko.xz
File: /lib/modules/4.18.0-372.26.1.el8_6.x86_64/extra/zfs.ko.xz
File: /lib/modules/4.18.0-425.13.1.el8_7.x86_64/weak-updates/zfs.ko.xz -> /lib/modules/4.18.0-372.26.1.el8_6.x86_64/extra/zfs.ko.xz
File: /lib/modules/4.18.0-425.13.1.el8_7.x86_64/extra/zfs.ko.xz
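(A quick way to tell the stale links from real installs, generalizing the paths in the output above; this loop is a sketch, not something from the thread:)

```shell
# List each installed zfs module and say whether it is a real file or a
# symlink; the weak-updates entries are the symlinks.
for f in /lib/modules/*/extra/zfs.ko.xz /lib/modules/*/weak-updates/zfs.ko.xz; do
    [ -e "$f" ] || [ -L "$f" ] || continue    # skip unmatched globs
    if [ -L "$f" ]; then
        printf '%s -> %s (symlink)\n' "$f" "$(readlink "$f")"
    else
        printf '%s (regular file)\n' "$f"
    fi
done
```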
That's odd. I must have missed something while cleaning up - you shouldn't have the module in both places, right?
(I've run into problems with weak-updates before. Does anyone know why the dkms configuration you ship in the RPMs doesn't disable them for the zfs modules? Googling around shows that someone did so on their fork, or at least tried.)
EDIT:
# grep -i weak /usr/src/zfs-2.1.11/dkms.conf
NO_WEAK_MODULES="yes"
Now I'm thoroughly confused about how those got there. I removed them manually, uninstalled the module for that kernel via dkms, and reinstalled... and it looks right now: real files under extra instead of symlinks under weak-updates.
I'll try to see if that resolves the kernel panic.
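The manual cleanup described above could be sketched as follows. A hedged sketch, not a procedure from the thread: it is destructive, needs root on a real system, and assumes (as the output above shows) that only the weak-updates copies are stale while the DKMS-built files under extra/ are good.

```shell
# Remove weak-updates copies of the zfs module from every kernel, then
# rebuild modules.dep so the loader sees only the DKMS-built files.
for d in /lib/modules/*/weak-updates; do
    [ -d "$d" ] || continue
    find "$d" -name 'zfs.ko*' -print -delete 2>/dev/null || true
done
command -v depmod >/dev/null 2>&1 && depmod -a || true
```

The depmod -a at the end matters: until modules.dep is regenerated, modprobe can still resolve the deleted paths.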
Bingo, that was the problem. I don't understand how those weak-update symlinks were present given the zfs-dkms configuration turns them off (and doing a dkms uninstall, removing them, and then doing a dkms autoinstall does not recreate them).
Maybe we need a different way of setting NO_WEAK_MODULES? Clearly it didn't listen.
While I think it's unrelated, it's not impossible that something else on this host caused this behavior: we found today that there was also a busted weak-updates symlink for a veeam module on this same host.
System information
Describe the problem you're observing
Kernel panic on module load (during boot), or when running zfs import if the system was booted without ZFS installed.
Describe how to reproduce the problem
I already had ZFS via DKMS before this occurred; after updating packages I noticed that dkms status had some odd output (did not save it). To troubleshoot, I booted to emergency.target, disabled the dependent services, and completely uninstalled ZFS. Boot was successful, and I reinstalled the zfs-dkms, zfs-dracut, and zfs packages. I ran dkms autoinstall -m zfs -v 2.1.11 -k <varies> for each installed kernel. Finally, I ran dracut --regenerate-all --force. I did NOT yet reboot, but ran zfs import to ensure the module worked. At this point, a panic occurs. Subsequent boots see the same panic, as expected.
Include any warning/errors/backtraces from the system logs
Output is from the RS232 console; the line "Modules linked in:" is truncated.