projg2 / eclean-kernel

Installed kernel cleanup tool
GNU General Public License v2.0

Unable to load any modules after running eclean-kernel #31

Open anyuta1166 opened 2 years ago

anyuta1166 commented 2 years ago

Hello. I've found a rather critical bug that caused my server to malfunction for several hours (the time needed to recompile a kernel).

Steps to reproduce:

1) Compile a kernel with genkernel.
2) Recompile the same kernel version with genkernel again (for example, I decided to recompile the kernel with a different config).
3) Run eclean-kernel -n 1

After the second compilation you have the following layout:

lily ~ # eclean-kernel -l
5.17.9-gentoo-x86_64 [5.17.9-gentoo-x86_64]
- systemmap: /boot/System.map-5.17.9-gentoo-x86_64
- initramfs: /boot/initramfs-5.17.9-gentoo-x86_64.img
- vmlinuz: /boot/vmlinuz-5.17.9-gentoo-x86_64
- modules: /lib/modules/5.17.9-gentoo-x86_64
- build: /usr/src/linux-5.17.9-gentoo
- last modified: 2022-09-12 09:37:10
5.17.9-gentoo-x86_64.old [5.17.9-gentoo-x86_64]
- systemmap: /boot/System.map-5.17.9-gentoo-x86_64.old
- initramfs: /boot/initramfs-5.17.9-gentoo-x86_64.img.old
- vmlinuz: /boot/vmlinuz-5.17.9-gentoo-x86_64.old
- modules: /lib/modules/5.17.9-gentoo-x86_64
- build: /usr/src/linux-5.17.9-gentoo
- last modified: 2022-09-12 09:37:10
lily ~ # ls -l /boot /lib/modules/*
/boot:
total 56888
drwxr-xr-x 6 root root     4096 Sep 12 15:50 grub
-rw-r--r-- 1 root root 11716484 Sep 12 15:12 initramfs-5.17.9-gentoo-x86_64.img
-rw-r--r-- 1 root root 11716484 Sep 12 15:12 initramfs-5.17.9-gentoo-x86_64.img.old
-rw-r--r-- 1 root root 10798080 May 22 22:15 intel-uc.img
-rw-r--r-- 1 root root  5093296 Sep 12 12:37 System.map-5.17.9-gentoo-x86_64
-rw-r--r-- 1 root root  5093296 Sep 12 12:37 System.map-5.17.9-gentoo-x86_64.old
-rw-r--r-- 1 root root  6906112 Sep 12 12:38 vmlinuz-5.17.9-gentoo-x86_64
-rw-r--r-- 1 root root  6906112 Sep 12 12:38 vmlinuz-5.17.9-gentoo-x86_64.old

/lib/modules/5.17.9-gentoo-x86_64:
total 4724
lrwxrwxrwx  1 root root      28 May 21 22:12 build -> /usr/src/linux-5.17.9-gentoo
drwxr-xr-x 14 root root    4096 Sep 12 15:12 kernel
-rw-r--r--  1 root root 1183583 Sep 12 15:12 modules.alias
-rw-r--r--  1 root root 1153484 Sep 12 15:12 modules.alias.bin
-rw-r--r--  1 root root    7051 Sep 12 15:11 modules.builtin
-rw-r--r--  1 root root   18808 Sep 12 15:12 modules.builtin.alias.bin
-rw-r--r--  1 root root    9415 Sep 12 15:12 modules.builtin.bin
-rw-r--r--  1 root root   57485 Sep 12 15:11 modules.builtin.modinfo
-rw-r--r--  1 root root  431094 Sep 12 15:12 modules.dep
-rw-r--r--  1 root root  587075 Sep 12 15:12 modules.dep.bin
-rw-r--r--  1 root root     453 Sep 12 15:12 modules.devname
-rw-r--r--  1 root root  138549 Sep 12 15:11 modules.order
-rw-r--r--  1 root root    1144 Sep 12 15:12 modules.softdep
-rw-r--r--  1 root root  551107 Sep 12 15:12 modules.symbols
-rw-r--r--  1 root root  665591 Sep 12 15:12 modules.symbols.bin
lrwxrwxrwx  1 root root      28 Sep 12 15:11 source -> /usr/src/linux-5.17.9-gentoo

Now run eclean-kernel:

lily ~ # eclean-kernel -p -n 1
Legend:
[-] file being removed
[+] file being kept (used by other kernels)

These are the kernels which would be removed:
- 5.17.9-gentoo-x86_64.old: not referenced by bootloader (grub2)
 [-] /boot/vmlinuz-5.17.9-gentoo-x86_64.old
 [+] /usr/src/linux-5.17.9-gentoo
 [+] /lib/modules/5.17.9-gentoo-x86_64
 [-] /boot/System.map-5.17.9-gentoo-x86_64.old
 [-] /boot/initramfs-5.17.9-gentoo-x86_64.img.old
kernel-install will be called to perform prerm tasks.
Bootloader grub2 config will be updated.
lily ~ # eclean-kernel -n 1
Legend:
[-] file being removed
[x] file does not exist (anymore)
[+] file being kept (used by other kernels)

* Removing kernel 5.17.9-gentoo-x86_64.old (not referenced by bootloader (grub2))
 [-] /boot/vmlinuz-5.17.9-gentoo-x86_64.old
 [+] /usr/src/linux-5.17.9-gentoo
 [+] /lib/modules/5.17.9-gentoo-x86_64
 [-] /boot/System.map-5.17.9-gentoo-x86_64.old
 [-] /boot/initramfs-5.17.9-gentoo-x86_64.img.old
Removed 1 kernels
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.17.9-gentoo-x86_64
Found initrd image: /boot/intel-uc.img /boot/initramfs-5.17.9-gentoo-x86_64.img
Warning: os-prober will not be executed to detect other bootable partitions.
Systems on them will not be added to the GRUB boot configuration.
Check GRUB_DISABLE_OS_PROBER documentation entry.
done

And you end up with this:

lily ~ # eclean-kernel -l
5.17.9-gentoo-x86_64 [5.17.9-gentoo-x86_64]
- systemmap: /boot/System.map-5.17.9-gentoo-x86_64
- initramfs: /boot/initramfs-5.17.9-gentoo-x86_64.img
- vmlinuz: /boot/vmlinuz-5.17.9-gentoo-x86_64
- modules: /lib/modules/5.17.9-gentoo-x86_64
- build: /usr/src/linux-5.17.9-gentoo
- last modified: 2022-09-12 09:37:10
lily ~ # ls -l /boot /lib/modules/*
/boot:
total 33720
drwxr-xr-x 6 root root     4096 Sep 12 15:59 grub
-rw-r--r-- 1 root root 11716484 Sep 12 15:12 initramfs-5.17.9-gentoo-x86_64.img
-rw-r--r-- 1 root root 10798080 May 22 22:15 intel-uc.img
-rw-r--r-- 1 root root  5093296 Sep 12 12:37 System.map-5.17.9-gentoo-x86_64
-rw-r--r-- 1 root root  6906112 Sep 12 12:38 vmlinuz-5.17.9-gentoo-x86_64

/lib/modules/5.17.9-gentoo-x86_64:
total 208
lrwxrwxrwx  1 root root     28 May 21 22:12 build -> /usr/src/linux-5.17.9-gentoo
drwxr-xr-x 14 root root   4096 Sep 12 15:12 kernel
-rw-r--r--  1 root root   7051 Sep 12 15:11 modules.builtin
-rw-r--r--  1 root root  57485 Sep 12 15:11 modules.builtin.modinfo
-rw-r--r--  1 root root 138549 Sep 12 15:11 modules.order
lrwxrwxrwx  1 root root     28 Sep 12 15:11 source -> /usr/src/linux-5.17.9-gentoo

For some reason, eclean-kernel removed most of the /lib/modules/5.17.9-gentoo-x86_64/modules.* files, leaving only modules.builtin, modules.builtin.modinfo and modules.order.

This makes the system unable to load any modules. I found this by accident after a reboot: hardware attached to my server wasn't working because its modules were not loaded.
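The before/after listings above can be compared mechanically. The following is a minimal sketch, not part of eclean-kernel: the function name `missing_index_files` and the constant `DEPMOD_INDEX_FILES` are hypothetical, and the file list is simply the set of modules.* files that disappeared in this report (other kmod versions may generate a different set).

```python
import os

# modules.* files present before the cleanup and gone afterwards,
# taken from the two listings in this report (an assumption, not a
# complete list of what every depmod version writes).
DEPMOD_INDEX_FILES = (
    "modules.alias",
    "modules.alias.bin",
    "modules.builtin.alias.bin",
    "modules.builtin.bin",
    "modules.dep",
    "modules.dep.bin",
    "modules.devname",
    "modules.softdep",
    "modules.symbols",
    "modules.symbols.bin",
)


def missing_index_files(moddir):
    """Return the expected index files that are absent from moddir."""
    present = set(os.listdir(moddir))
    return [name for name in DEPMOD_INDEX_FILES if name not in present]
```

Calling `missing_index_files("/lib/modules/5.17.9-gentoo-x86_64")` on the system in the state shown above would be expected to return all ten names, since only modules.builtin, modules.builtin.modinfo and modules.order were left.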

mgorny commented 1 year ago

Hmm, I suspect kernel-install may actually be doing this. As you can see in the output, EK (eclean-kernel) doesn't intend to remove the module directory.

Nowa-Ammerlaan commented 1 month ago

Probably we should not call kernel-install when the kernel is a genkernel-kernel. These kernels were never installed with /sbin/installkernel to begin with, so removing them via that mechanism is bound to get weird. The problem is detecting if a kernel is a genkernel-kernel.

Anyway, if this is indeed the problem, then the workaround is to use the --no-kernel-install argument.

BratishkaErik commented 1 month ago

Just to help a little bit: I saw this issue happen with my "gentoo-kernel" dist-kernel too, but noticed it only after I changed the whole layout from "plain kernel without initramfs" to "signed UKI with initramfs, plymouth, and everything else inside", so unfortunately I don't know which particular change caused this.

Nowa-Ammerlaan commented 1 month ago

> Just to help a little bit: I saw this issue happen with my "gentoo-kernel" dist-kernel too, but noticed it only after I changed the whole layout from "plain kernel without initramfs" to "signed UKI with initramfs, plymouth, and everything else inside", so unfortunately I don't know which particular change caused this.

That happens because, when switching kernel layouts like that, the detection of whether a module directory is still needed by another kernel does not take into account the kernels installed in the old layout.
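The "kept because used by other kernels" logic described here can be illustrated with a small sketch. This is not eclean-kernel's actual implementation; `files_to_remove` and the dictionary layout are hypothetical, and the paths are the ones from the original report.

```python
def files_to_remove(kernels, victim):
    """Return paths owned only by `victim`.

    kernels: dict mapping a kernel name to the set of paths it uses.
    A path is kept if any other known kernel still references it.
    """
    still_used = set()
    for name, paths in kernels.items():
        if name != victim:
            still_used |= set(paths)
    return sorted(set(kernels[victim]) - still_used)


# Both kernels from the report share the module and build directories.
kernels = {
    "5.17.9-gentoo-x86_64": {
        "/boot/vmlinuz-5.17.9-gentoo-x86_64",
        "/boot/System.map-5.17.9-gentoo-x86_64",
        "/boot/initramfs-5.17.9-gentoo-x86_64.img",
        "/lib/modules/5.17.9-gentoo-x86_64",
        "/usr/src/linux-5.17.9-gentoo",
    },
    "5.17.9-gentoo-x86_64.old": {
        "/boot/vmlinuz-5.17.9-gentoo-x86_64.old",
        "/boot/System.map-5.17.9-gentoo-x86_64.old",
        "/boot/initramfs-5.17.9-gentoo-x86_64.img.old",
        "/lib/modules/5.17.9-gentoo-x86_64",
        "/usr/src/linux-5.17.9-gentoo",
    },
}
```

With both kernels known, removing the `.old` entry yields only its three /boot files. If the current kernel were missing from `kernels` (for example, because it was installed in a layout that is not scanned), the shared module directory would end up in the removal list as well, which is the failure mode described above.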

BratishkaErik commented 1 month ago

> > Just to help a little bit: I saw this issue happen with my "gentoo-kernel" dist-kernel too, but noticed it only after I changed the whole layout from "plain kernel without initramfs" to "signed UKI with initramfs, plymouth, and everything else inside", so unfortunately I don't know which particular change caused this.
>
> That happens because, when switching kernel layouts like that, the detection of whether a module directory is still needed by another kernel does not take into account the kernels installed in the old layout.

In my case the old layout is long gone, but I can still reproduce this error right now:

$ doas eclean-kernel -n1          
Legend:
[-] file being removed
[x] file does not exist (anymore)
[+] file being kept (used by other kernels)

* Removing kernel other 6.10.11-gentoo-old (not referenced by bootloader (symlinks))
 [-] /boot/vmlinuz-6.10.11-gentoo-old.efi
 [+] /lib/modules/6.10.11-gentoo/../../../src/linux-6.10.11-gentoo
 [+] /lib/modules/6.10.11-gentoo
 [-] /boot/vmlinuz-6.10.11-gentoo-old.png
Removed 1 kernels

$ doas exa /lib/modules/6.10.11-gentoo 
build   kernel           modules.builtin.modinfo  modules.order  System.map  vmlinuz
config  modules.builtin  modules.builtin.objs     source         video

$ doas depmod

$ doas exa /lib/modules/6.10.11-gentoo
build          modules.alias.bin          modules.builtin.modinfo  modules.devname  modules.symbols.bin  video
config         modules.builtin            modules.builtin.objs     modules.order    modules.weakdep      vmlinuz
kernel         modules.builtin.alias.bin  modules.dep              modules.softdep  source               
modules.alias  modules.builtin.bin        modules.dep.bin          modules.symbols  System.map
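As the transcript shows, running depmod brings the index files back. A minimal sketch of scripting that recovery follows, assuming that a missing modules.dep is a good enough signal that a module directory needs regenerating; the helper names are hypothetical, and actually running depmod requires root privileges.

```python
import os
import subprocess


def versions_missing_dep(modules_root="/lib/modules"):
    """Return kernel versions whose module directory lacks modules.dep."""
    if not os.path.isdir(modules_root):
        return []
    missing = []
    for version in sorted(os.listdir(modules_root)):
        moddir = os.path.join(modules_root, version)
        if os.path.isdir(moddir) and not os.path.exists(
            os.path.join(moddir, "modules.dep")
        ):
            missing.append(version)
    return missing


def depmod_commands(versions):
    """Build one `depmod <version>` command per kernel version."""
    return [["depmod", version] for version in versions]


def regenerate(versions, dry_run=True):
    """Run depmod for each version; with dry_run=True only return the commands."""
    commands = depmod_commands(versions)
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands
```

Calling `regenerate(versions_missing_dep())` only reports what would be run; passing `dry_run=False` executes the commands. depmod takes the kernel version as a positional argument, so each affected directory is rebuilt separately.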