rhkdump / kdump-utils

Kernel crash dump collection utilities
GNU General Public License v2.0

kdump should auto-add required modules if they are missing from host initrd #11

Open jbtrystram opened 4 weeks ago

jbtrystram commented 4 weeks ago

In Fedora CoreOS, we exclude nfs from the initramfs because booting from NFS is not supported: https://github.com/coreos/fedora-coreos-config/blob/testing-devel/overlay.d/05core/usr/lib/dracut/dracut.conf.d/coreos-omits.conf#L7

However, dumping to an NFS target with kdump should still work. Currently, this requires users to add extra_modules nfs to /etc/kdump.conf.

Could the module be added automatically when the target is an NFS destination?

ref: https://github.com/coreos/fedora-coreos-tracker/issues/1729
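
As a rough illustration of the request above, here is a minimal sketch of how an NFS target could be detected from the config file. The function name kdump_target_needs_nfs is hypothetical and not part of kdump-utils; the sketch assumes the target is declared with an "nfs <server>:<export>" line as described in kdump.conf(5).

```shell
# Hypothetical helper, not part of kdump-utils.
# Prints "nfs" if the given kdump.conf declares an NFS dump target.
# Assumption: the target is declared as "nfs <server>:<export>".
kdump_target_needs_nfs() {
    conf=${1:-/etc/kdump.conf}
    # Match a non-comment line whose first word is "nfs" followed by a value.
    if grep -Eq '^[[:space:]]*nfs[[:space:]]+[^[:space:]#]' "$conf"; then
        echo nfs
    fi
}
```

A caller could then add the printed module name to the initramfs build only when the output is non-empty.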

daveyoung commented 3 weeks ago

Hi, Coiby filed an RH-internal Jira issue against dracut, and the dracut team suggests using --force-add to override the config file: dracut --force-add nfs. Can you try it and see if this works? I think we can do it in the kdump code. Please refer to https://issues.redhat.com/browse/RHEL-26114
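
To see which modules a distribution's dracut config omits before trying an override, something like the following sketch may help. The function name list_conf_omits is hypothetical. It assumes omits are written as omit_dracutmodules+=" ... " lines (the syntax documented in dracut.conf(5)) and only looks at one directory by default; dracut also reads /etc/dracut.conf and /etc/dracut.conf.d, which this sketch does not cover.

```shell
# Hypothetical helper: list the dracut modules omitted by *.conf files
# in one config directory, one module name per line.
# Assumption: omits use the omit_dracutmodules="..." or +="..." form.
list_conf_omits() {
    dir=${1:-/usr/lib/dracut/dracut.conf.d}
    cat "$dir"/*.conf 2>/dev/null |
        # Keep only the quoted value of each omit_dracutmodules line.
        sed -n 's/^[[:space:]]*omit_dracutmodules+\{0,1\}="\([^"]*\)".*/\1/p' |
        # Split on spaces, drop empty lines, de-duplicate.
        tr -s ' ' '\n' | sed '/^$/d' | sort -u
}
```

For example, list_conf_omits | grep -x nfs would succeed only if nfs is in the omit list of that directory.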

jbtrystram commented 1 week ago

Hello @daveyoung, sorry for the delayed response.

I did the following testing today. I was not able to set up an NFS server, so I can only attest to which modules are present for each config option.

On FCOS, with the default config, the nfs module is not included in the kdump initramfs (as expected):

kdump:/# modinfo nfs
modinfo: ERROR: Module nfs not found.

With the following kdump.conf config:

          path /var/crash
          extra_modules nfs

I can see the module is present:

kdump:/# modinfo nfs
filename:       /lib/modules/6.8.11-300.fc40.x86_64/kernel/fs/nfs/nfs.ko.xz
license:        GPL
author:         Olaf Kirch <okir@monad.swb.de>
alias:          nfs4
alias:          fs-nfs4
alias:          fs-nfs
rhelversion:    9.99
depends:        sunrpc,netfs,lockd
retpoline:      Y
intree:         Y
name:           nfs
vermagic:       6.8.11-300.fc40.x86_64 SMP preempt mod_unload 
sig_id:         PKCS#7
signer:         Fedora kernel signing key
sig_key:        12:8F:6B:9F:1E:CC:CB:0D:BA:F0:10:12:70:1C:A6:DD:A1:09:9A:11
sig_hashalgo:   sha256
signature:      CC:3........

However, using dracut_args with --force-add like this does not work:

          path /var/crash
          dracut_args --force-add nfs

During the kdump initramfs build, dracut fails:

Starting kdump.service - Crash recovery kernel arming...
kdump: No kdump initial ramdisk found.
kdump: Rebuilding /var/lib/kdump/initramfs-6.8.11-300.fc40.x86_64kdump.img
Executing: /usr/bin/dracut --add kdumpbase --quiet --hostonly --hostonly-cmdline --hostonly-i18n --hostonly-mode strict --hostonly-nics  --aggressive-strip -o "plymouth resume ifcfg earlykdump" --force-add nfs --mount "/dev/disk/by-uuid/bc826339-762e-4f7b-9514-5a6011cc0318 /sysroot xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota" --squash-compressor zstd --no-hostonly-default-device -f /var/lib/kdump/initramfs-6.8.11-300.fc40.x86_64kdump.img 6.8.11-300.fc40.x86_64
Module 'systemd-networkd' will not be installed, because it's in the list to be omitted!
Module 'systemd-pcrphase' will not be installed, because command '/usr/lib/systemd/systemd-pcrphase' could not be found!
Module 'busybox' will not be installed, because command 'busybox' could not be found!
Module 'dbus-daemon' will not be installed, because it's in the list to be omitted!
Module 'rngd' will not be installed, because command 'rngd' could not be found!
Module 'connman' will not be installed, because command 'connmand' could not be found!
Module 'connman' will not be installed, because command 'connmanctl' could not be found!
Module 'connman' will not be installed, because command 'connmand-wait-online' could not be found!
Module 'connman' will not be installed, because command 'connmand' could not be found!
Module 'connman' will not be installed, because command 'connmanctl' could not be found!
Module 'connman' will not be installed, because command 'connmand-wait-online' could not be found!
Module 'ifcfg' will not be installed, because it's in the list to be omitted!
Module 'plymouth' will not be installed, because it's in the list to be omitted!
62bluetooth: Could not find any command of '/usr/lib/bluetooth/bluetoothd /usr/libexec/bluetooth/bluetoothd'!
Module 'dmraid' will not be installed, because it's in the list to be omitted!
Module 'lvm' will not be installed, because it's in the list to be omitted!
Module 'pcsc' will not be installed, because command 'pcscd' could not be found!
Module 'fcoe' will not be installed, because it's in the list to be omitted!
Module 'fcoe-uefi' will not be installed, because it's in the list to be omitted!
Module 'nbd' will not be installed, because it's in the list to be omitted!
Module 'nfs' will not be installed, because it's in the list to be omitted!
Module 'resume' will not be installed, because it's in the list to be omitted!
Module 'biosdevname' will not be installed, because it's in the list to be omitted!
Module 'earlykdump' will not be installed, because it's in the list to be omitted!
Module 'memstrack' will not be installed, because it's in the list to be omitted!
dracut[E]: Module 'nfs' cannot be installed.
Module 'nfs' cannot be installed.
kdump: mkdumprd: failed to make kdump initrd
kdump: Starting kdump: [FAILED]
kdump.service: Main process exited, code=exited, status=1/FAILURE
kdump.service: Failed with result 'exit-code'.
Failed to start kdump.service - Crash recovery kernel arming.
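
To pick out the relevant lines from a log like the one above, a small filter can help. This is only a sketch: the function name omitted_dracut_modules is hypothetical, and the message text it matches is copied from this log, so it may differ in other dracut versions.

```shell
# Hypothetical helper: read dracut build output on stdin and print the
# modules it reports as skipped because they are in the omit list.
omitted_dracut_modules() {
    sed -n "s/^Module '\([^']*\)' will not be installed, because it's in the list to be omitted!\$/\1/p" |
        sort -u
}
```

For example, journalctl -o cat -u kdump.service | omitted_dracut_modules | grep -x nfs shows whether nfs was dropped because of the omit list.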

So detecting the NFS destination from the config file and adding extra_modules accordingly should work around any dracut omits the distribution sets up. I am not sure why --force-add does not work, though.

travier commented 1 week ago

Issue for --force-add not working as expected: https://issues.redhat.com/browse/RHEL-26114

pvalena commented 1 week ago

Please add --debug to dracut_args, so I can see what's happening in that case (e.g. what's in your config).

jbtrystram commented 6 days ago

kdump.service.log

@pvalena See attached log file. It's the output of journalctl -o cat -u kdump.service

With kdump.conf as follows:

path /var/crash
core_collector nop
failure_action shell
dracut_args --force-add nfs --debug

/etc/sysconfig/kdump was not changed.