rhkdump / kdump-utils

Kernel crash dump collection utilities
GNU General Public License v2.0

mkdumprd: Skip global config setting dracutmodules= #29

Open cgwalters opened 4 months ago

cgwalters commented 4 months ago

The Fedora/RHEL bootc base images have dracut drop-ins which set dracutmodules+=.

However, that seems to override explicit module inclusion on the command line, which is a dracut bug: https://github.com/rhkdump/kdump-utils/issues/11

Work around this by making our own copy of the global config, and omitting config drop-ins which trigger this behavior.

I think longer term, this project should probably own its own global dracut config, actually. But that's a much larger set of work.
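
A minimal sketch of the workaround described above, not the code in this PR: copy the config drop-ins into a private directory and leave out any file that assigns dracutmodules. The function name and the per-file filtering are assumptions for illustration; how the filtered copy is then passed to dracut is left out here.

```shell
# Hypothetical helper (not part of kdump-utils): copy dracut drop-ins
# from directory $1 into directory $2, skipping every file that assigns
# dracutmodules with "=" or "+=". Files that only set add_dracutmodules,
# omit_dracutmodules, etc. are kept, because the pattern is anchored at
# the start of the line.
copy_confdir_without_dracutmodules() {
    src=$1
    dst=$2
    mkdir -p "$dst"
    for f in "$src"/*.conf; do
        # An unmatched glob stays literal; skip it.
        [ -e "$f" ] || continue
        if grep -Eq '^[[:space:]]*dracutmodules\+?=' "$f"; then
            echo "skipping $f (sets dracutmodules)" >&2
            continue
        fi
        cp -- "$f" "$dst/"
    done
}
```

Called as `copy_confdir_without_dracutmodules /some/dracut.conf.d /some/private/dir`, it leaves the original drop-ins untouched and only changes what ends up in the private copy.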

cgwalters commented 4 months ago

OK, with this I get the nfs module in the generated kdump initramfs. However I haven't yet successfully done a kdump to NFS, still testing/debugging that...

cgwalters commented 4 months ago

Tested and this works for me; the problem was I didn't have permissions on my NFS server set up correctly for writes.

licliu commented 4 months ago

This works for my virtual machine, but Aman said it didn't work for his bare-metal test environment.

cgwalters commented 4 months ago

> This works for my virtual machine, but Aman said it didn't work for his bare-metal test environment.

Discussion in https://issues.redhat.com/browse/RHEL-49590 narrowed that down to nfsv3 vs nfsv4; do you want to merge this and look at fixes for nfsv3 as a followup?

daveyoung commented 4 months ago

> This works for my virtual machine, but Aman said it didn't work for his bare-metal test environment.

> Discussion in https://issues.redhat.com/browse/RHEL-49590 narrowed that down to nfsv3 vs nfsv4; do you want to merge this and look at fixes for nfsv3 as a followup?

Currently I'm not sure how hard the needed dracut fixes are, or whether this is urgent. Ideally I'd still prefer to fix dracut. I put a comment in RHEL-49590; could you and @pvalena provide input?

cgwalters commented 3 months ago

I think it's somewhat urgent as kdump working in all these scenarios is part of the RHEL certification suite.

cgwalters commented 3 months ago

WDYT of my comment:

> I think longer term, this project should probably own its own global dracut config, actually. But that's a much larger set of work.

licliu commented 3 months ago

@cgwalters I created #31 to add a kdump dracut config. But for the nfsv3 issue, we still need some changes in dracut.

coiby commented 3 months ago

I'm curious why the bootc base image uses dracutmodules instead of add_dracutmodules?

       dracutmodules+=" <dracut modules> "
           Specify a space-separated list of dracut modules to call when building the initramfs. Modules are located in /usr/lib/dracut/modules.d. This option forces dracut to only
           include the specified dracut modules. In most cases the "add_dracutmodules" option is what you want to use.
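
To make the difference concrete, here is a sketch of the two settings as they might appear in a drop-in. The module lists are illustrative assumptions, not the contents of the bootc base image.

```shell
# Hypothetical drop-in A: exclusive list. Per the man page text above,
# dracut includes only these modules.
dracutmodules+=" base kernel-modules "

# Hypothetical drop-in B: additive. The default module selection still
# applies, and nfs is added on top of it.
add_dracutmodules+=" nfs "
```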

cgwalters commented 3 months ago

> I'm curious why the bootc base image uses dracutmodules instead of add_dracutmodules?

This is a good question. Just so everyone's on the same page: by default, dracut does "module autodetection", asking each module it has (including a bunch of built-in ones) whether or not the relevant binaries are present. So you'll normally see things like this:

dracut[I]: Module 'busybox' will not be installed, because command 'busybox' could not be found!
dracut[I]: Module 'rngd' will not be installed, because command 'rngd' could not be found!

(Yes, we should patch dracut to make this look less like an error; maybe condense all of them at the end to a single line dracut[I]: Skipping modules: 'busybox' 'rngd' ...)

If we use add_dracutmodules, then dracut still does that auto-detection. We could suppress that with omit_dracutmodules but then it becomes messier because if someone wants to derive from our image and actually add that content, they'd need to undo that.
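
For illustration, the add-plus-omit variant mentioned above might look like this in a drop-in. The module names are taken from the log lines earlier in this comment; the combination itself is an assumption, not what the base image ships.

```shell
# Hypothetical drop-in: keep auto-detection, add nfs explicitly, and
# exclude two auto-detected modules by name.
add_dracutmodules+=" nfs "
omit_dracutmodules+=" busybox rngd "
```

A derived image that wants busybox in its initramfs would then have to work around that omit list, which is the messiness described above.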


All this to say...well, I guess we can just switch to add_dracutmodules, and I can try sending a patch to dracut to clean up the output.
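
As a rough illustration of the single-line summary suggested in the parenthetical above, here is a post-processing filter over dracut's log output. It is a sketch only: the function name is hypothetical, it assumes the exact message format quoted in this comment, and it is not how dracut itself handles these messages.

```shell
# Hypothetical filter: collapse per-module "will not be installed"
# messages into one summary line printed at the end. All other lines
# pass through unchanged and in order.
condense_skipped_modules() {
    awk '
        /^dracut\[I\]: Module .* will not be installed, because command/ {
            # Field 3 is the module name, still wrapped in its quotes.
            skipped = skipped " " $3
            next
        }
        { print }
        END {
            if (skipped != "")
                print "dracut[I]: Skipping modules:" skipped
        }
    '
}
```

It reads from standard input, so it could sit at the end of a pipeline that carries the dracut log.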

cgwalters commented 3 months ago

Ahh actually https://github.com/dracut-ng/dracut-ng/commit/d73cc24e112c01aa701a96a7b8a58adce78409e7 landed in dracut-ng already.

cgwalters commented 3 months ago

https://gitlab.com/fedora/bootc/base-images/-/merge_requests/36

coiby commented 3 months ago

> I'm curious why the bootc base image uses dracutmodules instead of add_dracutmodules?

> This is a good question. Just so everyone's on the same page: by default, dracut does "module autodetection", asking each module it has (including a bunch of built-in ones) whether or not the relevant binaries are present. So you'll normally see things like this:

> dracut[I]: Module 'busybox' will not be installed, because command 'busybox' could not be found!
> dracut[I]: Module 'rngd' will not be installed, because command 'rngd' could not be found!

> (Yes, we should patch dracut to make this look less like an error; maybe condense all of them at the end to a single line dracut[I]: Skipping modules: 'busybox' 'rngd' ...)

> If we use add_dracutmodules, then dracut still does that auto-detection. We could suppress that with omit_dracutmodules but then it becomes messier because if someone wants to derive from our image and actually add that content, they'd need to undo that.

> All this to say...well, I guess we can just switch to add_dracutmodules, and I can try sending a patch to dracut to clean up the output.

Thanks for the clarification! So you use dracutmodules mainly to avoid printing the error-like messages about missing binaries.

Since https://gitlab.com/fedora/bootc/base-images/-/merge_requests/36 has been merged, there is no need for this PR, am I correct?

pvalena commented 3 months ago

@coiby FYI if you really want to be sure the module is included, it's better to use force_add_dracutmodules which ignores the omit_dracutmodules list.
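
A sketch of what that suggestion might look like in a kdump-specific drop-in. The choice of nfs follows the earlier discussion in this thread; the precedence over omit_dracutmodules is as stated in the comment above and is not verified here.

```shell
# Hypothetical drop-in for the kdump initramfs: per the comment above,
# a module listed here is expected to be included even if it also
# appears in omit_dracutmodules.
force_add_dracutmodules+=" nfs "
```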

cgwalters commented 3 months ago

> Since https://gitlab.com/fedora/bootc/base-images/-/merge_requests/36 has been merged, there is no need for this PR, am I correct?

Yes, but we need to get the fix all the way back to 9.4.z for the base image; I will try to make sure that happens.

cgwalters commented 3 months ago

FTR I created https://issues.redhat.com/browse/RHEL-56076 to track this...it should happen automatically within ~a week.

daveyoung commented 2 months ago

I suppose with Lichen's PR #31, this one is not needed and RHEL-56076 can be worked on separately. Should this PR be closed? @cgwalters @coiby @licliu