rhkdump / kdump-utils

Kernel crash dump collection utilities
GNU General Public License v2.0

kdump fails to write dump on ppc64le since linux 6.9 #15

Closed (jbtrystram closed this issue 3 months ago)

jbtrystram commented 3 months ago

We are seeing failures in the ppc64le pipeline since the kernel was updated:

[2024-06-17T21:03:44.954Z]   kernel 6.8.11-300.fc40 -> 6.9.4-200.fc40
[2024-06-17T21:03:44.954Z]   kernel-core 6.8.11-300.fc40 -> 6.9.4-200.fc40
[2024-06-17T21:03:44.954Z]   kernel-modules 6.8.11-300.fc40 -> 6.9.4-200.fc40
[2024-06-17T21:03:44.954Z]   kernel-modules-core 6.8.11-300.fc40 -> 6.9.4-200.fc40

The installed kexec-tools version is kexec-tools-2.0.28-4.fc40.ppc64le.

Kdump fails with:

[    4.959417] systemd[1]: Starting kdump-capture.service - Kdump Vmcore Save Service...
[    4.984011] kdump[483]: Kdump is using the default log level(3).
[    5.027827] kdump[518]: saving to /sysroot/ostree/deploy/fedora-coreos/var/crash/127.0.0.1-2024-06-23-21:27:26/
[    5.110978] kdump[523]: saving vmcore-dmesg.txt to /sysroot/ostree/deploy/fedora-coreos/var/crash/127.0.0.1-2024-06-23-21:27:26/
[    5.132538] kdump[529]: saving vmcore-dmesg.txt complete
[    5.135767] kdump[531]: saving vmcore
[    5.158826] kdump.sh[532]: Checking for memory holes                         : [  0.0 %] /
Checking for memory holes                         : [100.0 %] |
readpage_elf: Attempt to read non-existent page at 0xc000000000000.
[    5.159480] kdump.sh[532]: readmem: type_addr: 0, addr:c00c000000000000, size:16384
[    5.159708] kdump.sh[532]: __exclude_unnecessary_pages: Can't read the buffer of struct page.
[    5.159938] kdump.sh[532]: create_2nd_bitmap: Can't exclude unnecessary pages.
[    5.162968] kdump.sh[532]: The kernel version is not supported.
[    5.163176] kdump.sh[532]: The makedumpfile operation may be incomplete.
[    5.163767] kdump.sh[532]: makedumpfile Failed.
[    5.165982] kdump[534]: saving vmcore failed, exitcode:1
[    5.168791] kdump[536]: saving vmcore failed
[    5.187037] kdump[541]: saving the /run/initramfs/kexec-dmesg.log to /sysroot/ostree/deploy/fedora-coreos/var/crash/127.0.0.1-2024-06-23-21:27:26///
[    5.190190] systemd[1]: kdump-capture.service: Main process exited, code=exited, status=1/FAILURE
[    5.190797] systemd[1]: kdump-capture.service: Failed with result 'exit-code'.
[    5.191005] systemd[1]: Failed to start kdump-capture.service - Kdump Vmcore Save Service.
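
For anyone triaging similar CI console output, here is a minimal sketch of how the failure signature in the log above could be extracted automatically. The function name `summarize_kdump_log` and the regular expressions are hypothetical and not part of kdump-utils; they assume the console line format shown in this report, where makedumpfile output appears under the `kdump.sh` tag and the exit code is reported under the `kdump` tag.

```python
import re

# Assumed console line format: "[    5.165982] kdump[534]: message".
LINE = re.compile(r"^\[\s*\d+\.\d+\]\s+(?P<unit>[^\[\s]+)\[\d+\]:\s*(?P<msg>.*)$")
# Failure line taken from the log in this report.
EXITCODE = re.compile(r"saving vmcore failed, exitcode:(\d+)")


def summarize_kdump_log(text):
    """Collect makedumpfile messages and the vmcore save exit code, if any."""
    summary = {"failed": False, "exitcode": None, "makedumpfile_messages": []}
    for raw in text.splitlines():
        match = LINE.match(raw)
        if match is None:
            continue  # skip continuation lines and other formats
        unit = match.group("unit")
        msg = match.group("msg").strip()
        if unit == "kdump.sh" and msg:
            # In this report, kdump.sh relays the makedumpfile output.
            summary["makedumpfile_messages"].append(msg)
        elif unit == "kdump":
            code = EXITCODE.search(msg)
            if code:
                summary["failed"] = True
                summary["exitcode"] = int(code.group(1))
    return summary


# Excerpt of the log from this report, used as sample input.
SAMPLE = """\
[    5.135767] kdump[531]: saving vmcore
[    5.159938] kdump.sh[532]: create_2nd_bitmap: Can't exclude unnecessary pages.
[    5.162968] kdump.sh[532]: The kernel version is not supported.
[    5.163767] kdump.sh[532]: makedumpfile Failed.
[    5.165982] kdump[534]: saving vmcore failed, exitcode:1
"""

if __name__ == "__main__":
    print(summarize_kdump_log(SAMPLE))
```

A check like this only recognizes the messages seen here; other makedumpfile failures may be worded differently.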

coiby commented 3 months ago

I think this is the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=2269991. makedumpfile-1.7.5-11.fc40, kdump-utils-1.0.42-10.fc40 and kexec-tools-2.0.28-10.fc40 have been submitted for testing.

jbtrystram commented 3 months ago

Looks like it, yeah. I'll test again and report :)

With kexec-tools-2.0.28-10.fc40 we see kdump.service failing to start, with no log messages on the console. Is the default config no longer shipped? I will turn on debug logging.

https://jenkins-coreos-ci.apps.ocp.fedoraproject.org/blue/organizations/jenkins/test-override/detail/test-override/717/pipeline

jbtrystram commented 3 months ago

This is fixed with kexec-tools-2.0.28-10.fc40: https://github.com/coreos/fedora-coreos-config/pull/3053