Open liyi-ibm opened 6 years ago
Another thing to note:
On Ubuntu, you need to modify /etc/default/kdump-tools
and change nr_cpus=1 to maxcpus=1 (otherwise there will be a null pointer exception):
KDUMP_CMDLINE_APPEND="maxcpus=1 systemd.unit=kdump-tools.service irqpoll noirqdistrib nousb"
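The edit described above (swapping nr_cpus=1 for maxcpus=1 in KDUMP_CMDLINE_APPEND) could be scripted roughly as follows. This is only a sketch: the function name swap_nr_cpus_for_maxcpus is made up for illustration, and it assumes the setting sits on a single line starting with KDUMP_CMDLINE_APPEND=, as in the line shown above.

```python
import re


def swap_nr_cpus_for_maxcpus(config_text: str) -> str:
    """Return config_text with nr_cpus=1 replaced by maxcpus=1.

    Only lines that set KDUMP_CMDLINE_APPEND are touched; comments and
    other settings are passed through unchanged. (Hypothetical helper,
    not part of kdump-tools.)
    """
    out = []
    for line in config_text.splitlines(keepends=True):
        if line.lstrip().startswith("KDUMP_CMDLINE_APPEND="):
            # \b keeps values such as nr_cpus=16 from being rewritten.
            line = re.sub(r"\bnr_cpus=1\b", "maxcpus=1", line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = 'KDUMP_CMDLINE_APPEND="nr_cpus=1 systemd.unit=kdump-tools.service irqpoll nousb"\n'
    print(swap_nr_cpus_for_maxcpus(sample), end="")
```

To apply it for real, you would read /etc/default/kdump-tools, pass the contents through the function, and write the result back as root. The kdump kernel usually has to be reloaded before the new command line takes effect.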
As stated by Mahesh:
Looks like it is crashing while accessing paca->core_idle_state_ptr, which I guess isn't set up while booting the kdump kernel. core_idle_state_ptr can be NULL if a non-SMT value is passed as nr_cpus. The best way to fix that is to always align up nr_cpus to the SMT value on powerpc.
Can you verify whether kdump.conf has nr_cpus=1 configured? If yes, remove that and retry.
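To illustrate what "align up nr_cpus to the SMT value" means arithmetically, here is a small sketch. It is not the kernel's code. The helper name align_up_to_smt and the threads_per_core values used in the example (4 and 8) are assumptions chosen for illustration.

```python
def align_up_to_smt(nr_cpus: int, threads_per_core: int) -> int:
    """Round nr_cpus up to the next multiple of threads_per_core.

    Hypothetical helper: it shows only the rounding, so that every
    thread of a core is covered by the resulting CPU count.
    """
    if nr_cpus < 1 or threads_per_core < 1:
        raise ValueError("nr_cpus and threads_per_core must be >= 1")
    # Ceiling division, then scale back up to a whole number of cores.
    return -(-nr_cpus // threads_per_core) * threads_per_core


if __name__ == "__main__":
    # Assumed SMT widths, for illustration only.
    for smt in (4, 8):
        print(f"nr_cpus=1 with SMT{smt} -> nr_cpus={align_up_to_smt(1, smt)}")
```

With this rounding, nr_cpus=1 would never describe a partial core, which is the situation Mahesh identifies as leaving core_idle_state_ptr NULL.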
Here is the output:
"./pdbg_meng -p0 -c1 -t0 sreset"
There is NO checkstop this time. But there is kernel panic when rebooting.
"
[ 690.468074019,5] OPAL: Switch to little-endian OS
[ 1.331897] Unable to handle kernel paging request for data at address 0x00000000
[ 1.331967] Faulting instruction address: 0xc000000008034924
[ 1.332028] Oops: Kernel access of bad area, sig: 7 [#1]
[ 1.332107] LE SMP NR_CPUS=2048 NUMA PowerNV
[ 1.332189] Modules linked in:
[ 1.332253] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.17.0-fix #1
[ 1.332338] NIP: c000000008034924 LR: c0000000080a5b60 CTR: c0000000080348e0
[ 1.332432] REGS: c0000000097737a0 TRAP: 0300 Not tainted (4.17.0-fix)
[ 1.332514] MSR: 9000000000001033 <SF,HV,ME,IR,DR,RI,LE> CR: 24002284 XER: 00000000
[ 1.332611] CFAR: c000000008034914 DAR: 0000000000000000 DSISR: 00080000 SOFTE: 1
[ 1.332611] GPR00: 0000000000000000 c000000009773a20 c000000009776400 0000000000000004
[ 1.332611] GPR04: 0000000000000004 c0000000095e56d8 0000000000000000 0000000000000000
[ 1.332611] GPR08: 0000000000000000 0000000000000000 c000000009770000 c000000027a43600
[ 1.332611] GPR12: c000000008b05390 c000000009b00c00 0000000000000000 0000000009b10600
[ 1.332611] GPR16: 0000000027fb0000 0000000000000004 0000000000000000 0000000000000000
[ 1.332611] GPR20: 0000000000000000 0000000000000001 0000000010004d9c 00000000100053ed
[ 1.332611] GPR24: 0000000000000005 0000000000000005 0000000000000000 0000000000000005
[ 1.332611] GPR28: c0000000096a0fb0 00000000003003ff 0000000000300374 0000000000300374
[ 1.333487] NIP [c000000008034924] lwarx_loop_stop+0x0/0x24
[ 1.333568] LR [c0000000080a5b60] __power9_idle_type+0x80/0xb0
[ 1.333654] Call Trace:
[ 1.333702] [c000000009773a20] [c0000000096a0fb0] powernv_idle_driver+0x0/0x3e8 (unreliable)
[ 1.333814] [c000000009773d10] [c0000000080a5b60] __power9_idle_type+0x80/0xb0
[ 1.333908] [c000000009773d60] [c0000000080a6170] power9_idle_type+0x20/0x40
[ 1.334001] [c000000009773d80] [c000000008b053d0] stop_loop+0x40/0x5c
[ 1.334050] [c000000009773db0] [c000000008b01794] cpuidle_enter_state+0xa4/0x400
[ 1.334128] [c000000009773e10] [c000000008158a3c] call_cpuidle+0x4c/0x90
[ 1.334217] [c000000009773e30] [c00000000815906c] do_idle+0x32c/0x3d0
[ 1.334297] [c000000009773ea0] [c000000008159348] cpu_startup_entry+0x38/0x50
[ 1.334393] [c000000009773ed0] [c00000000800e030] rest_init+0xe0/0x100
[ 1.334501] [c000000009773f00] [c000000009114330] start_kernel+0x614/0x634
[ 1.334599] [c000000009773f90] [c00000000800ac7c] start_here_common+0x1c/0x520
[ 1.334712] Instruction dump:
[ 1.334767] f86d09b8 39800000 480003d8 60000000 60000000 e8a28080 e8850000 7c232000
[ 1.334877] 40800008 4c0002e4 88ed09a9 e9cd09a0 <7de07028> 75e91000 40c2fe2d 7def3878
[ 1.334981] ---[ end trace 0d5c6984e5006361 ]---
[ 2.737208]
[ 3.737233] Kernel panic - not syncing: Attempted to kill the idle task!
[ 5.139558] Rebooting in [ 730.088668117,5] OPAL: Reboot request...
When triggering kdump from OpenBMC using 'pdbg -p0 -c1 -t0 sreset', there is an exception:
According to Nick: