Open mpe opened 6 years ago
Is that used by anyone on powerpc? I thought xmon is our preferred debugger. Are there scenarios where k(g)db would be useful to have?
It's not used much because we've always had xmon. Though xmon is more of a "crash handler" than a debugger. kgdb can (in theory) do full gdb-style debugging against a running kernel, which could be useful at times.
But really we should either get it working or prevent it from being enabled. I don't like having things that are known not to work sitting around for people to trip over.
Tried it on 8xx: no Oops, but ugly 'ptrval'. Should we do something about it?
root@vgoip:~# tty
/dev/ttyCPM0
root@vgoip:~# echo ttyCPM0 > /sys/module/kgdboc/parameters/kgdboc
root@vgoip:~#
root@vgoip:~# echo g > /proc/sysrq-trigger
[ 240.108192] sysrq: SysRq : DEBUG
Entering kdb (current=0x(ptrval), pid 282) due to Keyboard Entry
kdb> btp 1
Stack traceback for pid 1
0x(ptrval) 1 0 0 0 S 0x(ptrval) init
Call Trace:
[c60e1db0] [100c82b6] 0x100c82b6 (unreliable)
[c60e1e70] [c05352ac] __schedule+0x22c/0x5ac
[c60e1eb0] [c053565c] schedule+0x30/0x5c
[c60e1ec0] [c001fbfc] do_wait+0x1a8/0x29c
[c60e1ef0] [c0020b18] kernel_wait4+0x80/0x128
[c60e1f40] [c000e11c] ret_from_syscall+0x0/0x38
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0x0
kdb>
But it silently hangs after calling 'help'
kdb> help
Command Usage Description
----------------------------------------------------------
md <vaddr> Display Memory Contents, also mdWcN, e.g. md8c1
mdr <vaddr> <bytes> Display Raw Memory
mdp <paddr> <bytes> Display Physical Memory
mds <vaddr> Display Memory Symbolically
mm <vaddr> <contents> Modify Memory Contents
go [<vaddr>] Continue Execution
rd Display Registers
rm <reg> <contents> Modify Registers
ef <vaddr> Display exception frame
bt [<vaddr>] Stack traceback
btp <pid> Display stack for process <pid>
bta [D|R|S|T|C|Z|E|U|I|M|A]
Backtrace all processes matching state flag
btc Backtrace current process on each cpu
btt <vaddr> Backtrace process given its struct task address
env Show environment variables
set Set environment variables
help Display Help Message
bta silently hangs as well
kdb> bta
15 sleeping system daemon (state M) processes suppressed,
use 'ps A' to see all.
Stack traceback for pid 282
0x(ptrval) 282 280 1 0 R 0x(ptrval) *sh
Call Trace:
[c656fb10] [c0089cd0] kdb_show_stack+0x80/0xa4 (unreliable)
[c656fb30] [c0089d90] kdb_bt1.isra.0+0x9c/0xf4
[c656fb60] [c0089e60] kdb_bt+0x78/0x348
[c656fbf0] [c00875f8] kdb_parse+0x430/0x730
[c656fc40] [c0087d5c] kdb_main_loop+0x348/0x8f4
[c656fca0] [c008acdc] kdb_stub+0x18c/0x3c0
[c656fcd0] [c0080c24] kgdb_handle_exception+0x2c8/0x720
[c656fd60] [c000e97c] kgdb_handle_breakpoint+0x3c/0x98
[c656fd70] [c000af38] program_check_exception+0x104/0x700
[c656fd90] [c000e45c] ret_from_except_full+0x0/0x4
[c656fe50] [c0266494] __handle_sysrq+0x120/0x1a0
[c656fe80] [c026698c] write_sysrq_trigger+0x44/0x5c
[c656fe90] [c016bd64] proc_reg_write+0x60/0xf0
[c656fea0] [c011b130] __vfs_write+0x28/0x178
[c656fef0] [c011b460] vfs_write+0xb8/0x1cc
[c656ff10] [c011b6ec] ksys_write+0x4c/0xc4
[c656ff40] [c000e11c] ret_from_syscall+0x0/0x38
Those silent hangs are in fact a problem in the CPM serial driver. The following patch is proposed to fix it: https://patchwork.ozlabs.org/patch/969723/
When booting with the parameter 'debug_boot_weak_hash', I get the following:
Entering kdb (current=0xba99ad80, pid 284) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0xba99ad80
It seems kdb_getarea tries to access the hashed address rather than the real address (the 0xba99ad80 in the first line is printed with %p, while kdb_getarea() uses %lx).
The issue seems to be linked to the following call:
sprintf(buf, "btt 0x%p\n", KDB_TSK(cpu));
On the 8xx, we end up with btt (ptrval) (it takes a huge amount of time before getting enough entropy to print hashed values).
On faster platforms, we most likely end up with a hashed pointer, which is by definition not a valid address, hence the Oops.
When replacing that %p by %px, it works:
Entering kdb (current=0x(ptrval), pid 282) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
Stack traceback for pid 282
0x(ptrval) 282 280 1 0 R 0x(ptrval) *sh
Call Trace:
[c627ba30] [c0089cd0] kdb_show_stack+0x80/0xa4 (unreliable)
[c627ba50] [c0089d90] kdb_bt1.isra.0+0x9c/0xf4
[c627ba80] [c0089f64] kdb_bt+0x17c/0x348
[c627bb10] [c00875f8] kdb_parse+0x430/0x730
[c627bb60] [c0089ff8] kdb_bt+0x210/0x348
[c627bbf0] [c00875f8] kdb_parse+0x430/0x730
[c627bc40] [c0087d5c] kdb_main_loop+0x348/0x8f4
[c627bca0] [c008acdc] kdb_stub+0x18c/0x3c0
[c627bcd0] [c0080c24] kgdb_handle_exception+0x2c8/0x720
[c627bd60] [c000e97c] kgdb_handle_breakpoint+0x3c/0x98
[c627bd70] [c000af38] program_check_exception+0x104/0x700
[c627bd90] [c000e45c] ret_from_except_full+0x0/0x4
[c627be50] [c0266494] __handle_sysrq+0x120/0x1a0
[c627be80] [c026698c] write_sysrq_trigger+0x44/0x5c
[c627be90] [c016bd64] proc_reg_write+0x60/0xf0
[c627bea0] [c011b130] __vfs_write+0x28/0x178
[c627bef0] [c011b460] vfs_write+0xb8/0x1cc
[c627bf10] [c011b6ec] ksys_write+0x4c/0xc4
[c627bf40] [c000e11c] ret_from_syscall+0x0/0x38
See https://patchwork.ozlabs.org/patch/969879/
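To illustrate why a hashed pointer cannot survive the trip through a command string, here is a minimal userspace sketch. It is not kernel code: toy_hash() is a hypothetical stand-in for the %p hashing, and format_hashed(), format_raw() and parse_btt() are made-up helpers that mimic building a "btt" command and parsing its argument back.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's %p hashing: any mapping from an
 * address to an unrelated value is enough for this illustration. */
static uintptr_t toy_hash(uintptr_t addr)
{
	return (addr * 2654435761u) ^ 0x5bd1e995u;
}

/* Build the command with a hashed pointer, as %p would print it. */
static void format_hashed(char *buf, size_t len, uintptr_t task)
{
	snprintf(buf, len, "btt 0x%lx\n", (unsigned long)toy_hash(task));
}

/* Build the command with the raw address, as %px would print it. */
static void format_raw(char *buf, size_t len, uintptr_t task)
{
	snprintf(buf, len, "btt 0x%lx\n", (unsigned long)task);
}

/* Parse the address argument back out of a "btt 0x..." command. */
static uintptr_t parse_btt(const char *cmd)
{
	return (uintptr_t)strtoul(cmd + 4, NULL, 16);
}
```

With the raw form, the parsed value is the original task address. With the hashed form, the parser gets an unrelated number, which is what kdb_getarea then reports as a bad address.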
@mpe, does it fix the issue you reported ?
It helps: it makes btc work.
But the selftests still crash.
KGDB: Registered I/O driver kgdbts
kgdbts:RUN plant and detach test
Entering kdb (current=0x(____ptrval____), pid 1) on processor 5 due to Keyboard Entry
[5]kdb> kgdbts:RUN sw breakpoint test
kgdbts: BP mismatch c0000000001feb80 expected c000000000751300
KGDB: re-enter exception: ALL breakpoints killed
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc3-gcc-7.3.1-00015-ga1691649edf6-dirty #153
Call Trace:
[c0000001fe802e50] [c000000000b2530c] dump_stack+0xb0/0xf4 (unreliable)
[c0000001fe802e90] [c00000000020027c] kgdb_handle_exception+0x2bc/0x2d0
[c0000001fe802f60] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe802f90] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803000] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at check_and_rewind_pc+0x268/0x290
LR = check_and_rewind_pc+0x264/0x290
[c0000001fe803370] [c000000000750fb4] validate_simple_test+0x54/0x120
[c0000001fe803390] [c000000000751720] run_simple_test+0x190/0x3f0
[c0000001fe803410] [c000000000751134] kgdbts_put_char+0x44/0x60
[c0000001fe803430] [c000000000200aa0] put_packet+0x130/0x210
[c0000001fe803480] [c000000000201a8c] gdb_serial_stub+0x3ec/0x12b0
[c0000001fe803590] [c0000000001ff9a8] kgdb_cpu_enter+0x3e8/0x820
[c0000001fe803690] [c000000000200270] kgdb_handle_exception+0x2b0/0x2d0
[c0000001fe803760] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe803790] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803800] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at kgdb_breakpoint+0x30/0x50
LR = run_breakpoint_test+0x94/0x110
[c0000001fe803af0] [c000000000752de0] run_breakpoint_test+0x90/0x110 (unreliable)
[c0000001fe803b50] [c000000000753428] configure_kgdbts+0x298/0x6f0
[c0000001fe803c40] [c0000000000109b8] do_one_initcall+0x58/0x290
[c0000001fe803d00] [c000000000e4486c] kernel_init_freeable+0x3b0/0x49c
[c0000001fe803dc0] [c000000000010d54] kernel_init+0x24/0x170
[c0000001fe803e30] [c00000000000bddc] ret_from_kernel_thread+0x5c/0x80
Kernel panic - not syncing: Recursive entry to debugger
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc3-gcc-7.3.1-00015-ga1691649edf6-dirty #153
Call Trace:
[c0000001fe802db0] [c000000000b2530c] dump_stack+0xb0/0xf4 (unreliable)
[c0000001fe802df0] [c00000000010df7c] panic+0x144/0x318
[c0000001fe802e90] [c00000000020028c] kgdb_handle_exception+0x2cc/0x2d0
[c0000001fe802f60] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe802f90] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803000] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at check_and_rewind_pc+0x268/0x290
LR = check_and_rewind_pc+0x264/0x290
[c0000001fe803370] [c000000000750fb4] validate_simple_test+0x54/0x120
[c0000001fe803390] [c000000000751720] run_simple_test+0x190/0x3f0
[c0000001fe803410] [c000000000751134] kgdbts_put_char+0x44/0x60
[c0000001fe803430] [c000000000200aa0] put_packet+0x130/0x210
[c0000001fe803480] [c000000000201a8c] gdb_serial_stub+0x3ec/0x12b0
[c0000001fe803590] [c0000000001ff9a8] kgdb_cpu_enter+0x3e8/0x820
[c0000001fe803690] [c000000000200270] kgdb_handle_exception+0x2b0/0x2d0
[c0000001fe803760] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe803790] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803800] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at kgdb_breakpoint+0x30/0x50
LR = run_breakpoint_test+0x94/0x110
[c0000001fe803af0] [c000000000752de0] run_breakpoint_test+0x90/0x110 (unreliable)
[c0000001fe803b50] [c000000000753428] configure_kgdbts+0x298/0x6f0
[c0000001fe803c40] [c0000000000109b8] do_one_initcall+0x58/0x290
[c0000001fe803d00] [c000000000e4486c] kernel_init_freeable+0x3b0/0x49c
[c0000001fe803dc0] [c000000000010d54] kernel_init+0x24/0x170
[c0000001fe803e30] [c00000000000bddc] ret_from_kernel_thread+0x5c/0x80
BUG: sleeping function called from invalid context at ../arch/powerpc/kernel/rtas.c:515
in_atomic(): 1, irqs_disabled(): 1, pid: 1, name: swapper/0
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc3-gcc-7.3.1-00015-ga1691649edf6-dirty #153
Call Trace:
[c0000001fe802bd0] [c000000000b2530c] dump_stack+0xb0/0xf4 (unreliable)
[c0000001fe802c10] [c00000000014afec] ___might_sleep+0x13c/0x170
[c0000001fe802c70] [c00000000003a86c] rtas_busy_delay+0x3c/0xe0
[c0000001fe802ca0] [c00000000003c5a4] rtas_os_term+0xa4/0xf0
[c0000001fe802d20] [c0000000000c91e0] pseries_panic+0x30/0x50
[c0000001fe802d50] [c00000000002d910] ppc_panic_event+0x70/0x90
[c0000001fe802d70] [c000000000140a4c] notifier_call_chain+0x9c/0x110
[c0000001fe802dc0] [c000000000140ba8] __atomic_notifier_call_chain+0x38/0x60
[c0000001fe802df0] [c00000000010dfc0] panic+0x188/0x318
[c0000001fe802e90] [c00000000020028c] kgdb_handle_exception+0x2cc/0x2d0
[c0000001fe802f60] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe802f90] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803000] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at check_and_rewind_pc+0x268/0x290
LR = check_and_rewind_pc+0x264/0x290
[c0000001fe803370] [c000000000750fb4] validate_simple_test+0x54/0x120
[c0000001fe803390] [c000000000751720] run_simple_test+0x190/0x3f0
[c0000001fe803410] [c000000000751134] kgdbts_put_char+0x44/0x60
[c0000001fe803430] [c000000000200aa0] put_packet+0x130/0x210
[c0000001fe803480] [c000000000201a8c] gdb_serial_stub+0x3ec/0x12b0
[c0000001fe803590] [c0000000001ff9a8] kgdb_cpu_enter+0x3e8/0x820
[c0000001fe803690] [c000000000200270] kgdb_handle_exception+0x2b0/0x2d0
[c0000001fe803760] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe803790] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803800] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at kgdb_breakpoint+0x30/0x50
LR = run_breakpoint_test+0x94/0x110
[c0000001fe803af0] [c000000000752de0] run_breakpoint_test+0x90/0x110 (unreliable)
[c0000001fe803b50] [c000000000753428] configure_kgdbts+0x298/0x6f0
[c0000001fe803c40] [c0000000000109b8] do_one_initcall+0x58/0x290
[c0000001fe803d00] [c000000000e4486c] kernel_init_freeable+0x3b0/0x49c
[c0000001fe803dc0] [c000000000010d54] kernel_init+0x24/0x170
[c0000001fe803e30] [c00000000000bddc] ret_from_kernel_thread+0x5c/0x80
SLOF **********************************************************************
Apparently, it works on the 8xx:
[ 1.590212] KGDB: Registered I/O driver kgdbts
[ 1.594598] kgdbts:RUN plant and detach test
Entering kdb (current=(ptrval), pid 1) due to Keyboard Entry
kdb> [ 1.606219] kgdbts:RUN sw breakpoint test
[ 1.606219] kgdbts:RUN sw breakpoint test
[ 1.614619] kgdbts:RUN bad memory access test
[ 1.620122] kgdbts:RUN singlestep test 1000 iterations
[ 1.633252] kgdbts:RUN singlestep [0/1000]
[ 2.458827] kgdbts:RUN singlestep [100/1000]
[ 3.284343] kgdbts:RUN singlestep [200/1000]
[ 4.109926] kgdbts:RUN singlestep [300/1000]
[ 4.935381] kgdbts:RUN singlestep [400/1000]
[ 5.760988] kgdbts:RUN singlestep [500/1000]
[ 6.586595] kgdbts:RUN singlestep [600/1000]
[ 7.412013] kgdbts:RUN singlestep [700/1000]
[ 8.237622] kgdbts:RUN singlestep [800/1000]
[ 9.063095] kgdbts:RUN singlestep [900/1000]
[ 9.880413] kgdbts:RUN do_fork for 100 breakpoints
[ 18.626055] KGDB: Unregistered I/O driver kgdbts, debugger disabled
Seems to work properly on 83xx as well:
[ 0.559537] KGDB: Registered I/O driver kgdbts
[ 0.564081] kgdbts:RUN plant and detach test
Entering kdb (current=(ptrval), pid 1) due to Keyboard Entry
kdb> [ 0.575225] kgdbts:RUN sw breakpoint test
[ 0.581083] kgdbts:RUN bad memory access test
[ 0.585874] kgdbts:RUN singlestep test 1000 iterations
[ 0.594401] kgdbts:RUN singlestep [0/1000]
[ 0.926293] kgdbts:RUN singlestep [100/1000]
[ 1.258422] kgdbts:RUN singlestep [200/1000]
[ 1.590503] kgdbts:RUN singlestep [300/1000]
[ 1.922563] kgdbts:RUN singlestep [400/1000]
[ 2.254630] kgdbts:RUN singlestep [500/1000]
[ 2.586718] kgdbts:RUN singlestep [600/1000]
[ 2.918883] kgdbts:RUN singlestep [700/1000]
[ 3.250982] kgdbts:RUN singlestep [800/1000]
[ 3.583073] kgdbts:RUN singlestep [900/1000]
[ 3.911853] kgdbts:RUN do_fork for 100 breakpoints
[ 11.127248] KGDB: Unregistered I/O driver kgdbts, debugger disabled
@mpe, it seems you get a program check from somewhere else than expected:
kgdbts: BP mismatch c0000000001feb80 expected c000000000751300
Then the second program check is the WARN_ON in eprintk() called from check_and_rewind_pc().
Could you tell what is at c000000000751300 and what is at c0000000001feb80?
In the meantime, I discovered that kdb was left behind when we implemented STRICT_KERNEL_RWX.
The following patch fixes setting the breakpoint when STRICT_KERNEL_RWX is active:
I've fixed the breakpoint mismatch. On LE we need to use ppc_function_entry() in lookup_addr().
Now it's getting to the singlestep test, which seems to be getting stuck.
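For context, a rough userspace sketch of the idea behind using the function entry on LE (ELFv2), as I understand it: a function may start with a two-instruction global-entry sequence that sets up r2, and callers in the same module branch past it to the local entry point, so a breakpoint planted on the first instruction may never be hit. The helper local_entry() and the constant names below are illustrative assumptions, not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the opcode, RT and RA fields; drop the 16-bit immediate. */
#define OP_RT_RA_MASK 0xffff0000u
#define ADDIS_R2_R12  0x3c4c0000u  /* addis r2,r12,XXXX */
#define LIS_R2        0x3c400000u  /* lis   r2,XXXX     */
#define ADDI_R2_R2    0x38420000u  /* addi  r2,r2,XXXX  */

/* Return the address of the local entry point: two instructions past
 * 'func' if it starts with the global-entry sequence, else 'func'. */
static const uint32_t *local_entry(const uint32_t *func)
{
	uint32_t first = func[0] & OP_RT_RA_MASK;
	uint32_t second = func[1] & OP_RT_RA_MASK;

	if ((first == ADDIS_R2_R12 || first == LIS_R2) && second == ADDI_R2_R2)
		return func + 2;
	return func;
}
```

A function body beginning with addis r2,r12 / addi r2,r2 yields an entry 8 bytes in; one without that sequence yields the address unchanged.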
Not sure if you want to track this here, but I see that the hvc driver only sends KGDB output to hvc0, regardless of the kgdboc value on the boot line:
[root@localhost ~]# cat /proc/cmdline
root=UUID=dcda20b4-8fbe-4f52-ba40-b1a98fa55139 console=hvc0 kgdboc=hvc1
[root@localhost ~]# tty
/dev/hvc0
[root@localhost ~]# echo g > /proc/sysrq-trigger
[ 102.855530] sysrq: SysRq : DEBUG
[ 102.855571] KGDB: Entering KGDB
+$OK#9a <-- this should only work in hvc1
From drivers/tty/hvc/hvc_console.c:
static int hvc_poll_get_char(struct tty_driver *driver, int line)
{
struct tty_struct *tty = driver->ttys[0];
struct hvc_struct *hp = tty->driver_data;
...
}
static void hvc_poll_put_char(struct tty_driver *driver, int line, char ch)
{
struct tty_struct *tty = driver->ttys[0];
struct hvc_struct *hp = tty->driver_data;
...
}
This is particularly relevant when using QEMU with -serial mon:stdio -serial tcp:0:1234,server,nowait since there's no way to "detach" from hvc0 to connect gdb.
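A minimal sketch of what honoring the line argument could look like, written as userspace C with mock types. struct mock_tty, struct mock_driver and poll_lookup() are hypothetical stand-ins for illustration only; this is not the actual driver change.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins for the kernel types; only the fields used here. */
struct mock_tty {
	void *driver_data;
};

struct mock_driver {
	struct mock_tty **ttys;  /* one slot per line, NULL if not open */
	int num;                 /* number of lines */
};

/* Return the private data for 'line', or NULL if the line is out of
 * range or has no tty attached. */
static void *poll_lookup(const struct mock_driver *driver, int line)
{
	struct mock_tty *tty;

	if (line < 0 || line >= driver->num)
		return NULL;

	tty = driver->ttys[line];  /* index by line, not a fixed 0 */
	return tty ? tty->driver_data : NULL;
}
```

Indexing by line sends the polled I/O to the console named in kgdboc, and the NULL checks let the poll helpers bail out instead of dereferencing a missing tty.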
I am testing this on a 4.19 kernel, and this is what I get. I'm not sure whether it is related or whether I am misusing it:
[ 8.762802] sysrq: SysRq : DEBUG
[ 8.763406] KGDB: Entering KGDB
[ 8.765296] Unable to handle kernel paging request for data at address 0x00000260
[ 8.765370] Faulting instruction address: 0xc00000000062ac9c
[ 8.765735] KGDB: re-enter exception: ALL breakpoints killed
[ 8.766044] CPU: 0 PID: 49 Comm: sh Not tainted 4.19.0-04681-g01aa9d5 #3
[ 8.766253] Call Trace:
[ 8.766867] [c00000001e853070] [c0000000009160e4] dump_stack+0xe8/0x164 (unreliable)
[ 8.767037] [c00000001e8530c0] [c0000000001fd544] kgdb_handle_exception+0x294/0x2c0
[ 8.767142] [c00000001e853190] [c000000000048b70] kgdb_debugger+0xc0/0xe0
[ 8.767226] [c00000001e8531b0] [c000000000029b44] die+0xc4/0xf0
[ 8.767301] [c00000001e8531f0] [c000000000069ec8] bad_page_fault+0xe8/0x180
[ 8.767385] [c00000001e853260] [c00000000000b160] handle_page_fault+0x34/0x38
[ 8.767500] --- interrupt: 300 at hvc_poll_get_char+0x2c/0x90
[ 8.767500] LR = kgdboc_get_char+0x4c/0x70
Looking at the failing instruction, I see:
c00000000062ac98: 00 00 29 e9 ld r9,0(r9)
c00000000062ac9c: 60 02 29 e9 ld r9,608(r9)
Looking at the code, I see:
{
struct tty_struct *tty = driver->ttys[0];
struct hvc_struct *hp = tty->driver_data;
So tty is NULL, and the fault occurs when it is dereferenced to read driver_data.
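The arithmetic behind that conclusion can be checked with a small userspace sketch. The mock layout below is an assumption: it simply pads the struct so that driver_data sits at offset 608, matching the ld r9,608(r9) in the disassembly above; the real struct tty_struct layout depends on the kernel configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock layout (assumption): padding chosen so that driver_data lands at
 * offset 608, the displacement seen in the faulting load. */
struct mock_tty {
	char other_fields[608];
	void *driver_data;
};

/* Address that a load of base->driver_data would touch. */
static uintptr_t member_load_address(uintptr_t base)
{
	return base + offsetof(struct mock_tty, driver_data);
}
```

With a NULL base the load touches address 0x260 (608 decimal), which is the data address reported in the "Unable to handle kernel paging request" line.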
The following fixes are now in 4.20: linuxppc/linux@be28c1e3ca29887e207f0cbcd294cefe5074bab6 linuxppc/linux@dded2e159208a9edc21dd5c5f583afa28d378d39 linuxppc/linux@568fb6f42ac6851320adaea25f8f1b94de14e40a linuxppc/linux@fb978ca207743badfe7efd9eebe68bcbb4969f79
Are there still issues with kgdb ?
I'm seeing the bug in https://github.com/linuxppc/issues/issues/140#issuecomment-433072259 still happening with 5.8.0-rc4, so I don't think kgdb works with the hvc driver currently.
The very basics work, eg:
But then other things oops, in particular the self tests blow up.