linuxppc / issues

Get k(g)db working #140

Open mpe opened 6 years ago

mpe commented 6 years ago

The very basics work, e.g.:

# echo hvc0 >  /sys/module/kgdboc/parameters/kgdboc 
# echo g > /proc/sysrq-trigger
Entering kdb (current=0x00000000651b7ab4, pid 0) on processor 13 due to Keyboard Entry
[13]kdb> btp 1
Stack traceback for pid 1
0x000000003966c3d4        1        0  0    5   S  0x000000008ee0e51a  systemd
Call Trace:
[c0000000fea03900] [c0000000fea03960] 0xc0000000fea03960 (unreliable)
[c0000000fea03ad0] [c00000000001eb1c] __switch_to+0x34c/0x4b0
[c0000000fea03b30] [c000000000b387f0] __schedule+0x380/0xbe0
[c0000000fea03c00] [c000000000b390a4] schedule+0x54/0xd0
[c0000000fea03c30] [c000000000b40444] schedule_hrtimeout_range_clock+0x184/0x190
[c0000000fea03cc0] [c000000000419c74] ep_poll+0x344/0x430
[c0000000fea03d80] [c000000000419e64] do_epoll_wait+0x104/0x120
[c0000000fea03dd0] [c00000000041b1a4] sys_epoll_pwait+0x1b4/0x1c0
[c0000000fea03e30] [c00000000000b860] system_call+0x58/0x6c

But then other things oops, in particular the self tests blow up.

[13]kdb> btc
btc: cpu status: Currently on cpu 13
Available cpus: 0-12(I), 13, 14-15(I)
Unable to handle kernel paging request for data at address 0x98d3bc0c
Faulting instruction address: 0xc000000000151a58
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c kvm binfmt_misc vmx_crypto ip_tables x_tables autofs4 crc32c_vpmsum virtio_net
CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.17.0-rc1-gcc-6.3.1-00001-gb56aa49046fe #1403
NIP:  c000000000151a58 LR: c000000000227284 CTR: 0000000000000008
REGS: c0000001fffb6ea0 TRAP: 0300   Not tainted  (4.17.0-rc1-gcc-6.3.1-00001-gb56aa49046fe)
MSR:  8000000000001033 <SF,ME,IR,DR,RI,LE>  CR: 24004428  XER: 20000000
CFAR: c000000000008830 DAR: 0000000098d3bc0c DSISR: 40000000 SOFTE: 1 
GPR00: c00000000022a994 c0000001fffb7120 c0000000010f6400 0000000098d3bc04 
GPR04: 0000000000000010 c0000001fffb70e0 c000000001b1a6ee 0000000098d3bc04 
GPR08: ffffffffffffffbf c000000001b16400 ffffffffffffffc9 000000098d3bc040 
GPR12: c00000000022a780 c00000003ffdf300 c000000001b1a970 c000000000fcc360 
GPR16: c000000000d62140 c000000000d62178 c000000001b027c0 c000000000b8ccc0 
GPR20: c000000000d62120 0000000000000001 c000000001b19cc0 0000000000000000 
GPR24: 0000000000000000 0000000000000032 c000000001b1a6d8 0000000000000000 
GPR28: 0000000000000001 c000000001b1a7b0 c000000001b1a6d8 0000000098d3bc04 
NIP [c000000000151a58] task_curr+0x8/0x40
LR [c000000000227284] kdb_set_current_task+0x34/0xc0
Call Trace:
[c0000001fffb7120] [c000000001b1a6ee] cbuf.35602+0x16/0xcc (unreliable)
[c0000001fffb7150] [c00000000022a994] kdb_bt+0x214/0x500
[c0000001fffb7240] [c000000000226d04] kdb_parse+0x4f4/0x8b0
[c0000001fffb7310] [c00000000022aa78] kdb_bt+0x2f8/0x500
[c0000001fffb7400] [c000000000226d04] kdb_parse+0x4f4/0x8b0
[c0000001fffb74d0] [c000000000227930] kdb_main_loop+0x470/0xa20
[c0000001fffb75d0] [c00000000022bc9c] kdb_stub+0x30c/0x5e0
[c0000001fffb7650] [c00000000021df78] kgdb_cpu_enter+0x378/0x790
[c0000001fffb7750] [c00000000021e760] kgdb_handle_exception+0x190/0x2b0
[c0000001fffb7820] [c00000000004a7c4] kgdb_handle_breakpoint+0x64/0xa0
[c0000001fffb7850] [c00000000002b214] program_check_exception+0x264/0x370
[c0000001fffb78c0] [c000000000009020] program_check_common+0x170/0x180
--- interrupt: 700 at kgdb_breakpoint+0x3c/0x70
    LR = __handle_sysrq+0x12c/0x2c0
[c0000001fffb7bb0] [c000000000d29e40] flag_spec.61673+0x133f74/0x1e9e0c (unreliable)
[c0000001fffb7bd0] [c0000000006de5cc] __handle_sysrq+0x12c/0x2c0
[c0000001fffb7c70] [c0000000006f6850] hvc_poll+0x1c0/0x360
[c0000001fffb7d00] [c0000000006f7b3c] hvc_handle_interrupt+0x2c/0x60
[c0000001fffb7d30] [c00000000019ff90] __handle_irq_event_percpu+0x110/0x3c0
[c0000001fffb7e20] [c0000000001a027c] handle_irq_event_percpu+0x3c/0x90
[c0000001fffb7e60] [c0000000001a0330] handle_irq_event+0x60/0xb0
[c0000001fffb7ea0] [c0000000001a5dd8] handle_fasteoi_irq+0xc8/0x240
[c0000001fffb7ee0] [c00000000019e514] generic_handle_irq+0x54/0x80
[c0000001fffb7f10] [c0000000000190fc] __do_irq+0xbc/0x2d0
[c0000001fffb7f90] [c00000000002eca0] call_do_irq+0x14/0x24
[c0000001fe7979b0] [c0000000000193b0] do_IRQ+0xa0/0x130
[c0000001fe797a10] [c000000000008d30] hardware_interrupt_common+0x150/0x160
--- interrupt: 501 at plpar_hcall_norets+0x1c/0x28
    LR = check_and_cede_processor+0x34/0x50
[c0000001fe797d00] [c000000000952bd0] check_and_cede_processor+0x20/0x50 (unreliable)
[c0000001fe797d60] [c000000000952da0] shared_cede_loop+0x50/0x140
[c0000001fe797d90] [c00000000094fc78] cpuidle_enter_state+0xa8/0x440
[c0000001fe797df0] [c00000000015ada0] call_cpuidle+0x70/0xd0
[c0000001fe797e30] [c00000000015b4e8] do_idle+0x328/0x3a0
[c0000001fe797ec0] [c00000000015b7a8] cpu_startup_entry+0x38/0x50
[c0000001fe797ef0] [c00000000004d6bc] start_secondary+0x4ec/0x530
[c0000001fe797f90] [c00000000000b170] start_secondary_prolog+0x10/0x14
Instruction dump:
2f890000 409eff10 3c62ffc6 39200001 3863edc8 992aee5b 4bfbacd9 60000000 
0fe00000 4bfffef0 3c4c00fa 384249b0 <e9430008> 3d020004 3908d6f0 3d22ffe7 
---[ end trace d4d77b5b70c0a456 ]---

Kernel panic - not syncing: Fatal exception in interrupt

rnav commented 6 years ago

Is that used by anyone on powerpc? I thought xmon is our preferred debugger. Are there scenarios where k(g)db would be useful to have?

mpe commented 6 years ago

It's not used much because we've always had xmon. Though xmon is more of a "crash handler" than a debugger. kgdb can (in theory) do full gdb-style debugging against a running kernel, which could be useful at times.

But really we should either get it working or prevent it from being enabled. I don't like having things that are known not to work sitting around for people to trip over.

chleroy commented 6 years ago

Tried it on 8xx: no Oops, but ugly 'ptrval'. Should we do something about it?

root@vgoip:~# tty
/dev/ttyCPM0
root@vgoip:~# echo ttyCPM0 > /sys/module/kgdboc/parameters/kgdboc
root@vgoip:~#
root@vgoip:~# echo g > /proc/sysrq-trigger
[  240.108192] sysrq: SysRq : DEBUG

Entering kdb (current=0x(ptrval), pid 282) due to Keyboard Entry
kdb> btp 1
Stack traceback for pid 1
0x(ptrval)        1        0  0    0   S  0x(ptrval)  init
Call Trace:
[c60e1db0] [100c82b6] 0x100c82b6 (unreliable)
[c60e1e70] [c05352ac] __schedule+0x22c/0x5ac
[c60e1eb0] [c053565c] schedule+0x30/0x5c
[c60e1ec0] [c001fbfc] do_wait+0x1a8/0x29c
[c60e1ef0] [c0020b18] kernel_wait4+0x80/0x128
[c60e1f40] [c000e11c] ret_from_syscall+0x0/0x38
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0x0
kdb>

But it silently hangs after calling 'help':

kdb> help
Command         Usage                Description
----------------------------------------------------------
md              <vaddr>             Display Memory Contents, also mdWcN, e.g. md8c1
mdr             <vaddr> <bytes>     Display Raw Memory
mdp             <paddr> <bytes>     Display Physical Memory
mds             <vaddr>             Display Memory Symbolically
mm              <vaddr> <contents>  Modify Memory Contents
go              [<vaddr>]           Continue Execution
rd                                  Display Registers
rm              <reg> <contents>    Modify Registers
ef              <vaddr>             Display exception frame
bt              [<vaddr>]           Stack traceback
btp             <pid>               Display stack for process <pid>
bta             [D|R|S|T|C|Z|E|U|I|M|A]
                                    Backtrace all processes matching state flag
btc                                 Backtrace current process on each cpu
btt             <vaddr>             Backtrace process given its struct task address
env                                 Show environment variables
set                                 Set environment variables
help                                Display Help Message

chleroy commented 6 years ago

bta silently hangs as well:

kdb> bta
15 sleeping system daemon (state M) processes suppressed,
use 'ps A' to see all.
Stack traceback for pid 282
0x(ptrval)      282      280  1    0   R  0x(ptrval) *sh
Call Trace:
[c656fb10] [c0089cd0] kdb_show_stack+0x80/0xa4 (unreliable)
[c656fb30] [c0089d90] kdb_bt1.isra.0+0x9c/0xf4
[c656fb60] [c0089e60] kdb_bt+0x78/0x348
[c656fbf0] [c00875f8] kdb_parse+0x430/0x730
[c656fc40] [c0087d5c] kdb_main_loop+0x348/0x8f4
[c656fca0] [c008acdc] kdb_stub+0x18c/0x3c0
[c656fcd0] [c0080c24] kgdb_handle_exception+0x2c8/0x720
[c656fd60] [c000e97c] kgdb_handle_breakpoint+0x3c/0x98
[c656fd70] [c000af38] program_check_exception+0x104/0x700
[c656fd90] [c000e45c] ret_from_except_full+0x0/0x4
[c656fe50] [c0266494] __handle_sysrq+0x120/0x1a0
[c656fe80] [c026698c] write_sysrq_trigger+0x44/0x5c
[c656fe90] [c016bd64] proc_reg_write+0x60/0xf0
[c656fea0] [c011b130] __vfs_write+0x28/0x178
[c656fef0] [c011b460] vfs_write+0xb8/0x1cc
[c656ff10] [c011b6ec] ksys_write+0x4c/0xc4
[c656ff40] [c000e11c] ret_from_syscall+0x0/0x38

chleroy commented 6 years ago

Those silent hangs are in fact a problem in the CPM serial driver. The following patch is proposed to fix it: https://patchwork.ozlabs.org/patch/969723/

chleroy commented 6 years ago

When booting with the parameter 'debug_boot_weak_hash', I get the following:

Entering kdb (current=0xba99ad80, pid 284) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0xba99ad80

It seems kdb_getarea() tries to access the hashed address rather than the real address (the 0xba99ad80 in the first line is printed with %p, while kdb_getarea() uses %lx).

chleroy commented 6 years ago

The issue seems to be linked to the following call:

sprintf(buf, "btt 0x%p\n", KDB_TSK(cpu));

On the 8xx, we end up with btt (ptrval) (it takes a huge amount of time before getting enough entropy to print hashed values).

On faster platforms, we most likely end up with a hashed pointer, which is by definition not a valid address, hence the Oops.
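
The round trip described above (a pointer formatted into a command string, then parsed back and dereferenced) can be sketched in user space. This is only an illustration under stated assumptions: `fake_hash()`, `format_btt()` and `parse_btt()` are hypothetical helpers, and the bitwise NOT merely stands in for the kernel's real `%p` hashing, which works differently.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's %p hashing. Any mapping that
 * never returns the original address is enough to show the problem. */
static uintptr_t fake_hash(uintptr_t addr)
{
    return ~addr;
}

/* Build a "btt 0x<addr>" command, either with the raw address or with
 * the stand-in hash applied to it first. */
static void format_btt(char *buf, size_t len, const void *task, int hashed)
{
    uintptr_t v = (uintptr_t)task;

    if (hashed)
        v = fake_hash(v);
    snprintf(buf, len, "btt 0x%llx\n", (unsigned long long)v);
}

/* Parse the address back out of the command text, as a command handler
 * has to do before it can use the value as a pointer. */
static uintptr_t parse_btt(const char *cmd)
{
    /* Skip "btt "; strtoull() with base 16 accepts the "0x" prefix. */
    return (uintptr_t)strtoull(cmd + 4, NULL, 16);
}
```

With `hashed` set to 0 the parsed value equals the original pointer; with `hashed` set to 1 it does not, so anything that dereferences it reads from an unrelated address.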

chleroy commented 6 years ago

When replacing that %p by %px, it works:

Entering kdb (current=0x(ptrval), pid 282) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
Stack traceback for pid 282
0x(ptrval)      282      280  1    0   R  0x(ptrval) *sh
Call Trace:
[c627ba30] [c0089cd0] kdb_show_stack+0x80/0xa4 (unreliable)
[c627ba50] [c0089d90] kdb_bt1.isra.0+0x9c/0xf4
[c627ba80] [c0089f64] kdb_bt+0x17c/0x348
[c627bb10] [c00875f8] kdb_parse+0x430/0x730
[c627bb60] [c0089ff8] kdb_bt+0x210/0x348
[c627bbf0] [c00875f8] kdb_parse+0x430/0x730
[c627bc40] [c0087d5c] kdb_main_loop+0x348/0x8f4
[c627bca0] [c008acdc] kdb_stub+0x18c/0x3c0
[c627bcd0] [c0080c24] kgdb_handle_exception+0x2c8/0x720
[c627bd60] [c000e97c] kgdb_handle_breakpoint+0x3c/0x98
[c627bd70] [c000af38] program_check_exception+0x104/0x700
[c627bd90] [c000e45c] ret_from_except_full+0x0/0x4
[c627be50] [c0266494] __handle_sysrq+0x120/0x1a0
[c627be80] [c026698c] write_sysrq_trigger+0x44/0x5c
[c627be90] [c016bd64] proc_reg_write+0x60/0xf0
[c627bea0] [c011b130] __vfs_write+0x28/0x178
[c627bef0] [c011b460] vfs_write+0xb8/0x1cc
[c627bf10] [c011b6ec] ksys_write+0x4c/0xc4
[c627bf40] [c000e11c] ret_from_syscall+0x0/0x38

chleroy commented 6 years ago

See https://patchwork.ozlabs.org/patch/969879/

@mpe, does it fix the issue you reported?

mpe commented 6 years ago

It helps, it makes btc work.

But the selftests still crash.

KGDB: Registered I/O driver kgdbts
kgdbts:RUN plant and detach test

Entering kdb (current=0x(____ptrval____), pid 1) on processor 5 due to Keyboard Entry
[5]kdb> kgdbts:RUN sw breakpoint test
kgdbts: BP mismatch c0000000001feb80 expected c000000000751300
KGDB: re-enter exception: ALL breakpoints killed
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc3-gcc-7.3.1-00015-ga1691649edf6-dirty #153
Call Trace:
[c0000001fe802e50] [c000000000b2530c] dump_stack+0xb0/0xf4 (unreliable)
[c0000001fe802e90] [c00000000020027c] kgdb_handle_exception+0x2bc/0x2d0
[c0000001fe802f60] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe802f90] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803000] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at check_and_rewind_pc+0x268/0x290
    LR = check_and_rewind_pc+0x264/0x290
[c0000001fe803370] [c000000000750fb4] validate_simple_test+0x54/0x120
[c0000001fe803390] [c000000000751720] run_simple_test+0x190/0x3f0
[c0000001fe803410] [c000000000751134] kgdbts_put_char+0x44/0x60
[c0000001fe803430] [c000000000200aa0] put_packet+0x130/0x210
[c0000001fe803480] [c000000000201a8c] gdb_serial_stub+0x3ec/0x12b0
[c0000001fe803590] [c0000000001ff9a8] kgdb_cpu_enter+0x3e8/0x820
[c0000001fe803690] [c000000000200270] kgdb_handle_exception+0x2b0/0x2d0
[c0000001fe803760] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe803790] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803800] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at kgdb_breakpoint+0x30/0x50
    LR = run_breakpoint_test+0x94/0x110
[c0000001fe803af0] [c000000000752de0] run_breakpoint_test+0x90/0x110 (unreliable)
[c0000001fe803b50] [c000000000753428] configure_kgdbts+0x298/0x6f0
[c0000001fe803c40] [c0000000000109b8] do_one_initcall+0x58/0x290
[c0000001fe803d00] [c000000000e4486c] kernel_init_freeable+0x3b0/0x49c
[c0000001fe803dc0] [c000000000010d54] kernel_init+0x24/0x170
[c0000001fe803e30] [c00000000000bddc] ret_from_kernel_thread+0x5c/0x80
Kernel panic - not syncing: Recursive entry to debugger
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc3-gcc-7.3.1-00015-ga1691649edf6-dirty #153
Call Trace:
[c0000001fe802db0] [c000000000b2530c] dump_stack+0xb0/0xf4 (unreliable)
[c0000001fe802df0] [c00000000010df7c] panic+0x144/0x318
[c0000001fe802e90] [c00000000020028c] kgdb_handle_exception+0x2cc/0x2d0
[c0000001fe802f60] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe802f90] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803000] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at check_and_rewind_pc+0x268/0x290
    LR = check_and_rewind_pc+0x264/0x290
[c0000001fe803370] [c000000000750fb4] validate_simple_test+0x54/0x120
[c0000001fe803390] [c000000000751720] run_simple_test+0x190/0x3f0
[c0000001fe803410] [c000000000751134] kgdbts_put_char+0x44/0x60
[c0000001fe803430] [c000000000200aa0] put_packet+0x130/0x210
[c0000001fe803480] [c000000000201a8c] gdb_serial_stub+0x3ec/0x12b0
[c0000001fe803590] [c0000000001ff9a8] kgdb_cpu_enter+0x3e8/0x820
[c0000001fe803690] [c000000000200270] kgdb_handle_exception+0x2b0/0x2d0
[c0000001fe803760] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe803790] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803800] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at kgdb_breakpoint+0x30/0x50
    LR = run_breakpoint_test+0x94/0x110
[c0000001fe803af0] [c000000000752de0] run_breakpoint_test+0x90/0x110 (unreliable)
[c0000001fe803b50] [c000000000753428] configure_kgdbts+0x298/0x6f0
[c0000001fe803c40] [c0000000000109b8] do_one_initcall+0x58/0x290
[c0000001fe803d00] [c000000000e4486c] kernel_init_freeable+0x3b0/0x49c
[c0000001fe803dc0] [c000000000010d54] kernel_init+0x24/0x170
[c0000001fe803e30] [c00000000000bddc] ret_from_kernel_thread+0x5c/0x80
BUG: sleeping function called from invalid context at ../arch/powerpc/kernel/rtas.c:515
in_atomic(): 1, irqs_disabled(): 1, pid: 1, name: swapper/0
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc3-gcc-7.3.1-00015-ga1691649edf6-dirty #153
Call Trace:
[c0000001fe802bd0] [c000000000b2530c] dump_stack+0xb0/0xf4 (unreliable)
[c0000001fe802c10] [c00000000014afec] ___might_sleep+0x13c/0x170
[c0000001fe802c70] [c00000000003a86c] rtas_busy_delay+0x3c/0xe0
[c0000001fe802ca0] [c00000000003c5a4] rtas_os_term+0xa4/0xf0
[c0000001fe802d20] [c0000000000c91e0] pseries_panic+0x30/0x50
[c0000001fe802d50] [c00000000002d910] ppc_panic_event+0x70/0x90
[c0000001fe802d70] [c000000000140a4c] notifier_call_chain+0x9c/0x110
[c0000001fe802dc0] [c000000000140ba8] __atomic_notifier_call_chain+0x38/0x60
[c0000001fe802df0] [c00000000010dfc0] panic+0x188/0x318
[c0000001fe802e90] [c00000000020028c] kgdb_handle_exception+0x2cc/0x2d0
[c0000001fe802f60] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe802f90] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803000] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at check_and_rewind_pc+0x268/0x290
    LR = check_and_rewind_pc+0x264/0x290
[c0000001fe803370] [c000000000750fb4] validate_simple_test+0x54/0x120
[c0000001fe803390] [c000000000751720] run_simple_test+0x190/0x3f0
[c0000001fe803410] [c000000000751134] kgdbts_put_char+0x44/0x60
[c0000001fe803430] [c000000000200aa0] put_packet+0x130/0x210
[c0000001fe803480] [c000000000201a8c] gdb_serial_stub+0x3ec/0x12b0
[c0000001fe803590] [c0000000001ff9a8] kgdb_cpu_enter+0x3e8/0x820
[c0000001fe803690] [c000000000200270] kgdb_handle_exception+0x2b0/0x2d0
[c0000001fe803760] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe803790] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803800] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at kgdb_breakpoint+0x30/0x50
    LR = run_breakpoint_test+0x94/0x110
[c0000001fe803af0] [c000000000752de0] run_breakpoint_test+0x90/0x110 (unreliable)
[c0000001fe803b50] [c000000000753428] configure_kgdbts+0x298/0x6f0
[c0000001fe803c40] [c0000000000109b8] do_one_initcall+0x58/0x290
[c0000001fe803d00] [c000000000e4486c] kernel_init_freeable+0x3b0/0x49c
[c0000001fe803dc0] [c000000000010d54] kernel_init+0x24/0x170
[c0000001fe803e30] [c00000000000bddc] ret_from_kernel_thread+0x5c/0x80

SLOF **********************************************************************

chleroy commented 6 years ago

Apparently, it works on the 8xx:

[    1.590212] KGDB: Registered I/O driver kgdbts
[    1.594598] kgdbts:RUN plant and detach test

Entering kdb (current=(ptrval), pid 1) due to Keyboard Entry
kdb> [    1.606219] kgdbts:RUN sw breakpoint test
[    1.606219] kgdbts:RUN sw breakpoint test
[    1.614619] kgdbts:RUN bad memory access test
[    1.620122] kgdbts:RUN singlestep test 1000 iterations
[    1.633252] kgdbts:RUN singlestep [0/1000]
[    2.458827] kgdbts:RUN singlestep [100/1000]
[    3.284343] kgdbts:RUN singlestep [200/1000]
[    4.109926] kgdbts:RUN singlestep [300/1000]
[    4.935381] kgdbts:RUN singlestep [400/1000]
[    5.760988] kgdbts:RUN singlestep [500/1000]
[    6.586595] kgdbts:RUN singlestep [600/1000]
[    7.412013] kgdbts:RUN singlestep [700/1000]
[    8.237622] kgdbts:RUN singlestep [800/1000]
[    9.063095] kgdbts:RUN singlestep [900/1000]
[    9.880413] kgdbts:RUN do_fork for 100 breakpoints
[   18.626055] KGDB: Unregistered I/O driver kgdbts, debugger disabled

chleroy commented 6 years ago

Seems to work properly on 83xx as well:

[    0.559537] KGDB: Registered I/O driver kgdbts
[    0.564081] kgdbts:RUN plant and detach test

Entering kdb (current=(ptrval), pid 1) due to Keyboard Entry
kdb> [    0.575225] kgdbts:RUN sw breakpoint test
[    0.581083] kgdbts:RUN bad memory access test
[    0.585874] kgdbts:RUN singlestep test 1000 iterations
[    0.594401] kgdbts:RUN singlestep [0/1000]
[    0.926293] kgdbts:RUN singlestep [100/1000]
[    1.258422] kgdbts:RUN singlestep [200/1000]
[    1.590503] kgdbts:RUN singlestep [300/1000]
[    1.922563] kgdbts:RUN singlestep [400/1000]
[    2.254630] kgdbts:RUN singlestep [500/1000]
[    2.586718] kgdbts:RUN singlestep [600/1000]
[    2.918883] kgdbts:RUN singlestep [700/1000]
[    3.250982] kgdbts:RUN singlestep [800/1000]
[    3.583073] kgdbts:RUN singlestep [900/1000]
[    3.911853] kgdbts:RUN do_fork for 100 breakpoints
[   11.127248] KGDB: Unregistered I/O driver kgdbts, debugger disabled

chleroy commented 6 years ago

@mpe, it seems you get a program check from somewhere else than expected:

kgdbts: BP mismatch c0000000001feb80 expected c000000000751300

Then the second program check is the WARN_ON in eprintk() called from check_and_rewind_pc()

Could you tell what is at c000000000751300 and what is at c0000000001feb80?

chleroy commented 6 years ago

In the meantime, I discovered that kdb was overlooked when we implemented STRICT_KERNEL_RWX.

The following patch fixes setting breakpoints when STRICT_KERNEL_RWX is active:

https://patchwork.ozlabs.org/patch/971040/

mpe commented 6 years ago

I've fixed the breakpoint mismatch. On LE we need to use ppc_function_entry() in lookup_addr().

Now it's getting to the singlestep test, which seems to be getting stuck.
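
For background on the ppc_function_entry() remark: on the 64-bit LE (ELFv2) ABI a function can have a global and a local entry point, separated by a two-instruction TOC setup sequence. The instruction dump in the first comment shows such a pair (`3c4c00fa 384249b0`) immediately before the faulting instruction at task_curr+0x8. The sketch below shows how that sequence could be recognised. It is not the kernel's implementation: `local_entry_offset()` and the macro names are illustrative, and the encodings are derived from the Power ISA D-form layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode, RT and RA live in the upper 16 bits of a D-form instruction;
 * the mask keeps those bits and drops the 16-bit immediate. */
#define OP_RT_RA_MASK 0xffff0000u
#define ADDIS_R2_R12  0x3c4c0000u  /* addis r2,r12,imm */
#define LIS_R2        0x3c400000u  /* lis   r2,imm     */
#define ADDI_R2_R2    0x38420000u  /* addi  r2,r2,imm  */

/* Return the byte offset of the local entry point from the start of a
 * function: 8 if it begins with the TOC setup pair, otherwise 0. */
static size_t local_entry_offset(const uint32_t *insn)
{
    uint32_t first = insn[0] & OP_RT_RA_MASK;
    uint32_t second = insn[1] & OP_RT_RA_MASK;

    if ((first == ADDIS_R2_R12 || first == LIS_R2) && second == ADDI_R2_R2)
        return 2 * sizeof(uint32_t);
    return 0;
}
```

Code that compares a trapped address against a symbol address has to agree on which of the two entry points it means, otherwise the comparison can fail even when the breakpoint hit the right function.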

farosas commented 6 years ago

Not sure if you want to track this here but I see that the hvc driver only sends KGDB output to hvc0, regardless of kgdboc value in the boot line:

[root@localhost ~]# cat /proc/cmdline
root=UUID=dcda20b4-8fbe-4f52-ba40-b1a98fa55139 console=hvc0 kgdboc=hvc1
[root@localhost ~]# tty
/dev/hvc0
[root@localhost ~]# echo g > /proc/sysrq-trigger 
[  102.855530] sysrq: SysRq : DEBUG
[  102.855571] KGDB: Entering KGDB
+$OK#9a                    <-- this should only work in hvc1

From drivers/tty/hvc/hvc_console.c:

static int hvc_poll_get_char(struct tty_driver *driver, int line)
{
    struct tty_struct *tty = driver->ttys[0];
    struct hvc_struct *hp = tty->driver_data;
    ...
}

static void hvc_poll_put_char(struct tty_driver *driver, int line, char ch)
{
    struct tty_struct *tty = driver->ttys[0];
    struct hvc_struct *hp = tty->driver_data;
    ...
}

This is particularly relevant when using QEMU with -serial mon:stdio -serial tcp:0:1234,server,nowait since there's no way to "detach" from hvc0 to connect gdb.
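
One possible direction, sketched in user space with mock types: select the tty from the `line` argument instead of hard-coding index 0, and tolerate a missing entry. `mock_tty`, `mock_driver` and `poll_tty()` are hypothetical stand-ins; whether indexing `driver->ttys[line]` is the right fix for the real hvc driver is an assumption, not something confirmed here.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mock_tty {
    void *driver_data;
};

struct mock_driver {
    struct mock_tty **ttys;  /* one slot per line, may contain NULL */
    unsigned int num;        /* number of slots in ttys */
};

/* Pick the tty for the requested line instead of always ttys[0].
 * Returns NULL if the line is out of range or has no tty attached. */
static struct mock_tty *poll_tty(const struct mock_driver *driver, int line)
{
    if (line < 0 || (unsigned int)line >= driver->num)
        return NULL;
    return driver->ttys[line];
}
```

A caller would then check the result for NULL before touching `driver_data`, rather than dereferencing unconditionally.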

leitao commented 6 years ago

I am testing this on a 4.19 kernel and this is what I get. I'm not sure whether it is related or whether I am misusing it:

[    8.762802] sysrq: SysRq : DEBUG
[    8.763406] KGDB: Entering KGDB
[    8.765296] Unable to handle kernel paging request for data at address 0x00000260
[    8.765370] Faulting instruction address: 0xc00000000062ac9c
[    8.765735] KGDB: re-enter exception: ALL breakpoints killed
[    8.766044] CPU: 0 PID: 49 Comm: sh Not tainted 4.19.0-04681-g01aa9d5 #3
[    8.766253] Call Trace:
[    8.766867] [c00000001e853070] [c0000000009160e4] dump_stack+0xe8/0x164 (unreliable)
[    8.767037] [c00000001e8530c0] [c0000000001fd544] kgdb_handle_exception+0x294/0x2c0
[    8.767142] [c00000001e853190] [c000000000048b70] kgdb_debugger+0xc0/0xe0
[    8.767226] [c00000001e8531b0] [c000000000029b44] die+0xc4/0xf0
[    8.767301] [c00000001e8531f0] [c000000000069ec8] bad_page_fault+0xe8/0x180
[    8.767385] [c00000001e853260] [c00000000000b160] handle_page_fault+0x34/0x38
[    8.767500] --- interrupt: 300 at hvc_poll_get_char+0x2c/0x90
[    8.767500]     LR = kgdboc_get_char+0x4c/0x70

Looking at the failing instruction, I see:

c00000000062ac98:       00 00 29 e9     ld      r9,0(r9)
c00000000062ac9c:       60 02 29 e9     ld      r9,608(r9)

Looking at the code, I see:

{
        struct tty_struct *tty = driver->ttys[0];
        struct hvc_struct *hp = tty->driver_data;

So tty is NULL, and it is dereferenced to read driver_data.
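
The arithmetic behind that conclusion: the faulting load is `ld r9,608(r9)`, and 608 is 0x260, the reported fault address, which is what a load from a field at offset 608 of a NULL base would touch. A small user-space sketch, assuming a hypothetical layout (`mock_tty_layout` is not the real struct tty_struct, only the offset is taken from the disassembly):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 608 bytes of other members, then driver_data.
 * Only the offset matters for this illustration. */
struct mock_tty_layout {
    char other_members[608];
    void *driver_data;
};

/* Address the CPU would load from when reading tty->driver_data,
 * given the numeric value of the tty pointer. */
static uintptr_t driver_data_addr(uintptr_t tty_base)
{
    return tty_base + offsetof(struct mock_tty_layout, driver_data);
}
```

With a base of 0 this yields 0x260, matching the "data at address 0x00000260" line in the oops.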

chleroy commented 5 years ago

The following fixes are now in 4.20: linuxppc/linux@be28c1e3ca29887e207f0cbcd294cefe5074bab6 linuxppc/linux@dded2e159208a9edc21dd5c5f583afa28d378d39 linuxppc/linux@568fb6f42ac6851320adaea25f8f1b94de14e40a linuxppc/linux@fb978ca207743badfe7efd9eebe68bcbb4969f79

chleroy commented 5 years ago

Are there still issues with kgdb?

adelva1984 commented 4 years ago

I'm seeing the bug in https://github.com/linuxppc/issues/issues/140#issuecomment-433072259 still happening with 5.8.0-rc4, so I don't think kgdb works with the hvc driver currently.