openbmc / linux

OpenBMC Linux kernel source tree
Other
49 stars 132 forks source link

NULL pointer dereference in aspeed_i2c_master_irq #219

Open zevweiss opened 1 year ago

zevweiss commented 1 year ago

Running a dev-6.0 kernel on an ast2500 platform I'm working on a port to I'm sometimes hitting a panic on busses with an IPMB device on them:

[27559.144928] 8<--- cut here ---
[27559.148067] Unable to handle kernel NULL pointer dereference at virtual address 00000122
[27559.156186] [00000122] *pgd=00000000
[27559.159813] Internal error: Oops: 5 [#1] ARM
[27559.164120] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.19-67c9407-dirty-766959e #1
[27559.171811] Hardware name: Generic DT based system
[27559.176617] PC is at aspeed_i2c_master_irq+0x10c/0x4d0
[27559.181829] LR is at aspeed_i2c_bus_irq+0x58/0x140
[27559.186657] pc : [<804eea20>]    lr : [<804eee3c>]    psr: 80000193
[27559.192934] sp : 80b01e70  ip : 00000001  fp : 8095bdb8
[27559.198170] r10: 80b08588  r9 : 80b08a60  r8 : 80dcb3c0
[27559.203403] r7 : 00000001  r6 : 00000001  r5 : 00000001  r4 : 84a84420
[27559.209937] r3 : 00000000  r2 : 00000000  r1 : 9f893440  r0 : 00000122
[27559.216475] Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
[27559.223715] Control: 00c5387d  Table: 85490008  DAC: 00000051
[27559.229474] Register r0 information: non-paged memory
[27559.234563] Register r1 information: 0-page vmalloc region starting at 0x9f893000 allocated at __devm_ioremap_resource+0x150/0x1d8
[27559.246372] Register r2 information: NULL pointer
[27559.251118] Register r3 information: NULL pointer
[27559.255856] Register r4 information: slab kmalloc-1k start 84a84400 pointer offset 32 size 1024
[27559.264638] Register r5 information: non-paged memory
[27559.269725] Register r6 information: non-paged memory
[27559.274809] Register r7 information: non-paged memory
[27559.279890] Register r8 information: slab kmalloc-192 start 80dcb3c0 pointer offset 0 size 192
[27559.288585] Register r9 information: non-slab/vmalloc memory
[27559.294279] Register r10 information: non-slab/vmalloc memory
[27559.300056] Register r11 information: non-slab/vmalloc memory
[27559.305836] Register r12 information: non-paged memory
[27559.311002] Process swapper (pid: 0, stack limit = 0x88a4d53e)
[27559.316872] Stack: (0x80b01e70 to 0x80b02000)
[27559.321265] 1e60:                                     84a84420 80c02520 00000001 00000001
[27559.329464] 1e80: 80dcb3c0 804eee3c 80e5f440 80c02520 00000000 00000024 80dcb3c0 80b08a60
[27559.337663] 1ea0: 80b08588 801522bc 00000012 80c97240 80dcb3c0 80c02520 80c97b4c 80b67050
[27559.345861] 1ec0: 9dfff000 8015244c 80dcb3c0 80c02520 80c97b4c 80156320 0000000c 80151bbc
[27559.354057] 1ee0: 0000000c 803bf368 00001000 a8b1ee02 80b08588 80c024c0 80b8cf54 00000000
[27559.362249] 1f00: 80b01f5c 80151bbc 80c024c0 801011bc 80b01f28 8073c97c 80102a00 60000013
[27559.370447] 1f20: ffffffff 80100b04 00000000 00000000 8443cb49 00000000 00000000 80b08a60
[27559.378648] 1f40: 80b03064 80b08a60 9dfff000 00000000 80b08588 8095bdb8 84b9acc8 80b01f78
[27559.386847] 1f60: 801029fc 80102a00 60000013 ffffffff 00000051 80743398 00000000 80743398
[27559.395044] 1f80: 00000000 80144f40 80b0d748 80ba5040 9dfff018 80145254 80b0d748 8073ca20
[27559.403242] 1fa0: 00000001 80a00c58 00000001 80a0126c ffffffff ffffffff 00000000 80a00658
[27559.411444] 1fc0: 00000000 80a29a48 a8b7ee11 00000000 00000000 80a0033c 00000051 00c0387d
[27559.419642] 1fe0: 00000000 8feea000 410fb767 00c5387d 00000000 00000000 00000000 00000000
[27559.427843]  aspeed_i2c_master_irq from aspeed_i2c_bus_irq+0x58/0x140
[27559.434364]  aspeed_i2c_bus_irq from __handle_irq_event_percpu+0x50/0x168
[27559.441221]  __handle_irq_event_percpu from handle_irq_event+0x30/0x70
[27559.447792]  handle_irq_event from handle_simple_irq+0x7c/0xac
[27559.453679]  handle_simple_irq from generic_handle_domain_irq+0x2c/0x3c
[27559.460335]  generic_handle_domain_irq from aspeed_i2c_ic_irq_handler+0x64/0xf4
[27559.467685]  aspeed_i2c_ic_irq_handler from generic_handle_domain_irq+0x2c/0x3c
[27559.475035]  generic_handle_domain_irq from avic_handle_irq+0x44/0x60
[27559.481516]  avic_handle_irq from generic_handle_arch_irq+0x28/0x3c
[27559.487853]  generic_handle_arch_irq from __irq_svc+0x84/0xac
[27559.493645] Exception stack(0x80b01f28 to 0x80b01f70)
[27559.498722] 1f20:                   00000000 00000000 8443cb49 00000000 00000000 80b08a60
[27559.506924] 1f40: 80b03064 80b08a60 9dfff000 00000000 80b08588 8095bdb8 84b9acc8 80b01f78
[27559.515114] 1f60: 801029fc 80102a00 60000013 ffffffff
[27559.520187]  __irq_svc from arch_cpu_idle+0x30/0x38
[27559.525118]  arch_cpu_idle from default_idle_call+0x2c/0x74
[27559.530743]  default_idle_call from do_idle+0x70/0xd8
[27559.535848]  do_idle from cpu_startup_entry+0x14/0x18
[27559.540943]  cpu_startup_entry from rest_init+0x80/0x9c
[27559.546230]  rest_init from arch_post_acpi_subsys_init+0x0/0x18
[27559.552226] Code: e59421cc e59411a4 e282c001 e584c1cc (e7d02002)
[27559.558356] ---[ end trace 0000000000000000 ]---
[27559.563002] Kernel panic - not syncing: Fatal exception in interrupt

aspeed_i2c_master_irq+0x10c points to this line, with msg->buf being NULL.

I've seen it happen a few times, but it's not 100% reproducible, so I'm guessing it maybe a race condition of some sort.

zevweiss commented 1 year ago

Caught a maybe-related KASAN dump in a similar scenario (though I now think the IPMB device may in fact be unrelated):

[  614.857168] ==================================================================
[  614.864585] BUG: KASAN: slab-out-of-bounds in aspeed_i2c_master_irq+0x244/0x708
[  614.872150] Read of size 2 at addr 8443b6e4 by task swapper/0
[  614.878047] 
[  614.879645] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.19-67c9407-dirty-90ab623 #1
[  614.887447] Hardware name: Generic DT based system
[  614.892376]  unwind_backtrace from show_stack+0x18/0x1c
[  614.897895]  show_stack from dump_stack_lvl+0x24/0x2c
[  614.903245]  dump_stack_lvl from print_report+0x2bc/0x5f8
[  614.908919]  print_report from kasan_report+0xc4/0x108
[  614.914286]  kasan_report from aspeed_i2c_master_irq+0x244/0x708
[  614.920561]  aspeed_i2c_master_irq from aspeed_i2c_bus_irq+0x70/0x188
[  614.927251]  aspeed_i2c_bus_irq from __handle_irq_event_percpu+0x78/0x194
[  614.934313]  __handle_irq_event_percpu from handle_irq_event+0x50/0xb0
[  614.941076]  handle_irq_event from handle_simple_irq+0xc8/0x104
[  614.947248]  handle_simple_irq from generic_handle_domain_irq+0x40/0x50
[  614.954110]  generic_handle_domain_irq from aspeed_i2c_ic_irq_handler+0xe4/0x1bc
[  614.961809]  aspeed_i2c_ic_irq_handler from generic_handle_domain_irq+0x40/0x50
[  614.969395]  generic_handle_domain_irq from avic_handle_irq+0x50/0x74
[  614.976083]  avic_handle_irq from generic_handle_arch_irq+0x28/0x3c
[  614.982629]  generic_handle_arch_irq from call_with_stack+0x18/0x20
[  614.989153]  call_with_stack from __irq_svc+0x78/0x94
[  614.994423] Exception stack(0x81003f30 to 0x81003f78)
[  614.999627] 3f20:                                     8100ce00 00000000 00000000 00000000
[  615.007960] 3f40: 8100ce00 00000000 81005064 ffffffff 00000000 410fb767 00c5387d 00000000
[  615.016291] 3f60: 8100ce00 81003f80 801036b4 801036b8 60000013 ffffffff
[  615.023037]  __irq_svc from arch_cpu_idle+0x30/0x38
[  615.028151]  arch_cpu_idle from default_idle_call+0x34/0x7c
[  615.033977]  default_idle_call from do_idle+0x88/0xe4
[  615.039272]  do_idle from cpu_startup_entry+0x14/0x18
[  615.044548]  cpu_startup_entry from rest_init+0xa8/0xc4
[  615.050037]  rest_init from arch_post_acpi_subsys_init+0x0/0x18
[  615.056208] 
[  615.057807] Allocated by task 262:
[  615.061326]  seq_read_iter+0x424/0x83c
[  615.065275]  io_read+0x1b4/0x81c
[  615.068706]  io_issue_sqe+0x90/0x3ac
[  615.072495]  io_submit_sqes+0x3b8/0x9a8
[  615.076518]  sys_io_uring_enter+0x3fc/0xc6c
[  615.080887]  ret_fast_syscall+0x0/0x54
[  615.084808] 
[  615.086397] Freed by task 262:
[  615.089572]  kasan_set_free_info+0x20/0x34
[  615.093867]  __kasan_slab_free+0xe4/0x134
[  615.098049]  slab_free_freelist_hook+0x84/0x150
[  615.102794]  kfree+0x90/0x28c
[  615.105928]  seq_release+0x2c/0x48
[  615.109518]  kernfs_fop_release+0x64/0x110
[  615.113804]  __fput+0xd0/0x378
[  615.117037]  task_work_run+0x98/0xc8
[  615.120810]  do_work_pending+0x574/0x698
[  615.124927]  slow_work_pending+0xc/0x20
[  615.128926] 
[  615.130516] The buggy address belongs to the object at 8443a000
[  615.130516]  which belongs to the cache kmalloc-4k of size 4096
[  615.142475] The buggy address is located 1764 bytes to the right of
[  615.142475]  4096-byte region [8443a000, 8443b000)
[  615.153662] 
[  615.155255] The buggy address belongs to the physical page:
[  615.160936] page:343b2b83 refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0x84438
[  615.169022] head:343b2b83 order:3 compound_mapcount:0 compound_pincount:0
[  615.175953] flags: 0x10200(slab|head|zone=0)
[  615.180432] raw: 00010200 00000100 00000122 814019a0 00000000 00040004 ffffffff 00000001
[  615.188642] page dumped because: kasan: bad access detected
[  615.194326] 
[  615.195915] Memory state around the buggy address:
[  615.200831]  8443b580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  615.207511]  8443b600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  615.214193] >8443b680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  615.220840]                                                ^
[  615.226633]  8443b700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  615.233309]  8443b780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  615.239957] ==================================================================

aspeed_i2c_master_irq+0x244 points here.

Task 262, FWIW, is adcsensor, doing i2c-unrelated things.

zevweiss commented 1 year ago

Some further debug-experiment results: after disabling CONFIG_I2C_SLAVE I don't seem to be able to reproduce it (whereas I had been hitting it pretty reliably with that enabled).

ksrikanth commented 1 year ago

Hello @zevweiss any further updates on this.
We are also seeing similar issue, and have "I2C_SLAVE_ENABLED".

zevweiss commented 1 year ago

Hi @ksrikanth -- since I realized that the platform I'd been encountering this on didn't actually require CONFIG_I2C_SLAVE this became a much lower-priority issue for us and I haven't looked into further I'm afraid.