Open zevweiss opened 1 year ago
Caught a maybe-related KASAN dump in a similar scenario (though I now think the IPMB device may in fact be unrelated):
[ 614.857168] ==================================================================
[ 614.864585] BUG: KASAN: slab-out-of-bounds in aspeed_i2c_master_irq+0x244/0x708
[ 614.872150] Read of size 2 at addr 8443b6e4 by task swapper/0
[ 614.878047]
[ 614.879645] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.19-67c9407-dirty-90ab623 #1
[ 614.887447] Hardware name: Generic DT based system
[ 614.892376] unwind_backtrace from show_stack+0x18/0x1c
[ 614.897895] show_stack from dump_stack_lvl+0x24/0x2c
[ 614.903245] dump_stack_lvl from print_report+0x2bc/0x5f8
[ 614.908919] print_report from kasan_report+0xc4/0x108
[ 614.914286] kasan_report from aspeed_i2c_master_irq+0x244/0x708
[ 614.920561] aspeed_i2c_master_irq from aspeed_i2c_bus_irq+0x70/0x188
[ 614.927251] aspeed_i2c_bus_irq from __handle_irq_event_percpu+0x78/0x194
[ 614.934313] __handle_irq_event_percpu from handle_irq_event+0x50/0xb0
[ 614.941076] handle_irq_event from handle_simple_irq+0xc8/0x104
[ 614.947248] handle_simple_irq from generic_handle_domain_irq+0x40/0x50
[ 614.954110] generic_handle_domain_irq from aspeed_i2c_ic_irq_handler+0xe4/0x1bc
[ 614.961809] aspeed_i2c_ic_irq_handler from generic_handle_domain_irq+0x40/0x50
[ 614.969395] generic_handle_domain_irq from avic_handle_irq+0x50/0x74
[ 614.976083] avic_handle_irq from generic_handle_arch_irq+0x28/0x3c
[ 614.982629] generic_handle_arch_irq from call_with_stack+0x18/0x20
[ 614.989153] call_with_stack from __irq_svc+0x78/0x94
[ 614.994423] Exception stack(0x81003f30 to 0x81003f78)
[ 614.999627] 3f20: 8100ce00 00000000 00000000 00000000
[ 615.007960] 3f40: 8100ce00 00000000 81005064 ffffffff 00000000 410fb767 00c5387d 00000000
[ 615.016291] 3f60: 8100ce00 81003f80 801036b4 801036b8 60000013 ffffffff
[ 615.023037] __irq_svc from arch_cpu_idle+0x30/0x38
[ 615.028151] arch_cpu_idle from default_idle_call+0x34/0x7c
[ 615.033977] default_idle_call from do_idle+0x88/0xe4
[ 615.039272] do_idle from cpu_startup_entry+0x14/0x18
[ 615.044548] cpu_startup_entry from rest_init+0xa8/0xc4
[ 615.050037] rest_init from arch_post_acpi_subsys_init+0x0/0x18
[ 615.056208]
[ 615.057807] Allocated by task 262:
[ 615.061326] seq_read_iter+0x424/0x83c
[ 615.065275] io_read+0x1b4/0x81c
[ 615.068706] io_issue_sqe+0x90/0x3ac
[ 615.072495] io_submit_sqes+0x3b8/0x9a8
[ 615.076518] sys_io_uring_enter+0x3fc/0xc6c
[ 615.080887] ret_fast_syscall+0x0/0x54
[ 615.084808]
[ 615.086397] Freed by task 262:
[ 615.089572] kasan_set_free_info+0x20/0x34
[ 615.093867] __kasan_slab_free+0xe4/0x134
[ 615.098049] slab_free_freelist_hook+0x84/0x150
[ 615.102794] kfree+0x90/0x28c
[ 615.105928] seq_release+0x2c/0x48
[ 615.109518] kernfs_fop_release+0x64/0x110
[ 615.113804] __fput+0xd0/0x378
[ 615.117037] task_work_run+0x98/0xc8
[ 615.120810] do_work_pending+0x574/0x698
[ 615.124927] slow_work_pending+0xc/0x20
[ 615.128926]
[ 615.130516] The buggy address belongs to the object at 8443a000
[ 615.130516] which belongs to the cache kmalloc-4k of size 4096
[ 615.142475] The buggy address is located 1764 bytes to the right of
[ 615.142475] 4096-byte region [8443a000, 8443b000)
[ 615.153662]
[ 615.155255] The buggy address belongs to the physical page:
[ 615.160936] page:343b2b83 refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0x84438
[ 615.169022] head:343b2b83 order:3 compound_mapcount:0 compound_pincount:0
[ 615.175953] flags: 0x10200(slab|head|zone=0)
[ 615.180432] raw: 00010200 00000100 00000122 814019a0 00000000 00040004 ffffffff 00000001
[ 615.188642] page dumped because: kasan: bad access detected
[ 615.194326]
[ 615.195915] Memory state around the buggy address:
[ 615.200831] 8443b580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 615.207511] 8443b600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 615.214193] >8443b680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 615.220840] ^
[ 615.226633] 8443b700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 615.233309] 8443b780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 615.239957] ==================================================================
aspeed_i2c_master_irq+0x244 points here.
Task 262, FWIW, is adcsensor, doing i2c-unrelated things.
Some further debug-experiment results: after disabling CONFIG_I2C_SLAVE I don't seem to be able to reproduce it (whereas I had been hitting it pretty reliably with that enabled).
Hello @zevweiss, any further updates on this?
We are also seeing a similar issue, and have "I2C_SLAVE_ENABLED".
Hi @ksrikanth -- since I realized that the platform I'd been encountering this on didn't actually require CONFIG_I2C_SLAVE, this became a much lower-priority issue for us and I haven't looked into it further, I'm afraid.
Running a dev-6.0 kernel on an ast2500 platform that I'm working on a port to, I'm sometimes hitting a panic on busses with an IPMB device on them:
aspeed_i2c_master_irq+0x10c points to this line, with msg->buf being NULL. I've seen it happen a few times, but it's not 100% reproducible, so I'm guessing it may be a race condition of some sort.