Closed by cpaasch.
```
============================================
WARNING: possible recursive locking detected
6.6.0-rc4-g7a5720a344e7 #57 Not tainted
--------------------------------------------
syz-executor.4/9364 is trying to acquire lock:
ffff8880171052f0 (k-slock-AF_INET6){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff8880171052f0 (k-slock-AF_INET6){+.-.}-{2:2}, at: sk_clone_lock+0x129/0x6c0 net/core/sock.c:2310

but task is already holding lock:
ffff88805f3750b0 (k-slock-AF_INET6){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff88805f3750b0 (k-slock-AF_INET6){+.-.}-{2:2}, at: sk_clone_lock+0x129/0x6c0 net/core/sock.c:2310
[...]
 __do_softirq+0x158/0x3e6 kernel/softirq.c:553
 do_softirq+0x8b/0xd0 kernel/softirq.c:454
 __local_bh_enable_ip+0x10c/0x120 kernel/softirq.c:381
 local_bh_enable include/linux/bottom_half.h:33 [inline]
 rcu_read_unlock_bh include/linux/rcupdate.h:819 [inline]
 __dev_queue_xmit+0x107c/0x1c80 net/core/dev.c:4371
 dev_queue_xmit include/linux/netdevice.h:3092 [inline]
 neigh_hh_output include/net/neighbour.h:526 [inline]
 neigh_output include/net/neighbour.h:540 [inline]
 ip_finish_output2+0x693/0x850 net/ipv4/ip_output.c:233
 __ip_queue_xmit+0x8dc/0xa00 net/ipv4/ip_output.c:533
 __tcp_transmit_skb+0xdf5/0x1000 net/ipv4/tcp_output.c:1408
 tcp_rcv_state_process+0x16ed/0x1750 net/ipv4/tcp_input.c:6383
 tcp_v4_do_rcv+0x3cc/0x610 net/ipv4/tcp_ipv4.c:1752
 __release_sock+0xcf/0x150 net/core/sock.c:2976
 release_sock+0x38/0x110 net/core/sock.c:3518
[...]
```
The issue does not look MPTCP-related (or memory corruption happens before the reported splat), because the stack trace quoted above looks buggy: release_sock() disables local BH processing, so the rcu_read_lock_bh()/rcu_read_unlock_bh() pair in dev_queue_xmit() should not lead to softirq processing, yet do_softirq() is invoked.
It looks like current->preempt_count gets corrupted in between such calls?!? Yet the lock debugging options enabled in this build should trigger if a pair of locking primitives were mismatched.
All in all it looks confusing, and IIRC this is the second stack trace we have seen recently that could be a side effect of some random memory corruption. Perhaps it would be worth increasing the number of runners/configs with KASAN enabled (and possibly decreasing the number of configs without it, as needed)?
Have you observed similar splats without any MPTCP reference?
I'm updating all my instances to enable KASAN. Are there any other "Memory Debugging" options you think may help?
I do wonder whether fault injection could play a role here.
And no, I haven't seen other similar splats.
@cpaasch: unless you have observed more instances of this one, I suggest closing it.
Yes! It hasn't happened in over a month.
syzkaller-ID: 664c5311cf44e3aa732eaa6b2e79eb4a8961ec08
HEAD: 7a5720a344e7
Trace:
Kconfig: Kconfig_k5_lockdep.txt
No reproducer yet.