Closed cpaasch closed 1 year ago
Bisected down to 182d2a98c6f185ba952ba6c3c36b9ab9c644f6a3
Fixed thanks to patches from @pabeni:
New patches for t/upstream:
Tests are now in progress:
https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export-net/20230605T115910 https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20230605T115910
FYI: This is still happening in 74a1b717b26e5de48adc656c9a974ae518f0edaf
@cpaasch: could you please past the spat you see on top of https://github.com/multipath-tcp/mptcp_net-next/commit/74a1b717b26e5de48adc656c9a974ae518f0edaf ? Does it happen using the same repro? FTR, I can't reproduce it locally with such program, on top of the above tree
======================================================
WARNING: possible circular locking dependency detected
6.4.0-rc4-g74a1b717b26e #28 Not tainted
------------------------------------------------------
syz-executor.1/26976 is trying to acquire lock:
ffff88806b65f1b0 (msk_lock-AF_INET){+.+.}-{0:0}, at: inet_sk_diag_fill+0x784/0xa60 net/ipv4/inet_diag.c:342
but task is already holding lock:
ffff8880064503b0 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:350 [inline]
ffff8880064503b0 (&h->lhash2[i].lock){+.+.}-{2:2}, at: mptcp_diag_dump_listeners net/mptcp/mptcp_diag.c:98 [inline]
ffff8880064503b0 (&h->lhash2[i].lock){+.+.}-{2:2}, at: mptcp_diag_dump+0x39a/0x820 net/mptcp/mptcp_diag.c:197
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&h->lhash2[i].lock){+.+.}-{2:2}:
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:350 [inline]
inet_unhash+0x16f/0x260 net/ipv4/inet_hashtables.c:780
tcp_set_state+0x17a/0x210 net/ipv4/tcp.c:2621
__mptcp_close_ssk+0x1bd/0x410 net/mptcp/protocol.c:2448
mptcp_destroy_common+0x88/0x1d0 net/mptcp/protocol.c:3338
mptcp_destroy+0x4b/0x70 net/mptcp/protocol.c:3365
__mptcp_destroy_sock+0x68/0x1a0 net/mptcp/protocol.c:2986
__mptcp_close+0x3c9/0x480 net/mptcp/protocol.c:3086
mptcp_close+0x28/0x100 net/mptcp/protocol.c:3101
inet_release+0x88/0xa0 net/ipv4/af_inet.c:426
sock_close+0x58/0x110 net/socket.c:653
__fput+0x1b0/0x440 fs/file_table.c:321
task_work_run+0x104/0x150 kernel/task_work.c:179
exit_to_user_mode_loop+0xc0/0xe0 include/linux/resume_user_mode.h:49
exit_to_user_mode_prepare+0x93/0xe0 kernel/entry/common.c:204
syscall_exit_to_user_mode+0x4c/0x1e0 kernel/entry/common.c:286
do_syscall_64+0x53/0xa0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc
-> #1 (k-sk_lock-AF_INET6/1){+.+.}-{0:0}:
lock_sock_nested+0x38/0xa0 net/core/sock.c:3469
mptcp_subflow_create_socket+0x85/0x590 net/mptcp/subflow.c:1671
__mptcp_nmpc_socket+0x9b/0x210 net/mptcp/protocol.c:89
mptcp_nl_cmd_add_addr+0x64a/0x890 net/mptcp/pm_netlink.c:1032
genl_rcv_msg+0x551/0x5f0 net/netlink/genetlink.c:968
netlink_rcv_skb+0xea/0x1c0 net/netlink/af_netlink.c:2546
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast+0x3c2/0x490 net/netlink/af_netlink.c:1339
netlink_sendmsg+0x4fa/0x620 net/netlink/af_netlink.c:1913
____sys_sendmsg+0x3a4/0x3d0 net/socket.c:724
___sys_sendmsg+0x143/0x190 net/socket.c:2559
__x64_sys_sendmsg+0xd5/0x150 net/socket.c:2588
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
-> #0 (msk_lock-AF_INET){+.+.}-{0:0}:
__lock_acquire+0x14df/0x2ea0 kernel/locking/lockdep.c:3113
lock_acquire+0xdd/0x230 kernel/locking/lockdep.c:5705
lock_sock_fast include/net/sock.h:1733 [inline]
mptcp_diag_fill_info+0x1a0/0x240 net/mptcp/sockopt.c:929
inet_sk_diag_fill+0x784/0xa60 net/ipv4/inet_diag.c:342
mptcp_diag_dump+0x5c3/0x820 net/mptcp/mptcp_diag.c:24
__inet_diag_dump+0xf8/0x190 net/ipv4/inet_diag.c:1179
inet_diag_dump_compat+0xba/0xe0 net/ipv4/inet_diag.c:1287
netlink_dump+0x208/0x470 net/netlink/af_netlink.c:2268
__netlink_dump_start+0x2bb/0x380 net/netlink/af_netlink.c:2373
netlink_dump_start include/linux/netlink.h:329 [inline]
inet_diag_rcv_msg_compat+0x108/0x1e0 net/ipv4/inet_diag.c:1321
sock_diag_rcv_msg+0xb5/0x1f0
netlink_rcv_skb+0xea/0x1c0 net/netlink/af_netlink.c:2546
sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
netlink_unicast+0x3c2/0x490 net/netlink/af_netlink.c:1339
netlink_sendmsg+0x4fa/0x620 net/netlink/af_netlink.c:1913
____sys_sendmsg+0x3a4/0x3d0 net/socket.c:724
___sys_sendmsg+0x143/0x190 net/socket.c:2559
__x64_sys_sendmsg+0xd5/0x150 net/socket.c:2588
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
other info that might help us debug this:
Chain exists of:
msk_lock-AF_INET --> k-sk_lock-AF_INET6/1 --> &h->lhash2[i].lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&h->lhash2[i].lock);
lock(k-sk_lock-AF_INET6/1);
lock(&h->lhash2[i].lock);
lock(msk_lock-AF_INET);
*** DEADLOCK ***
6 locks held by syz-executor.1/26976:
#0: ffffffff83839c78 (sock_diag_mutex){+.+.}-{3:3}, at: sock_diag_rcv+0x1b/0x40 net/core/sock_diag.c:279
#1: ffffffff83839ba8 (sock_diag_table_mutex){+.+.}-{3:3}, at: sock_diag_rcv_msg+0x96/0x1f0 net/core/sock_diag.c:259
#2: ffff8880369d0e80 (nlk_cb_mutex-SOCK_DIAG){+.+.}-{3:3}, at: netlink_dump+0x4a/0x470 net/netlink/af_netlink.c:2215
#3: ffffffff83866448 (inet_diag_table_mutex){+.+.}-{3:3}, at: __inet_diag_dump+0xb1/0x190 net/ipv4/inet_diag.c:63
#4: ffffffff83759f38 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x5/0x40 include/linux/rcupdate.h:326
#5: ffff8880064503b0 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:350 [inline]
#5: ffff8880064503b0 (&h->lhash2[i].lock){+.+.}-{2:2}, at: mptcp_diag_dump_listeners net/mptcp/mptcp_diag.c:98 [inline]
#5: ffff8880064503b0 (&h->lhash2[i].lock){+.+.}-{2:2}, at: mptcp_diag_dump+0x39a/0x820 net/mptcp/mptcp_diag.c:197
stack backtrace:
CPU: 1 PID: 26976 Comm: syz-executor.1 Not tainted 6.4.0-rc4-g74a1b717b26e #28
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xad/0xf0 lib/dump_stack.c:106
check_noncircular+0xf0/0x100 kernel/locking/lockdep.c:2188
__lock_acquire+0x14df/0x2ea0 kernel/locking/lockdep.c:3113
lock_acquire+0xdd/0x230 kernel/locking/lockdep.c:5705
lock_sock_fast include/net/sock.h:1733 [inline]
mptcp_diag_fill_info+0x1a0/0x240 net/mptcp/sockopt.c:929
inet_sk_diag_fill+0x784/0xa60 net/ipv4/inet_diag.c:342
mptcp_diag_dump+0x5c3/0x820 net/mptcp/mptcp_diag.c:24
__inet_diag_dump+0xf8/0x190 net/ipv4/inet_diag.c:1179
inet_diag_dump_compat+0xba/0xe0 net/ipv4/inet_diag.c:1287
netlink_dump+0x208/0x470 net/netlink/af_netlink.c:2268
__netlink_dump_start+0x2bb/0x380 net/netlink/af_netlink.c:2373
netlink_dump_start include/linux/netlink.h:329 [inline]
inet_diag_rcv_msg_compat+0x108/0x1e0 net/ipv4/inet_diag.c:1321
sock_diag_rcv_msg+0xb5/0x1f0
netlink_rcv_skb+0xea/0x1c0 net/netlink/af_netlink.c:2546
sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
netlink_unicast+0x3c2/0x490 net/netlink/af_netlink.c:1339
netlink_sendmsg+0x4fa/0x620 net/netlink/af_netlink.c:1913
____sys_sendmsg+0x3a4/0x3d0 net/socket.c:724
___sys_sendmsg+0x143/0x190 net/socket.c:2559
__x64_sys_sendmsg+0xd5/0x150 net/socket.c:2588
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f9b8ccde6a9
Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007f9b8c00bcd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000006bbf80 RCX: 00007f9b8ccde6a9
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000005
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006bbf8c
R13: fffffffffffffea8 R14: 00000000006bbf80 R15: 000000000001fe40
</TASK>
BUG: sleeping function called from invalid context at net/core/sock.c:3501
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 26976, name: syz-executor.1
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 0
INFO: lockdep is turned off.
Preemption disabled at:
[<0000000000000000>] 0x0
CPU: 1 PID: 26976 Comm: syz-executor.1 Not tainted 6.4.0-rc4-g74a1b717b26e #28
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xad/0xf0 lib/dump_stack.c:106
__might_resched+0x248/0x260 kernel/sched/core.c:10153
__lock_sock_fast+0x21/0x80 net/core/sock.c:3501
lock_sock_fast include/net/sock.h:1735 [inline]
mptcp_diag_fill_info+0x1ac/0x240 net/mptcp/sockopt.c:929
inet_sk_diag_fill+0x784/0xa60 net/ipv4/inet_diag.c:342
mptcp_diag_dump+0x5c3/0x820 net/mptcp/mptcp_diag.c:24
__inet_diag_dump+0xf8/0x190 net/ipv4/inet_diag.c:1179
inet_diag_dump_compat+0xba/0xe0 net/ipv4/inet_diag.c:1287
netlink_dump+0x208/0x470 net/netlink/af_netlink.c:2268
__netlink_dump_start+0x2bb/0x380 net/netlink/af_netlink.c:2373
netlink_dump_start include/linux/netlink.h:329 [inline]
inet_diag_rcv_msg_compat+0x108/0x1e0 net/ipv4/inet_diag.c:1321
sock_diag_rcv_msg+0xb5/0x1f0
netlink_rcv_skb+0xea/0x1c0 net/netlink/af_netlink.c:2546
sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
netlink_unicast+0x3c2/0x490 net/netlink/af_netlink.c:1339
netlink_sendmsg+0x4fa/0x620 net/netlink/af_netlink.c:1913
____sys_sendmsg+0x3a4/0x3d0 net/socket.c:724
___sys_sendmsg+0x143/0x190 net/socket.c:2559
__x64_sys_sendmsg+0xd5/0x150 net/socket.c:2588
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f9b8ccde6a9
Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007f9b8c00bcd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000006bbf80 RCX: 00007f9b8ccde6a9
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000005
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006bbf8c
R13: fffffffffffffea8 R14: 00000000006bbf80 R15: 000000000001fe40
</TASK>
__nla_validate_parse: 12 callbacks suppressed
netlink: 28 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 28 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 28 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 36 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 36 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 36 bytes leftover after parsing attributes in process `syz-executor.0'.
netlink: 36 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 40 bytes leftover after parsing attributes in process `syz-executor.6'.
netlink: 40 bytes leftover after parsing attributes in process `syz-executor.6'.
netlink: 40 bytes leftover after parsing attributes in process `syz-executor.6'.
netlink: 'syz-executor.1': attribute type 14 has an invalid length.
netlink: 'syz-executor.6': attribute type 14 has an invalid length.
netlink: 'syz-executor.1': attribute type 14 has an invalid length.
netlink: 'syz-executor.6': attribute type 14 has an invalid length.
netlink: 'syz-executor.1': attribute type 14 has an invalid length.
pimreg: entered allmulticast mode
pimreg: left allmulticast mode
pimreg: entered allmulticast mode
pimreg: left allmulticast mode
pimreg: entered allmulticast mode
sctp: [Deprecated]: syz-executor.7 (pid 27657) Use of int in max_burst socket option.
Use struct sctp_assoc_value instead
netlink: 'syz-executor.1': attribute type 14 has an invalid length.
sctp: [Deprecated]: syz-executor.7 (pid 27657) Use of int in maxseg socket option.
Use struct sctp_assoc_value instead
netlink: 'syz-executor.1': attribute type 14 has an invalid length.
sctp: [Deprecated]: syz-executor.7 (pid 27684) Use of int in max_burst socket option.
Use struct sctp_assoc_value instead
sctp: [Deprecated]: syz-executor.7 (pid 27706) Use of int in maxseg socket option.
Use struct sctp_assoc_value instead
netlink: 'syz-executor.1': attribute type 14 has an invalid length.
sctp: [Deprecated]: syz-executor.7 (pid 27732) Use of int in max_burst socket option.
Use struct sctp_assoc_value instead
netlink: 'syz-executor.5': attribute type 1 has an invalid length.
__nla_validate_parse: 1 callbacks suppressed
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 'syz-executor.5': attribute type 1 has an invalid length.
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.5'.
sctp: [Deprecated]: syz-executor.7 (pid 27739) Use of int in maxseg socket option.
Use struct sctp_assoc_value instead
program syz-executor.7 is using a deprecated SCSI ioctl, please convert it to SG_IO
program syz-executor.7 is using a deprecated SCSI ioctl, please convert it to SG_IO
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.1'.
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.4'.
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.4'.
nft_compat: unsupported protocol 0
nft_compat: unsupported protocol 0
program syz-executor.5 is using a deprecated SCSI ioctl, please convert it to SG_IO
program syz-executor.5 is using a deprecated SCSI ioctl, please convert it to SG_IO
program syz-executor.5 is using a deprecated SCSI ioctl, please convert it to SG_IO
Just tried the reproducer, and it does not work anymore. So, will let syzkaller generate a new one.
Let me know if you want a new github issue.
@cpaasch: thanks for the quick feedback!
It looks like the problem is that mptcp move the listener status to close before actually stopping the listener - so the state check in diag is fooled.
Could you please try the attached patch ? (only build tested here ;) diffs.txt
Btw., running with the above diff, I still got a inet_sk_diag_fill
error. I'm applying yours from the email and relaunch.
Syzkaller-id: f3e04a94f3147010ed1eef99859a597b7ead3a1a
HEAD: 7639ac56dfbc
Trace:
syz-repro:
C-repro: repro.c.txt
Kconfig: Kconfig_k5_lockdep.txt