multipath-tcp / mptcp_net-next

Development version of the Upstream MultiPath TCP Linux kernel 🐧
https://mptcp.dev
Other
284 stars 42 forks source link

syzkaller: possible deadlock in `inet_sk_diag_fill` #407

Closed cpaasch closed 1 year ago

cpaasch commented 1 year ago

Syzkaller-id: f3e04a94f3147010ed1eef99859a597b7ead3a1a

HEAD: 7639ac56dfbc

Trace:

======================================================
WARNING: possible circular locking dependency detected
6.4.0-rc3-g7639ac56dfbc #27 Not tainted
------------------------------------------------------
syz-executor.1/15345 is trying to acquire lock:
ffff888036038c70 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_sk_diag_fill+0x784/0xa60 net/ipv4/inet_diag.c:342

but task is already holding lock:
ffff8880064c3198 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:350 [inline]
ffff8880064c3198 (&h->lhash2[i].lock){+.+.}-{2:2}, at: mptcp_diag_dump_listeners net/mptcp/mptcp_diag.c:98 [inline]
ffff8880064c3198 (&h->lhash2[i].lock){+.+.}-{2:2}, at: mptcp_diag_dump+0x39a/0x820 net/mptcp/mptcp_diag.c:197

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&h->lhash2[i].lock){+.+.}-{2:2}:
       __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
       _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
       spin_lock include/linux/spinlock.h:350 [inline]
       __inet_hash+0x157/0x4d0 net/ipv4/inet_hashtables.c:735
       inet_csk_listen_start+0xf6/0x130 net/ipv4/inet_connection_sock.c:1258
       inet_listen+0x136/0x210 net/ipv4/af_inet.c:227
       __x64_sys_listen+0xb4/0x110 net/socket.c:1837
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x72/0xdc

-> #0 (sk_lock-AF_INET){+.+.}-{0:0}:
       __lock_acquire+0x14d3/0x2f60 kernel/locking/lockdep.c:3108
       lock_acquire+0xdd/0x230 kernel/locking/lockdep.c:5691
       lock_sock_fast include/net/sock.h:1733 [inline]
       mptcp_diag_fill_info+0x168/0x200 net/mptcp/sockopt.c:926
       inet_sk_diag_fill+0x784/0xa60 net/ipv4/inet_diag.c:342
       mptcp_diag_dump+0x5c3/0x820 net/mptcp/mptcp_diag.c:24
       __inet_diag_dump+0xf8/0x190 net/ipv4/inet_diag.c:1179
       inet_diag_dump_compat+0xba/0xe0 net/ipv4/inet_diag.c:1287
       netlink_dump+0x208/0x470 net/netlink/af_netlink.c:2268
       __netlink_dump_start+0x2bb/0x380 net/netlink/af_netlink.c:2373
       netlink_dump_start include/linux/netlink.h:329 [inline]
       inet_diag_rcv_msg_compat+0x108/0x1e0 net/ipv4/inet_diag.c:1321
       sock_diag_rcv_msg+0xb5/0x1f0
       netlink_rcv_skb+0xea/0x1c0 net/netlink/af_netlink.c:2546
       sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
       netlink_unicast+0x3c2/0x490 net/netlink/af_netlink.c:1339
       netlink_sendmsg+0x4fa/0x620 net/netlink/af_netlink.c:1913
       ____sys_sendmsg+0x3a4/0x3d0 net/socket.c:724
       ___sys_sendmsg+0x143/0x190 net/socket.c:2559
       __x64_sys_sendmsg+0xd5/0x150 net/socket.c:2588
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x72/0xdc

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&h->lhash2[i].lock);
                               lock(sk_lock-AF_INET);
                               lock(&h->lhash2[i].lock);
  lock(sk_lock-AF_INET);

 *** DEADLOCK ***

6 locks held by syz-executor.1/15345:
 #0: ffffffff83839c78 (sock_diag_mutex){+.+.}-{3:3}, at: sock_diag_rcv+0x1b/0x40 net/core/sock_diag.c:279
 #1: ffffffff83839ba8 (sock_diag_table_mutex){+.+.}-{3:3}, at: sock_diag_rcv_msg+0x96/0x1f0 net/core/sock_diag.c:259
 #2: ffff888039be8680 (nlk_cb_mutex-SOCK_DIAG){+.+.}-{3:3}, at: netlink_dump+0x4a/0x470 net/netlink/af_netlink.c:2215
 #3: ffffffff83866438 (inet_diag_table_mutex){+.+.}-{3:3}, at: __inet_diag_dump+0xb1/0x190 net/ipv4/inet_diag.c:63
 #4: ffffffff83759f38 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x5/0x40 include/linux/rcupdate.h:326
 #5: ffff8880064c3198 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:350 [inline]
 #5: ffff8880064c3198 (&h->lhash2[i].lock){+.+.}-{2:2}, at: mptcp_diag_dump_listeners net/mptcp/mptcp_diag.c:98 [inline]
 #5: ffff8880064c3198 (&h->lhash2[i].lock){+.+.}-{2:2}, at: mptcp_diag_dump+0x39a/0x820 net/mptcp/mptcp_diag.c:197

stack backtrace:
CPU: 1 PID: 15345 Comm: syz-executor.1 Not tainted 6.4.0-rc3-g7639ac56dfbc #27
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xad/0xf0 lib/dump_stack.c:106
 check_noncircular+0xf0/0x100 kernel/locking/lockdep.c:2188
 __lock_acquire+0x14d3/0x2f60 kernel/locking/lockdep.c:3108
 lock_acquire+0xdd/0x230 kernel/locking/lockdep.c:5691
 lock_sock_fast include/net/sock.h:1733 [inline]
 mptcp_diag_fill_info+0x168/0x200 net/mptcp/sockopt.c:926
 inet_sk_diag_fill+0x784/0xa60 net/ipv4/inet_diag.c:342
 mptcp_diag_dump+0x5c3/0x820 net/mptcp/mptcp_diag.c:24
 __inet_diag_dump+0xf8/0x190 net/ipv4/inet_diag.c:1179
 inet_diag_dump_compat+0xba/0xe0 net/ipv4/inet_diag.c:1287
 netlink_dump+0x208/0x470 net/netlink/af_netlink.c:2268
 __netlink_dump_start+0x2bb/0x380 net/netlink/af_netlink.c:2373
 netlink_dump_start include/linux/netlink.h:329 [inline]
 inet_diag_rcv_msg_compat+0x108/0x1e0 net/ipv4/inet_diag.c:1321
 sock_diag_rcv_msg+0xb5/0x1f0
 netlink_rcv_skb+0xea/0x1c0 net/netlink/af_netlink.c:2546
 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
 netlink_unicast+0x3c2/0x490 net/netlink/af_netlink.c:1339
 netlink_sendmsg+0x4fa/0x620 net/netlink/af_netlink.c:1913
 ____sys_sendmsg+0x3a4/0x3d0 net/socket.c:724
 ___sys_sendmsg+0x143/0x190 net/socket.c:2559
 __x64_sys_sendmsg+0xd5/0x150 net/socket.c:2588
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f3eb09d36a9
Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007f3eafd00cd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000006bbf80 RCX: 00007f3eb09d36a9
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006bbf8c
R13: fffffffffffffea8 R14: 00000000006bbf80 R15: 000000000001fe40
 </TASK>
BUG: sleeping function called from invalid context at net/core/sock.c:3501
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 15345, name: syz-executor.1
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 0
INFO: lockdep is turned off.
Preemption disabled at:
[<0000000000000000>] 0x0
CPU: 1 PID: 15345 Comm: syz-executor.1 Not tainted 6.4.0-rc3-g7639ac56dfbc #27
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xad/0xf0 lib/dump_stack.c:106
 __might_resched+0x248/0x260 kernel/sched/core.c:10153
 __lock_sock_fast+0x21/0x80 net/core/sock.c:3501
 lock_sock_fast include/net/sock.h:1735 [inline]
 mptcp_diag_fill_info+0x174/0x200 net/mptcp/sockopt.c:926
 inet_sk_diag_fill+0x784/0xa60 net/ipv4/inet_diag.c:342
 mptcp_diag_dump+0x5c3/0x820 net/mptcp/mptcp_diag.c:24
 __inet_diag_dump+0xf8/0x190 net/ipv4/inet_diag.c:1179
 inet_diag_dump_compat+0xba/0xe0 net/ipv4/inet_diag.c:1287
 netlink_dump+0x208/0x470 net/netlink/af_netlink.c:2268
 __netlink_dump_start+0x2bb/0x380 net/netlink/af_netlink.c:2373
 netlink_dump_start include/linux/netlink.h:329 [inline]
 inet_diag_rcv_msg_compat+0x108/0x1e0 net/ipv4/inet_diag.c:1321
 sock_diag_rcv_msg+0xb5/0x1f0
 netlink_rcv_skb+0xea/0x1c0 net/netlink/af_netlink.c:2546
 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
 netlink_unicast+0x3c2/0x490 net/netlink/af_netlink.c:1339
 netlink_sendmsg+0x4fa/0x620 net/netlink/af_netlink.c:1913
 ____sys_sendmsg+0x3a4/0x3d0 net/socket.c:724
 ___sys_sendmsg+0x143/0x190 net/socket.c:2559
 __x64_sys_sendmsg+0xd5/0x150 net/socket.c:2588
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f3eb09d36a9
Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007f3eafd00cd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000006bbf80 RCX: 00007f3eb09d36a9
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006bbf8c
R13: fffffffffffffea8 R14: 00000000006bbf80 R15: 000000000001fe40
 </TASK>

syz-repro:

# {Threaded:false Repeat:false RepeatTimes:0 Procs:1 Slowdown:1 Sandbox: SandboxArg:0 Leak:false NetInjection:false NetDevices:false NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false DevlinkPCI:false NicVF:false USB:false VhciInjection:false Wifi:false IEEE802154:false Sysctl:false UseTmpDir:false HandleSegv:false Repro:false Trace:false LegacyOptions:{Collide:false Fault:false FaultCall:0 FaultNth:0}}
r0 = socket$inet_mptcp(0x2, 0x1, 0x106)
listen(r0, 0x0)
r1 = socket$nl_sock_diag(0x10, 0x3, 0x4)
sendmsg$TCPDIAG_GETSOCK(r1, &(0x7f00000000c0)={0x0, 0x0, &(0x7f00000001c0)={&(0x7f0000000040)={0x54, 0x13, 0x203, 0x0, 0x0, {0x0, 0x0, 0x0, 0x3, {}, 0x751f}, [@INET_DIAG_REQ_BYTECODE={0x8, 0x3, "06010000"}]}, 0x54}}, 0x0)

C-repro: repro.c.txt

Kconfig: Kconfig_k5_lockdep.txt

cpaasch commented 1 year ago

Bisected down to 182d2a98c6f185ba952ba6c3c36b9ab9c644f6a3

matttbe commented 1 year ago

Fixed thanks to patches from @pabeni:

New patches for t/upstream:

Tests are now in progress:

https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export-net/20230605T115910 https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20230605T115910

cpaasch commented 1 year ago

FYI: This is still happening in 74a1b717b26e5de48adc656c9a974ae518f0edaf

pabeni commented 1 year ago

@cpaasch: could you please past the spat you see on top of https://github.com/multipath-tcp/mptcp_net-next/commit/74a1b717b26e5de48adc656c9a974ae518f0edaf ? Does it happen using the same repro? FTR, I can't reproduce it locally with such program, on top of the above tree

cpaasch commented 1 year ago
======================================================
WARNING: possible circular locking dependency detected
6.4.0-rc4-g74a1b717b26e #28 Not tainted
------------------------------------------------------
syz-executor.1/26976 is trying to acquire lock:
ffff88806b65f1b0 (msk_lock-AF_INET){+.+.}-{0:0}, at: inet_sk_diag_fill+0x784/0xa60 net/ipv4/inet_diag.c:342

but task is already holding lock:
ffff8880064503b0 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:350 [inline]
ffff8880064503b0 (&h->lhash2[i].lock){+.+.}-{2:2}, at: mptcp_diag_dump_listeners net/mptcp/mptcp_diag.c:98 [inline]
ffff8880064503b0 (&h->lhash2[i].lock){+.+.}-{2:2}, at: mptcp_diag_dump+0x39a/0x820 net/mptcp/mptcp_diag.c:197

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&h->lhash2[i].lock){+.+.}-{2:2}:
       __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
       _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
       spin_lock include/linux/spinlock.h:350 [inline]
       inet_unhash+0x16f/0x260 net/ipv4/inet_hashtables.c:780
       tcp_set_state+0x17a/0x210 net/ipv4/tcp.c:2621
       __mptcp_close_ssk+0x1bd/0x410 net/mptcp/protocol.c:2448
       mptcp_destroy_common+0x88/0x1d0 net/mptcp/protocol.c:3338
       mptcp_destroy+0x4b/0x70 net/mptcp/protocol.c:3365
       __mptcp_destroy_sock+0x68/0x1a0 net/mptcp/protocol.c:2986
       __mptcp_close+0x3c9/0x480 net/mptcp/protocol.c:3086
       mptcp_close+0x28/0x100 net/mptcp/protocol.c:3101
       inet_release+0x88/0xa0 net/ipv4/af_inet.c:426
       sock_close+0x58/0x110 net/socket.c:653
       __fput+0x1b0/0x440 fs/file_table.c:321
       task_work_run+0x104/0x150 kernel/task_work.c:179
       exit_to_user_mode_loop+0xc0/0xe0 include/linux/resume_user_mode.h:49
       exit_to_user_mode_prepare+0x93/0xe0 kernel/entry/common.c:204
       syscall_exit_to_user_mode+0x4c/0x1e0 kernel/entry/common.c:286
       do_syscall_64+0x53/0xa0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x72/0xdc

-> #1 (k-sk_lock-AF_INET6/1){+.+.}-{0:0}:
       lock_sock_nested+0x38/0xa0 net/core/sock.c:3469
       mptcp_subflow_create_socket+0x85/0x590 net/mptcp/subflow.c:1671
       __mptcp_nmpc_socket+0x9b/0x210 net/mptcp/protocol.c:89
       mptcp_nl_cmd_add_addr+0x64a/0x890 net/mptcp/pm_netlink.c:1032
       genl_rcv_msg+0x551/0x5f0 net/netlink/genetlink.c:968
       netlink_rcv_skb+0xea/0x1c0 net/netlink/af_netlink.c:2546
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
       netlink_unicast+0x3c2/0x490 net/netlink/af_netlink.c:1339
       netlink_sendmsg+0x4fa/0x620 net/netlink/af_netlink.c:1913
       ____sys_sendmsg+0x3a4/0x3d0 net/socket.c:724
       ___sys_sendmsg+0x143/0x190 net/socket.c:2559
       __x64_sys_sendmsg+0xd5/0x150 net/socket.c:2588
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x72/0xdc

-> #0 (msk_lock-AF_INET){+.+.}-{0:0}:
       __lock_acquire+0x14df/0x2ea0 kernel/locking/lockdep.c:3113
       lock_acquire+0xdd/0x230 kernel/locking/lockdep.c:5705
       lock_sock_fast include/net/sock.h:1733 [inline]
       mptcp_diag_fill_info+0x1a0/0x240 net/mptcp/sockopt.c:929
       inet_sk_diag_fill+0x784/0xa60 net/ipv4/inet_diag.c:342
       mptcp_diag_dump+0x5c3/0x820 net/mptcp/mptcp_diag.c:24
       __inet_diag_dump+0xf8/0x190 net/ipv4/inet_diag.c:1179
       inet_diag_dump_compat+0xba/0xe0 net/ipv4/inet_diag.c:1287
       netlink_dump+0x208/0x470 net/netlink/af_netlink.c:2268
       __netlink_dump_start+0x2bb/0x380 net/netlink/af_netlink.c:2373
       netlink_dump_start include/linux/netlink.h:329 [inline]
       inet_diag_rcv_msg_compat+0x108/0x1e0 net/ipv4/inet_diag.c:1321
       sock_diag_rcv_msg+0xb5/0x1f0
       netlink_rcv_skb+0xea/0x1c0 net/netlink/af_netlink.c:2546
       sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
       netlink_unicast+0x3c2/0x490 net/netlink/af_netlink.c:1339
       netlink_sendmsg+0x4fa/0x620 net/netlink/af_netlink.c:1913
       ____sys_sendmsg+0x3a4/0x3d0 net/socket.c:724
       ___sys_sendmsg+0x143/0x190 net/socket.c:2559
       __x64_sys_sendmsg+0xd5/0x150 net/socket.c:2588
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x72/0xdc

other info that might help us debug this:

Chain exists of:
  msk_lock-AF_INET --> k-sk_lock-AF_INET6/1 --> &h->lhash2[i].lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&h->lhash2[i].lock);
                               lock(k-sk_lock-AF_INET6/1);
                               lock(&h->lhash2[i].lock);
  lock(msk_lock-AF_INET);

 *** DEADLOCK ***

6 locks held by syz-executor.1/26976:
 #0: ffffffff83839c78 (sock_diag_mutex){+.+.}-{3:3}, at: sock_diag_rcv+0x1b/0x40 net/core/sock_diag.c:279
 #1: ffffffff83839ba8 (sock_diag_table_mutex){+.+.}-{3:3}, at: sock_diag_rcv_msg+0x96/0x1f0 net/core/sock_diag.c:259
 #2: ffff8880369d0e80 (nlk_cb_mutex-SOCK_DIAG){+.+.}-{3:3}, at: netlink_dump+0x4a/0x470 net/netlink/af_netlink.c:2215
 #3: ffffffff83866448 (inet_diag_table_mutex){+.+.}-{3:3}, at: __inet_diag_dump+0xb1/0x190 net/ipv4/inet_diag.c:63
 #4: ffffffff83759f38 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x5/0x40 include/linux/rcupdate.h:326
 #5: ffff8880064503b0 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:350 [inline]
 #5: ffff8880064503b0 (&h->lhash2[i].lock){+.+.}-{2:2}, at: mptcp_diag_dump_listeners net/mptcp/mptcp_diag.c:98 [inline]
 #5: ffff8880064503b0 (&h->lhash2[i].lock){+.+.}-{2:2}, at: mptcp_diag_dump+0x39a/0x820 net/mptcp/mptcp_diag.c:197

stack backtrace:
CPU: 1 PID: 26976 Comm: syz-executor.1 Not tainted 6.4.0-rc4-g74a1b717b26e #28
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xad/0xf0 lib/dump_stack.c:106
 check_noncircular+0xf0/0x100 kernel/locking/lockdep.c:2188
 __lock_acquire+0x14df/0x2ea0 kernel/locking/lockdep.c:3113
 lock_acquire+0xdd/0x230 kernel/locking/lockdep.c:5705
 lock_sock_fast include/net/sock.h:1733 [inline]
 mptcp_diag_fill_info+0x1a0/0x240 net/mptcp/sockopt.c:929
 inet_sk_diag_fill+0x784/0xa60 net/ipv4/inet_diag.c:342
 mptcp_diag_dump+0x5c3/0x820 net/mptcp/mptcp_diag.c:24
 __inet_diag_dump+0xf8/0x190 net/ipv4/inet_diag.c:1179
 inet_diag_dump_compat+0xba/0xe0 net/ipv4/inet_diag.c:1287
 netlink_dump+0x208/0x470 net/netlink/af_netlink.c:2268
 __netlink_dump_start+0x2bb/0x380 net/netlink/af_netlink.c:2373
 netlink_dump_start include/linux/netlink.h:329 [inline]
 inet_diag_rcv_msg_compat+0x108/0x1e0 net/ipv4/inet_diag.c:1321
 sock_diag_rcv_msg+0xb5/0x1f0
 netlink_rcv_skb+0xea/0x1c0 net/netlink/af_netlink.c:2546
 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
 netlink_unicast+0x3c2/0x490 net/netlink/af_netlink.c:1339
 netlink_sendmsg+0x4fa/0x620 net/netlink/af_netlink.c:1913
 ____sys_sendmsg+0x3a4/0x3d0 net/socket.c:724
 ___sys_sendmsg+0x143/0x190 net/socket.c:2559
 __x64_sys_sendmsg+0xd5/0x150 net/socket.c:2588
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f9b8ccde6a9
Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007f9b8c00bcd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000006bbf80 RCX: 00007f9b8ccde6a9
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000005
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006bbf8c
R13: fffffffffffffea8 R14: 00000000006bbf80 R15: 000000000001fe40
 </TASK>
BUG: sleeping function called from invalid context at net/core/sock.c:3501
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 26976, name: syz-executor.1
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 0
INFO: lockdep is turned off.
Preemption disabled at:
[<0000000000000000>] 0x0
CPU: 1 PID: 26976 Comm: syz-executor.1 Not tainted 6.4.0-rc4-g74a1b717b26e #28
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xad/0xf0 lib/dump_stack.c:106
 __might_resched+0x248/0x260 kernel/sched/core.c:10153
 __lock_sock_fast+0x21/0x80 net/core/sock.c:3501
 lock_sock_fast include/net/sock.h:1735 [inline]
 mptcp_diag_fill_info+0x1ac/0x240 net/mptcp/sockopt.c:929
 inet_sk_diag_fill+0x784/0xa60 net/ipv4/inet_diag.c:342
 mptcp_diag_dump+0x5c3/0x820 net/mptcp/mptcp_diag.c:24
 __inet_diag_dump+0xf8/0x190 net/ipv4/inet_diag.c:1179
 inet_diag_dump_compat+0xba/0xe0 net/ipv4/inet_diag.c:1287
 netlink_dump+0x208/0x470 net/netlink/af_netlink.c:2268
 __netlink_dump_start+0x2bb/0x380 net/netlink/af_netlink.c:2373
 netlink_dump_start include/linux/netlink.h:329 [inline]
 inet_diag_rcv_msg_compat+0x108/0x1e0 net/ipv4/inet_diag.c:1321
 sock_diag_rcv_msg+0xb5/0x1f0
 netlink_rcv_skb+0xea/0x1c0 net/netlink/af_netlink.c:2546
 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
 netlink_unicast+0x3c2/0x490 net/netlink/af_netlink.c:1339
 netlink_sendmsg+0x4fa/0x620 net/netlink/af_netlink.c:1913
 ____sys_sendmsg+0x3a4/0x3d0 net/socket.c:724
 ___sys_sendmsg+0x143/0x190 net/socket.c:2559
 __x64_sys_sendmsg+0xd5/0x150 net/socket.c:2588
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f9b8ccde6a9
Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007f9b8c00bcd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000006bbf80 RCX: 00007f9b8ccde6a9
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000005
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006bbf8c
R13: fffffffffffffea8 R14: 00000000006bbf80 R15: 000000000001fe40
 </TASK>
__nla_validate_parse: 12 callbacks suppressed
netlink: 28 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 28 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 28 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 36 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 36 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 36 bytes leftover after parsing attributes in process `syz-executor.0'.
netlink: 36 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 40 bytes leftover after parsing attributes in process `syz-executor.6'.
netlink: 40 bytes leftover after parsing attributes in process `syz-executor.6'.
netlink: 40 bytes leftover after parsing attributes in process `syz-executor.6'.
netlink: 'syz-executor.1': attribute type 14 has an invalid length.
netlink: 'syz-executor.6': attribute type 14 has an invalid length.
netlink: 'syz-executor.1': attribute type 14 has an invalid length.
netlink: 'syz-executor.6': attribute type 14 has an invalid length.
netlink: 'syz-executor.1': attribute type 14 has an invalid length.
pimreg: entered allmulticast mode
pimreg: left allmulticast mode
pimreg: entered allmulticast mode
pimreg: left allmulticast mode
pimreg: entered allmulticast mode
sctp: [Deprecated]: syz-executor.7 (pid 27657) Use of int in max_burst socket option.
Use struct sctp_assoc_value instead
netlink: 'syz-executor.1': attribute type 14 has an invalid length.
sctp: [Deprecated]: syz-executor.7 (pid 27657) Use of int in maxseg socket option.
Use struct sctp_assoc_value instead
netlink: 'syz-executor.1': attribute type 14 has an invalid length.
sctp: [Deprecated]: syz-executor.7 (pid 27684) Use of int in max_burst socket option.
Use struct sctp_assoc_value instead
sctp: [Deprecated]: syz-executor.7 (pid 27706) Use of int in maxseg socket option.
Use struct sctp_assoc_value instead
netlink: 'syz-executor.1': attribute type 14 has an invalid length.
sctp: [Deprecated]: syz-executor.7 (pid 27732) Use of int in max_burst socket option.
Use struct sctp_assoc_value instead
netlink: 'syz-executor.5': attribute type 1 has an invalid length.
__nla_validate_parse: 1 callbacks suppressed
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 'syz-executor.5': attribute type 1 has an invalid length.
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.5'.
sctp: [Deprecated]: syz-executor.7 (pid 27739) Use of int in maxseg socket option.
Use struct sctp_assoc_value instead
program syz-executor.7 is using a deprecated SCSI ioctl, please convert it to SG_IO
program syz-executor.7 is using a deprecated SCSI ioctl, please convert it to SG_IO
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.1'.
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.5'.
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.4'.
netlink: 4 bytes leftover after parsing attributes in process `syz-executor.4'.
nft_compat: unsupported protocol 0
nft_compat: unsupported protocol 0
program syz-executor.5 is using a deprecated SCSI ioctl, please convert it to SG_IO
program syz-executor.5 is using a deprecated SCSI ioctl, please convert it to SG_IO
program syz-executor.5 is using a deprecated SCSI ioctl, please convert it to SG_IO

Just tried the reproducer, and it does not work anymore. So, will let syzkaller generate a new one.

Let me know if you want a new github issue.

pabeni commented 1 year ago

@cpaasch: thanks for the quick feedback!

It looks like the problem is that mptcp move the listener status to close before actually stopping the listener - so the state check in diag is fooled.

Could you please try the attached patch ? (only build tested here ;) diffs.txt

cpaasch commented 1 year ago

Btw., running with the above diff, I still got a inet_sk_diag_fill error. I'm applying yours from the email and relaunch.