multipath-tcp / mptcp_net-next

Development version of the Upstream MultiPath TCP Linux kernel 🐧
https://mptcp.dev
Other
284 stars 42 forks source link

TFO: possible circular locking dependency detected #322

Closed matttbe closed 1 year ago

matttbe commented 1 year ago

The CI just reported a possible circular locking dependency when validating the TFO packetdrill tests:

# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-MSG_FASTOPEN-no-cookie-rejected.pkt (ipv4-mapped-v6)]
[ 1985.742511][T14278] 
[ 1985.746319][T14278] ======================================================
[ 1985.757838][T14278] WARNING: possible circular locking dependency detected
[ 1985.769518][T14278] 6.1.0-rc7-g43fbd526856d #1 Tainted: G                 N
[ 1985.780585][T14278] ------------------------------------------------------
[ 1985.802766][T14278] packetdrill/14278 is trying to acquire lock:
[ 1985.813538][T14278] ffff888017d868f0 ((work_completion)(&msk->work)){+.+.}-{0:0}, at: __flush_work (kernel/workqueue.c:3069) 
[ 1985.833109][T14278] 
[ 1985.833109][T14278] but task is already holding lock:
[ 1985.844760][T14278] ffff888017d84130 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close (net/mptcp/protocol.c:2973) 
[ 1985.858511][T14278] 
[ 1985.858511][T14278] which lock already depends on the new lock.
[ 1985.858511][T14278] 
[ 1985.878468][T14278] 
[ 1985.878468][T14278] the existing dependency chain (in reverse order) is:
[ 1985.894685][T14278] 
[ 1985.894685][T14278] -> #1 (sk_lock-AF_INET){+.+.}-{0:0}:
[ 1985.908443][T14278]        __lock_acquire (kernel/locking/lockdep.c:5055) 
[ 1985.916456][T14278]        lock_acquire (kernel/locking/lockdep.c:466) 
[ 1985.924348][T14278]        lock_sock_nested (net/core/sock.c:3463) 
[ 1985.933784][T14278]        mptcp_worker (net/mptcp/protocol.c:2614) 
[ 1985.941105][T14278]        process_one_work (kernel/workqueue.c:2294) 
[ 1985.947961][T14278]        worker_thread (include/linux/list.h:292) 
[ 1985.954871][T14278]        kthread (kernel/kthread.c:376) 
[ 1985.960673][T14278]        ret_from_fork (arch/x86/entry/entry_64.S:312) 
[ 1985.970890][T14278] 
[ 1985.970890][T14278] -> #0 ((work_completion)(&msk->work)){+.+.}-{0:0}:
[ 1985.986367][T14278]        check_prev_add (kernel/locking/lockdep.c:3098) 
[ 1985.996329][T14278]        validate_chain (kernel/locking/lockdep.c:3217) 
[ 1986.003992][T14278]        __lock_acquire (kernel/locking/lockdep.c:5055) 
[ 1986.013270][T14278]        lock_acquire (kernel/locking/lockdep.c:466) 
[ 1986.022344][T14278]        __flush_work (kernel/workqueue.c:3070) 
[ 1986.030047][T14278]        __cancel_work_timer (kernel/workqueue.c:3160) 
[ 1986.040140][T14278]        mptcp_cancel_work (net/mptcp/protocol.c:2758) 
[ 1986.048326][T14278]        mptcp_subflow_queue_clean (net/mptcp/subflow.c:1817) 
[ 1986.060954][T14278]        __mptcp_close_ssk (net/mptcp/protocol.c:2363) 
[ 1986.069942][T14278]        mptcp_destroy_common (net/mptcp/protocol.c:3170) 
[ 1986.079146][T14278]        mptcp_destroy (include/net/sock.h:1495) 
[ 1986.091376][T14278]        __mptcp_destroy_sock (net/mptcp/protocol.c:2886) 
[ 1986.101378][T14278]        __mptcp_close (net/mptcp/protocol.c:2959) 
[ 1986.113451][T14278]        mptcp_close (net/mptcp/protocol.c:2974) 
[ 1986.124132][T14278]        inet_release (net/ipv4/af_inet.c:432) 
[ 1986.135696][T14278]        __sock_release (net/socket.c:651) 
[ 1986.151118][T14278]        sock_close (net/socket.c:1367) 
[ 1986.159359][T14278]        __fput (fs/file_table.c:320) 
[ 1986.171807][T14278]        task_work_run (kernel/task_work.c:181 (discriminator 1)) 
[ 1986.186314][T14278]        exit_to_user_mode_prepare (include/linux/resume_user_mode.h:49) 
[ 1986.201600][T14278]        syscall_exit_to_user_mode (kernel/entry/common.c:130) 
[ 1986.213903][T14278]        do_syscall_64 (arch/x86/entry/common.c:87) 
[ 1986.226605][T14278]        entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) 
[ 1986.242860][T14278] 
[ 1986.242860][T14278] other info that might help us debug this:
[ 1986.242860][T14278] 
[ 1986.269178][T14278]  Possible unsafe locking scenario:
[ 1986.269178][T14278] 
[ 1986.286683][T14278]        CPU0                    CPU1
[ 1986.296958][T14278]        ----                    ----
[ 1986.309290][T14278]   lock(sk_lock-AF_INET);
[ 1986.318020][T14278]                                lock((work_completion)(&msk->work));
[ 1986.331800][T14278]                                lock(sk_lock-AF_INET);
[ 1986.349420][T14278]   lock((work_completion)(&msk->work));
[ 1986.358816][T14278] 
[ 1986.358816][T14278]  *** DEADLOCK ***
[ 1986.358816][T14278] 
[ 1986.374462][T14278] 2 locks held by packetdrill/14278:
[ 1986.382405][T14278]  #0: ffff88800ab35e10 (&sb->s_type->i_mutex_key#6){+.+.}-{3:3}, at: __sock_release (net/socket.c:650) 
[ 1986.400574][T14278]  #1: ffff888017d84130 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close (net/mptcp/protocol.c:2973) 
[ 1986.416278][T14278] 
[ 1986.416278][T14278] stack backtrace:
[ 1986.428497][T14278] CPU: 0 PID: 14278 Comm: packetdrill Tainted: G                 N 6.1.0-rc7-g43fbd526856d #1
[ 1986.445664][T14278] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 1986.461368][T14278] Call Trace:
[ 1986.468043][T14278]  <TASK>
[ 1986.473273][T14278]  dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
[ 1986.479333][T14278]  check_noncircular (kernel/locking/lockdep.c:2177) 
[ 1986.486315][T14278]  ? print_circular_bug (kernel/locking/lockdep.c:2156) 
[ 1986.495356][T14278]  ? unwind_get_return_address (arch/x86/kernel/unwind_orc.c:323) 
[ 1986.503097][T14278]  ? write_profile (kernel/stacktrace.c:83) 
[ 1986.509515][T14278]  ? arch_stack_walk (arch/x86/kernel/stacktrace.c:26) 
[ 1986.517980][T14278]  ? alloc_chain_hlocks (kernel/locking/lockdep.c:3483) 
[ 1986.525543][T14278]  check_prev_add (kernel/locking/lockdep.c:3098) 
[ 1986.532467][T14278]  validate_chain (kernel/locking/lockdep.c:3217) 
[ 1986.540949][T14278]  ? check_prev_add (kernel/locking/lockdep.c:3787) 
[ 1986.548738][T14278]  ? check_prev_add (kernel/locking/lockdep.c:3177) 
[ 1986.556497][T14278]  __lock_acquire (kernel/locking/lockdep.c:5055) 
[ 1986.562567][T14278]  lock_acquire (kernel/locking/lockdep.c:466) 
[ 1986.569100][T14278]  ? __flush_work (kernel/workqueue.c:3069) 
[ 1986.577067][T14278]  ? rcu_read_unlock (include/linux/rcupdate.h:779 (discriminator 5)) 
[ 1986.584530][T14278]  ? mark_lock.part.0 (arch/x86/include/asm/bitops.h:228) 
[ 1986.592457][T14278]  ? validate_chain (arch/x86/include/asm/bitops.h:228) 
[ 1986.598563][T14278]  ? check_prev_add (kernel/locking/lockdep.c:3787) 
[ 1986.605824][T14278]  __flush_work (kernel/workqueue.c:3070) 
[ 1986.612344][T14278]  ? __flush_work (kernel/workqueue.c:3069) 
[ 1986.619191][T14278]  ? start_flush_work (kernel/workqueue.c:3060) 
[ 1986.626569][T14278]  ? mark_lock.part.0 (arch/x86/include/asm/bitops.h:228) 
[ 1986.632787][T14278]  ? mark_lock_irq (kernel/locking/lockdep.c:4593) 
[ 1986.639479][T14278]  ? __lock_acquire (kernel/locking/lockdep.c:5055) 
[ 1986.646269][T14278]  ? mark_held_locks (kernel/locking/lockdep.c:4236) 
[ 1986.652695][T14278]  ? lockdep_hardirqs_on_prepare.part.0 (kernel/locking/lockdep.c:466) 
[ 1986.663080][T14278]  __cancel_work_timer (kernel/workqueue.c:3160) 
[ 1986.670780][T14278]  ? cancel_delayed_work (kernel/workqueue.c:3113) 
[ 1986.680137][T14278]  ? mark_held_locks (kernel/locking/lockdep.c:4236) 
[ 1986.687234][T14278]  mptcp_cancel_work (net/mptcp/protocol.c:2758) 
[ 1986.696026][T14278]  mptcp_subflow_queue_clean (net/mptcp/subflow.c:1817) 
[ 1986.706254][T14278]  __mptcp_close_ssk (net/mptcp/protocol.c:2363) 
[ 1986.712975][T14278]  mptcp_destroy_common (net/mptcp/protocol.c:3170) 
[ 1986.722027][T14278]  ? __destroy_inode (fs/inode.c:287 (discriminator 2)) 
[ 1986.731550][T14278]  mptcp_destroy (include/net/sock.h:1495) 
[ 1986.740600][T14278]  __mptcp_destroy_sock (net/mptcp/protocol.c:2886) 
[ 1986.748987][T14278]  __mptcp_close (net/mptcp/protocol.c:2959) 
[ 1986.754015][T14278]  mptcp_close (net/mptcp/protocol.c:2974) 
[ 1986.761663][T14278]  inet_release (net/ipv4/af_inet.c:432) 
[ 1986.768726][T14278]  __sock_release (net/socket.c:651) 
[ 1986.778062][T14278]  sock_close (net/socket.c:1367) 
[ 1986.785604][T14278]  __fput (fs/file_table.c:320) 
[ 1986.789836][T14278]  task_work_run (kernel/task_work.c:181 (discriminator 1)) 
[ 1986.796078][T14278]  ? task_work_func_match (kernel/task_work.c:44) 
[ 1986.805052][T14278]  ? task_work_cancel (kernel/task_work.c:147) 
[ 1986.811285][T14278]  exit_to_user_mode_prepare (include/linux/resume_user_mode.h:49) 
[ 1986.818461][T14278]  syscall_exit_to_user_mode (kernel/entry/common.c:130) 
[ 1986.825920][T14278]  do_syscall_64 (arch/x86/entry/common.c:87) 
[ 1986.831436][T14278]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) 
[ 1986.838430][T14278] RIP: 0033:0x7f9560fc813b
[ 1986.844447][T14278] Code: 03 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 43 b9 f7 ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 91 b9 f7 ff 8b 44
All code
========
   0:   03 00                   add    (%rax),%eax
   2:   00 00                   add    %al,(%rax)
   4:   0f 05                   syscall 
   6:   48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
   c:   77 41                   ja     0x4f
   e:   c3                      ret    
   f:   48 83 ec 18             sub    $0x18,%rsp
  13:   89 7c 24 0c             mov    %edi,0xc(%rsp)
  17:   e8 43 b9 f7 ff          call   0xfffffffffff7b95f
  1c:   8b 7c 24 0c             mov    0xc(%rsp),%edi
  20:   41 89 c0                mov    %eax,%r8d
  23:   b8 03 00 00 00          mov    $0x3,%eax
  28:   0f 05                   syscall 
  2a:*  48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax     <-- trapping instruction
  30:   77 35                   ja     0x67
  32:   44 89 c7                mov    %r8d,%edi
  35:   89 44 24 0c             mov    %eax,0xc(%rsp)
  39:   e8 91 b9 f7 ff          call   0xfffffffffff7b9cf
  3e:   8b                      .byte 0x8b
  3f:   44                      rex.R

Code starting with the faulting instruction
===========================================
   0:   48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
   6:   77 35                   ja     0x3d
   8:   44 89 c7                mov    %r8d,%edi
   b:   89 44 24 0c             mov    %eax,0xc(%rsp)
   f:   e8 91 b9 f7 ff          call   0xfffffffffff7b9a5
  14:   8b                      .byte 0x8b
  15:   44                      rex.R
[ 1986.874802][T14278] RSP: 002b:00007ffe746730e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[ 1986.886335][T14278] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f9560fc813b
[ 1986.896355][T14278] RDX: 0000000000000000 RSI: 0000555c95bb40e0 RDI: 0000000000000007
[ 1986.907244][T14278] RBP: 00007ffe74673120 R08: 0000000000000000 R09: 0000000000000000
[ 1986.917247][T14278] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffe746735e8
[ 1986.928297][T14278] R13: 0000555c956e82cb R14: 0000555c95743978 R15: 00007f9561122040
[ 1986.939499][T14278]  </TASK>
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-MSG_FASTOPEN-no-cookie-rejected.pkt (ipv4)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-MSG_FASTOPEN-no-cookie.pkt (ipv4)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-MSG_FASTOPEN-no-cookie.pkt (ipv6)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-MSG_FASTOPEN.pkt (ipv4-mapped-v6)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_CONNECT-blocking-sendmsg.pkt (ipv4)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_CONNECT-blocking-sendmsg.pkt (ipv6)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-MSG_FASTOPEN-no-cookie.pkt (ipv4-mapped-v6)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-MSG_FASTOPEN.pkt (ipv6)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-MSG_FASTOPEN-no-cookie-rejected.pkt (ipv6)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_CONNECT-blocking-sendmsg.pkt (ipv4-mapped-v6)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_CONNECT-no-cookie-rejected.pkt (ipv4-mapped-v6)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_CONNECT-no-cookie.pkt (ipv4)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_CONNECT.pkt (ipv4)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_NO_COOKIE.pkt (ipv4-mapped-v6)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-MSG_FASTOPEN.pkt (ipv4)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_CONNECT-no-cookie-rejected.pkt (ipv6)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_CONNECT-no-cookie.pkt (ipv6)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_CONNECT.pkt (ipv6)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/fastopen-invalid-buf-ptr.pkt (ipv4-mapped-v6)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/server-TCP_FASTOPEN_NO_COOKIE.pkt (ipv4)]
# OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/server-TCP_FASTOPEN_NO_COOKIE.pkt (ipv6)]

Note that the packetdrill tests are executed in parallel. Likely here it is caused by a test initiating the connection.

Source:

Maybe linked to #321

matttbe commented 1 year ago

I just noticed the CI at Tessares found a very similar issue at the same time:

01:03:54.304 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_CONNECT-no-cookie.pkt (ipv6)]
01:03:54.304 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-MSG_FASTOPEN.pkt (ipv4-mapped-v6)]
01:03:54.305 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_CONNECT-no-cookie.pkt (ipv4-mapped-v6)]
01:03:54.305 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_CONNECT.pkt (ipv4-mapped-v6)]
01:03:54.305 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_NO_COOKIE.pkt (ipv4)]
01:03:54.305 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_NO_COOKIE.pkt (ipv6)]
01:03:54.305 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_CONNECT-no-cookie.pkt (ipv4)]
01:03:54.306 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_CONNECT.pkt (ipv6)]
01:03:54.306 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/fastopen-invalid-buf-ptr.pkt (ipv4)]
01:03:54.306 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_CONNECT.pkt (ipv4)]
01:03:54.306 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/fastopen-invalid-buf-ptr.pkt (ipv4-mapped-v6)]
01:03:54.306 [ 1739.789398] 
01:03:54.306 [ 1739.789646] ======================================================
01:03:54.306 [ 1739.790487] WARNING: possible circular locking dependency detected
01:03:54.306 [ 1739.791309] 6.1.0-rc7-ge4c90e57839e #2 Tainted: G                 N
01:03:54.307 [ 1739.791803] ------------------------------------------------------
01:03:54.307 [ 1739.792531] packetdrill/27464 is trying to acquire lock:
01:03:54.573 [ 1739.793063] ffff88800bdd68e0 ((work_completion)(&msk->work)){+.+.}-{0:0}, at: __flush_work (kernel/workqueue.c:3069) 
01:03:54.573 [ 1739.793973] 
01:03:54.573 [ 1739.793973] but task is already holding lock:
01:03:54.860 [ 1739.794595] ffff888001bfc130 (sk_lock-AF_INET6){+.+.}-{0:0}, at: mptcp_close (net/mptcp/protocol.c:2973) 
01:03:54.860 [ 1739.795383] 
01:03:54.861 [ 1739.795383] which lock already depends on the new lock.
01:03:54.861 [ 1739.795383] 
01:03:54.861 [ 1739.796353] 
01:03:54.861 [ 1739.796353] the existing dependency chain (in reverse order) is:
01:03:54.861 [ 1739.797130] 
01:03:54.861 [ 1739.797130] -> #1 (sk_lock-AF_INET6){+.+.}-{0:0}:
01:03:55.150 [ 1739.797724] lock_acquire (kernel/locking/lockdep.c:466) 
01:03:55.437 [ 1739.798039] lock_sock_nested (net/core/sock.c:3463) 
01:03:55.724 [ 1739.798359] mptcp_worker (net/mptcp/protocol.c:2614) 
01:03:56.012 [ 1739.798658] process_one_work (kernel/workqueue.c:2289) 
01:03:56.308 [ 1739.799007] worker_thread (./include/linux/list.h:292) 
01:03:56.591 [ 1739.799313] kthread (kernel/kthread.c:376) 
01:03:56.878 [ 1739.799610] ret_from_fork (arch/x86/entry/entry_64.S:306) 
01:03:56.878 [ 1739.799921] 
01:03:56.878 [ 1739.799921] -> #0 ((work_completion)(&msk->work)){+.+.}-{0:0}:
01:03:57.154 [ 1739.800467] check_prev_add (kernel/locking/lockdep.c:3098) 
01:03:57.421 [ 1739.800794] __lock_acquire (kernel/locking/lockdep.c:3217) 
01:03:57.423 [ 1739.801149] lock_acquire (kernel/locking/lockdep.c:466) 
01:03:57.597 [ 1739.801445] __flush_work (kernel/workqueue.c:3070) 
01:03:57.847 [ 1739.801765] __cancel_work_timer (kernel/workqueue.c:3160) 
01:03:58.135 [ 1739.802274] mptcp_cancel_work (net/mptcp/protocol.c:2758) 
01:03:58.421 [ 1739.802731] mptcp_subflow_queue_clean (net/mptcp/subflow.c:1817) 
01:03:58.690 [ 1739.803303] __mptcp_close_ssk (net/mptcp/protocol.c:2363) 
01:03:59.008 [ 1739.803667] mptcp_destroy_common (net/mptcp/protocol.c:3170 (discriminator 3)) 
01:03:59.268 [ 1739.804052] mptcp_destroy (./include/net/sock.h:1495) 
01:03:59.538 [ 1739.804349] __mptcp_destroy_sock (net/mptcp/protocol.c:2884) 
01:03:59.808 [ 1739.804695] __mptcp_close (net/mptcp/protocol.c:2959) 
01:04:00.026 [ 1739.805010] mptcp_close (net/mptcp/protocol.c:2974) 
01:04:00.309 [ 1739.805294] inet_release (net/ipv4/af_inet.c:431) 
01:04:00.578 [ 1739.805587] __sock_release (net/socket.c:650) 
01:04:00.873 [ 1739.805930] sock_close (net/socket.c:1367) 
01:04:01.123 [ 1739.806281] __fput (fs/file_table.c:320) 
01:04:01.414 [ 1739.806546] task_work_run (kernel/task_work.c:179 (discriminator 1)) 
01:04:01.692 [ 1739.806860] exit_to_user_mode_prepare (./include/linux/resume_user_mode.h:49) 
01:04:01.985 [ 1739.807250] syscall_exit_to_user_mode (kernel/entry/common.c:130) 
01:04:02.262 [ 1739.807608] do_syscall_64 (arch/x86/entry/common.c:87) 
01:04:02.540 [ 1739.807922] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) 
01:04:02.540 [ 1739.808406] 
01:04:02.540 [ 1739.808406] other info that might help us debug this:
01:04:02.540 [ 1739.808406] 
01:04:02.540 [ 1739.808988]  Possible unsafe locking scenario:
01:04:02.540 [ 1739.808988] 
01:04:02.540 [ 1739.809454]        CPU0                    CPU1
01:04:02.540 [ 1739.809786]        ----                    ----
01:04:02.541 [ 1739.810204]   lock(sk_lock-AF_INET6);
01:04:02.541 [ 1739.810467]                                lock((work_completion)(&msk->work));
01:04:02.541 [ 1739.811010]                                lock(sk_lock-AF_INET6);
01:04:02.541 [ 1739.811437]   lock((work_completion)(&msk->work));
01:04:02.541 [ 1739.811774] 
01:04:02.541 [ 1739.811774]  *** DEADLOCK ***
01:04:02.541 [ 1739.811774] 
01:04:02.541 [ 1739.812253] 2 locks held by packetdrill/27464:
01:04:02.716 [ 1739.812576] #0: ffff88800d8e0210 (&sb->s_type->i_mutex_key#6){+.+.}-{3:3}, at: __sock_release (net/socket.c:650) 
01:04:02.718 [ 1739.813264] #1: ffff888001bfc130 (sk_lock-AF_INET6){+.+.}-{0:0}, at: mptcp_close (net/mptcp/protocol.c:2973) 
01:04:02.718 [ 1739.813861] 
01:04:02.718 [ 1739.813861] stack backtrace:
01:04:02.718 [ 1739.814196] CPU: 0 PID: 27464 Comm: packetdrill Tainted: G                 N 6.1.0-rc7-ge4c90e57839e #2
01:04:02.719 [ 1739.814840] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
01:04:02.719 [ 1739.815529] Call Trace:
01:04:02.719 [ 1739.815722]  <TASK>
01:04:02.998 [ 1739.815909] dump_stack_lvl (lib/dump_stack.c:107) 
01:04:03.272 [ 1739.816248] check_noncircular (kernel/locking/lockdep.c:2180) 
01:04:03.562 [ 1739.816588] ? print_circular_bug.isra.0 (kernel/locking/lockdep.c:2156) 
01:04:03.849 [ 1739.817015] ? mark_lock.part.0 (kernel/locking/lockdep.c:4612) 
01:04:04.131 [ 1739.817339] ? kernel_text_address (kernel/extable.c:97) 
01:04:04.403 [ 1739.817658] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:26) 
01:04:04.693 [ 1739.817957] ? register_lock_class (kernel/locking/lockdep.c:1381 (discriminator 1)) 
01:04:04.695 [ 1739.818286] check_prev_add (kernel/locking/lockdep.c:3098) 
01:04:05.018 [ 1739.818570] ? stack_trace_save (kernel/stacktrace.c:123) 
01:04:05.020 [ 1739.818864] __lock_acquire (kernel/locking/lockdep.c:3217) 
01:04:05.295 [ 1739.819171] ? alloc_list_entry (./arch/x86/include/asm/bitops.h:68) 
01:04:05.470 [ 1739.819461] ? register_lock_class (kernel/locking/lockdep.c:4913) 
01:04:05.759 [ 1739.819778] ? rcu_read_lock_sched_held (./include/linux/lockdep.h:283) 
01:04:05.761 [ 1739.820213] lock_acquire (kernel/locking/lockdep.c:466) 
01:04:05.763 [ 1739.820483] ? __flush_work (kernel/workqueue.c:3069) 
01:04:06.046 [ 1739.820762] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:5636) 
01:04:06.048 [ 1739.821136] ? mark_lock.part.0 (kernel/locking/lockdep.c:4612) 
01:04:06.050 [ 1739.821450] ? register_lock_class (kernel/locking/lockdep.c:4913) 
01:04:06.325 [ 1739.821775] ? check_chain_key (kernel/locking/lockdep.c:3894) 
01:04:06.326 [ 1739.822087] __flush_work (kernel/workqueue.c:3070) 
01:04:06.328 [ 1739.822345] ? __flush_work (kernel/workqueue.c:3069) 
01:04:06.617 [ 1739.822614] ? queue_delayed_work_on (kernel/workqueue.c:3060) 
01:04:06.618 [ 1739.822958] ? register_lock_class (kernel/locking/lockdep.c:4913) 
01:04:06.902 [ 1739.823337] ? mark_held_locks (kernel/locking/lockdep.c:4236) 
01:04:06.904 [ 1739.823618] ? mark_lock.part.0 (kernel/locking/lockdep.c:4612) 
01:04:06.906 [ 1739.823982] ? mark_held_locks (kernel/locking/lockdep.c:4236) 
01:04:06.908 [ 1739.824270] __cancel_work_timer (kernel/workqueue.c:3160) 
01:04:07.181 [ 1739.824587] ? cancel_delayed_work (kernel/workqueue.c:3113) 
01:04:07.358 [ 1739.824905] ? mptcp_subflow_queue_clean (net/mptcp/subflow.c:1816) 
01:04:07.360 [ 1739.825268] ? mark_held_locks (kernel/locking/lockdep.c:4236) 
01:04:07.535 [ 1739.825546] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:466) 
01:04:07.537 [ 1739.825946] mptcp_cancel_work (net/mptcp/protocol.c:2758) 
01:04:07.539 [ 1739.826295] mptcp_subflow_queue_clean (net/mptcp/subflow.c:1817) 
01:04:07.541 [ 1739.826635] __mptcp_close_ssk (net/mptcp/protocol.c:2363) 
01:04:07.543 [ 1739.826993] mptcp_destroy_common (net/mptcp/protocol.c:3170 (discriminator 3)) 
01:04:07.545 [ 1739.827299] mptcp_destroy (./include/net/sock.h:1495) 
01:04:07.547 [ 1739.827557] __mptcp_destroy_sock (net/mptcp/protocol.c:2884) 
01:04:07.549 [ 1739.827861] __mptcp_close (net/mptcp/protocol.c:2959) 
01:04:07.551 [ 1739.828248] mptcp_close (net/mptcp/protocol.c:2974) 
01:04:07.552 [ 1739.828496] inet_release (net/ipv4/af_inet.c:431) 
01:04:07.554 [ 1739.828796] __sock_release (net/socket.c:650) 
01:04:07.556 [ 1739.829086] sock_close (net/socket.c:1367) 
01:04:07.558 [ 1739.829325] __fput (fs/file_table.c:320) 
01:04:07.559 [ 1739.829559] task_work_run (kernel/task_work.c:179 (discriminator 1)) 
01:04:07.851 [ 1739.829821] ? task_work_cancel (kernel/task_work.c:147) 
01:04:07.853 [ 1739.830231] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:466) 
01:04:07.855 [ 1739.830649] exit_to_user_mode_prepare (./include/linux/resume_user_mode.h:49) 
01:04:07.857 [ 1739.831079] syscall_exit_to_user_mode (kernel/entry/common.c:130) 
01:04:07.859 [ 1739.831416] do_syscall_64 (arch/x86/entry/common.c:87) 
01:04:07.860 [ 1739.831680] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) 
01:04:07.860 [ 1739.832094] RIP: 0033:0x7f03a965f45b
01:04:07.864 [ 1739.832363] Code: 03 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 f3 fb ff ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 89 44 24 0c e8 31 fc ff ff 8b 44
01:04:07.976 All code
01:04:07.976 ========
01:04:07.976    0:  03 00                   add    (%rax),%eax
01:04:07.976    2:  00 00                   add    %al,(%rax)
01:04:07.976    4:  0f 05                   syscall 
01:04:07.976    6:  48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
01:04:07.976    c:  77 41                   ja     0x4f
01:04:07.976    e:  c3                      retq   
01:04:07.977    f:  48 83 ec 18             sub    $0x18,%rsp
01:04:07.977   13:  89 7c 24 0c             mov    %edi,0xc(%rsp)
01:04:07.977   17:  e8 f3 fb ff ff          callq  0xfffffffffffffc0f
01:04:07.977   1c:  8b 7c 24 0c             mov    0xc(%rsp),%edi
01:04:07.977   20:  41 89 c0                mov    %eax,%r8d
01:04:07.977   23:  b8 03 00 00 00          mov    $0x3,%eax
01:04:07.977   28:  0f 05                   syscall 
01:04:07.977   2a:* 48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax     <-- trapping instruction
01:04:07.977   30:  77 2f                   ja     0x61
01:04:07.977   32:  44 89 c7                mov    %r8d,%edi
01:04:07.977   35:  89 44 24 0c             mov    %eax,0xc(%rsp)
01:04:07.977   39:  e8 31 fc ff ff          callq  0xfffffffffffffc6f
01:04:07.977   3e:  8b                      .byte 0x8b
01:04:07.977   3f:  44                      rex.R
01:04:07.977 
01:04:07.977 Code starting with the faulting instruction
01:04:07.977 ===========================================
01:04:07.977    0:  48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
01:04:07.977    6:  77 2f                   ja     0x37
01:04:07.977    8:  44 89 c7                mov    %r8d,%edi
01:04:07.977    b:  89 44 24 0c             mov    %eax,0xc(%rsp)
01:04:07.977    f:  e8 31 fc ff ff          callq  0xfffffffffffffc45
01:04:07.977   14:  8b                      .byte 0x8b
01:04:07.977   15:  44                      rex.R
01:04:07.978 [ 1739.833645] RSP: 002b:00007ffe3e7105e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
01:04:07.979 [ 1739.834268] RAX: 0000000000000000 RBX: 0000557d35342e30 RCX: 00007f03a965f45b
01:04:07.979 [ 1739.834805] RDX: 0000557d3649a0f0 RSI: 0000557d3649a0f0 RDI: 0000000000000007
01:04:07.979 [ 1739.835315] RBP: 00007ffe3e710620 R08: 0000000000000000 R09: 0000000000000000
01:04:07.979 [ 1739.835804] R10: 0000000000000000 R11: 0000000000000293 R12: 0000557d353040e0
01:04:07.979 [ 1739.836395] R13: 00007ffe3e710ac0 R14: 0000000000000000 R15: 0000000000000000
01:04:07.979 [ 1739.836920]  </TASK>
01:04:07.979 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/client-TCP_FASTOPEN_NO_COOKIE.pkt (ipv4-mapped-v6)]
01:04:07.980 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/server-TCP_FASTOPEN_NO_COOKIE.pkt (ipv4)]
01:04:07.980 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/server-TCP_FASTOPEN_NO_COOKIE.pkt (ipv6)]
01:04:07.980 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/server-tfo-rst-before-accept.pkt (ipv4-mapped-v6)]
01:04:07.980 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/fastopen-invalid-buf-ptr.pkt (ipv6)]
01:04:07.980 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/server-tfo-rst-before-accept.pkt (ipv4)]
01:04:07.980 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/server-TCP_FASTOPEN_NO_COOKIE.pkt (ipv4-mapped-v6)]
01:04:07.980 # OK   [/opt/packetdrill/gtests/net/mptcp/fastopen/server-tfo-rst-before-accept.pkt (ipv6)]
01:04:07.980 # Ran   33 tests:   33 passing,    0 failing,    0 timed out (18.61 sec): mptcp/fastopen
pabeni commented 1 year ago

this is the same as issue #321 : closing as dup the latter, as this contains more info (lockdep)

pabeni commented 1 year ago

with equally low level of coffee I would say this is a false positive. mptcp_subflow_queue_clean() owns the listener's msk socket lock and calls cancel_work_sync() on an enqueued msk. Not sure how to tell it to lockdep. Perhaps changing the msk socket lock key while in the receive queue?!?

matttbe commented 1 year ago

@pabeni thank you for having looked at that!

I was wondering why we started to see these warnings 3 days ago and it looks like that's because 3 days ago, we merged the PR adding TFO packetdrill tests.

What's funny is that if I remove the reproducer for #316 , I don't have the issue and if I only run this one, I don't have the issue either :-)

I can confirm the bug is not new. It is visible with the export branch from the 23rd of November, when b4f166651d03 ("mptcp: fix sleep in atomic at close time") has been applied. I cannot check without because either I have the bug from #316 or I can no longer reproduce the issue.

I don't know what's the best solution to tell lockdep everything is OK.

Not to have the failure in all commits we validate, I can revert https://github.com/multipath-tcp/packetdrill/pull/103 for the moment.

matttbe commented 1 year ago

Not to have the failure in all commits we validate, I can revert https://github.com/multipath-tcp/packetdrill/pull/103 for the moment.

I just did that: https://github.com/multipath-tcp/packetdrill/pull/109

pabeni commented 1 year ago

@matttbe: could you please test the following workaround?

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 6d03bdcda33e..289dd4add88c 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2363,7 +2363,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
        /* otherwise tcp will dispose of the ssk and subflow ctx */
        if (ssk->sk_state == TCP_LISTEN) {
            tcp_set_state(ssk, TCP_CLOSE);
-           mptcp_subflow_queue_clean(ssk);
+           mptcp_subflow_queue_clean(sk, ssk);
            inet_csk_listen_stop(ssk);
            mptcp_event_pm_listener(ssk, MPTCP_EVENT_LISTENER_CLOSED);
        }
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index a9ff7028fad8..3cd3270407b0 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -630,7 +630,7 @@ void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
             struct mptcp_subflow_context *subflow);
 void __mptcp_subflow_send_ack(struct sock *ssk);
 void mptcp_subflow_reset(struct sock *ssk);
-void mptcp_subflow_queue_clean(struct sock *ssk);
+void mptcp_subflow_queue_clean(struct sock *sk, struct sock *ssk);
 void mptcp_sock_graft(struct sock *sk, struct socket *parent);
 struct socket *__mptcp_nmpc_socket(const struct mptcp_sock *msk);
 bool __mptcp_close(struct sock *sk, long timeout);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 9341f9313154..2f7830c65d95 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1778,7 +1778,7 @@ static void subflow_state_change(struct sock *sk)
    }
 }

-void mptcp_subflow_queue_clean(struct sock *listener_ssk)
+void mptcp_subflow_queue_clean(struct sock *listener_sk, struct sock *listener_ssk)
 {
    struct request_sock_queue *queue = &inet_csk(listener_ssk)->icsk_accept_queue;
    struct mptcp_sock *msk, *next, *head = NULL;
@@ -1827,8 +1827,18 @@ void mptcp_subflow_queue_clean(struct sock *listener_ssk)

        do_cancel_work = __mptcp_close(sk, 0);
        release_sock(sk);
-       if (do_cancel_work)
+       if (do_cancel_work) {
+           /* lockdep will report a false positive ABBA deadlock between
+            * cancel_work_sync and the listener socket. The involved locks
+            * belong to differen sockets WRT the existing AB chain.
+            * Using a per socket key would be very invasive and heavy, just
+            * tell lockdep to consider the listener socket released here
+            */
+           mutex_release(&listener_sk->sk_lock.dep_map, _RET_IP_);
            mptcp_cancel_work(sk);
+           mutex_acquire(&listener_sk->sk_lock.dep_map,
+                     SINGLE_DEPTH_NESTING, 0, _RET_IP_);
+       }
        sock_put(sk);
    }

also available on pastebin: https://paste.centos.org/view/dff51e32

matttbe commented 1 year ago

@pabeni: thank you for having looked at this so quickly!

That looks like a cleaver idea!

And it seems to work very well: I cannot reproduce the issue any more with this patch!

Should we release/acquire the mutex only with #ifdef CONFIG_LOCKDEP?