multipath-tcp / mptcp_net-next

Development version of the Upstream MultiPath TCP Linux kernel 🐧
https://mptcp.dev
Other
284 stars 42 forks source link

WARNING in sk_stream_kill_queues and inet_sock_destruct #195

Closed matttbe closed 3 years ago

matttbe commented 3 years ago

Similar to #172, I had this, without perf, with @pabeni's patches:

++ ./mptcp_connect.sh -l 0.99% -d 4 -r '94% 99%'
[ 2471.700771] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[ 2472.071206] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
[ 2472.531101] IPv6: ADDRCONF(NETDEV_CHANGE): ns4eth3: link becomes ready
[ 2472.536847] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
# INFO: set ns3-60a4fa56-WfeBHV dev ns3eth2: ethtool -K  gro off
# INFO: set ns4-60a4fa56-WfeBHV dev ns4eth3: ethtool -K tso off gso off gro off
# Created /tmp/tmp.Ycdt9WOL5n (size 7319580     /tmp/tmp.Ycdt9WOL5n) containing data sent by client
# Created /tmp/tmp.ygogCWfJ1o (size 7500828     /tmp/tmp.ygogCWfJ1o) containing data sent by server
# New MPTCP socket can be blocked via sysctl            [ OK ]
# setsockopt(..., TCP_ULP, "mptcp", ...) blocked        [ OK ]
# INFO: validating network environment with pings
# INFO: Using loss of 0.99% delay 4 ms reorder 94% 99% with delay 1ms on ns3eth4
# ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP (duration   175ms) [ OK ]
# ns1 MPTCP -> ns1 (10.0.1.1:10001      ) TCP   (duration   140ms) [ OK ]
# ns1 TCP   -> ns1 (10.0.1.1:10002      ) MPTCP (duration   142ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10003) MPTCP (duration   153ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10004) TCP   (duration   118ms) [ OK ]
# ns1 TCP   -> ns1 (dead:beef:1::1:10005) MPTCP (duration   160ms) [ OK ]
# ns1 MPTCP -> ns2 (10.0.1.2:10006      ) MPTCP (duration   135ms) [ OK ]
# ns1 MPTCP -> ns2 (dead:beef:1::2:10007) MPTCP (duration   139ms) [ OK ]
# ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP (duration   143ms) [ OK ]
# ns1 MPTCP -> ns2 (dead:beef:2::1:10009) MPTCP (duration   142ms) [ OK ]
# ns1 MPTCP -> ns3 (10.0.2.2:10010      ) MPTCP (duration   276ms) [ OK ]
# ns1 MPTCP -> ns3 (dead:beef:2::2:10011) MPTCP (duration   182ms) [ OK ]
# ns1 MPTCP -> ns3 (10.0.3.2:10012      ) MPTCP (duration   172ms) [ OK ]
# ns1 MPTCP -> ns3 (dead:beef:3::2:10013) MPTCP (duration   356ms) [ OK ]
# ns1 MPTCP -> ns4 (10.0.3.1:10014      ) MPTCP (duration   254ms) [ OK ]
# ns1 MPTCP -> ns4 (dead:beef:3::1:10015) MPTCP (duration   444ms) [ OK ]
# ns2 MPTCP -> ns1 (10.0.1.1:10016      ) MPTCP (duration   119ms) [ OK ]
# ns2 MPTCP -> ns1 (dead:beef:1::1:10017) MPTCP (duration   133ms) [ OK ]
# ns2 MPTCP -> ns3 (10.0.2.2:10018      ) MPTCP (duration   191ms) [ OK ]
# ns2 MPTCP -> ns3 (dead:beef:2::2:10019) MPTCP [ 2501.608391] ------------[ cut here ]------------
[ 2501.613071] WARNING: CPU: 1 PID: 29458 at net/core/stream.c:208 sk_stream_kill_queues (net/core/stream.c:208 (discriminator 1))
[ 2501.621115] Modules linked in: sch_netem
[ 2501.624986] CPU: 1 PID: 29458 Comm: kworker/1:3 Not tainted 5.12.0+ #538
[ 2501.630837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 2501.638693] Workqueue: events mptcp_worker
[ 2501.642367] RIP: 0010:sk_stream_kill_queues (net/core/stream.c:208 (discriminator 1))
[ 2501.646966] Code: 00 85 c0 75 1f 85 f6 75 21 5b 5d c3 48 89 df e8 83 ec fe ff 8b 83 48 01 00 00 8b b3 00 01 00 00 85 c0 74 e1 0f 0b 85 f6 74 df <0f> 0b 5b 5d c3 0f 0b eb ac 0f 1f 40 00 0f 1f 44 00 00 41 57 41 56
All code
========
   0:   00 85 c0 75 1f 85       add    %al,-0x7ae08a40(%rbp)
   6:   f6 75 21                divb   0x21(%rbp)
   9:   5b                      pop    %rbx
   a:   5d                      pop    %rbp
   b:   c3                      retq
   c:   48 89 df                mov    %rbx,%rdi
   f:   e8 83 ec fe ff          callq  0xfffffffffffeec97
  14:   8b 83 48 01 00 00       mov    0x148(%rbx),%eax
  1a:   8b b3 00 01 00 00       mov    0x100(%rbx),%esi
  20:   85 c0                   test   %eax,%eax
  22:   74 e1                   je     0x5
  24:   0f 0b                   ud2
  26:   85 f6                   test   %esi,%esi
  28:   74 df                   je     0x9
  2a:*  0f 0b                   ud2             <-- trapping instruction
  2c:   5b                      pop    %rbx
  2d:   5d                      pop    %rbp
  2e:   c3                      retq
  2f:   0f 0b                   ud2
  31:   eb ac                   jmp    0xffffffffffffffdf
  33:   0f 1f 40 00             nopl   0x0(%rax)
  37:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  3c:   41 57                   push   %r15
  3e:   41 56                   push   %r14

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   5b                      pop    %rbx
   3:   5d                      pop    %rbp
   4:   c3                      retq
   5:   0f 0b                   ud2
   7:   eb ac                   jmp    0xffffffffffffffb5
   9:   0f 1f 40 00             nopl   0x0(%rax)
   d:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  12:   41 57                   push   %r15
  14:   41 56                   push   %r14
[ 2501.662908] RSP: 0018:ffff97504353bdf8 EFLAGS: 00010206
[ 2501.667526] RAX: 0000000000000000 RBX: ffff8ec3c35b64c0 RCX: 0000000000000007
[ 2501.673705] RDX: ffffffffa4ef8570 RSI: 0000000000000dbb RDI: ffff8ec3c35b64c0
[ 2501.679935] RBP: ffff8ec3c35b6570 R08: 0000000000000000 R09: 0000000000000001
[ 2501.686117] R10: 000000000000000c R11: 0000000000000000 R12: ffff97504353be10
[ 2501.692257] R13: ffff8ec3c35b6a90 R14: ffff8ec43dd2d100 R15: 0000000000000000
[ 2501.698466] FS:  0000000000000000(0000) GS:ffff8ec43dd00000(0000) knlGS:0000000000000000
[ 2501.705498] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2501.710512] CR2: 00007fffcd37aab8 CR3: 000000003c010000 CR4: 00000000003506e0
[ 2501.716750] Call Trace:
[ 2501.719024] __mptcp_destroy_sock (include/net/xfrm.h:1185 (discriminator 3))
[ 2501.722929] mptcp_worker (net/mptcp/protocol.c:2370)
[ 2501.726269] process_one_work (kernel/workqueue.c:2275)
[ 2501.729872] worker_thread (include/linux/list.h:282)
[ 2501.733147] ? process_one_work (kernel/workqueue.c:2364)
[ 2501.736909] kthread (kernel/kthread.c:313)
[ 2501.739739] ? kthread_park (kernel/kthread.c:266)
[ 2501.743020] ret_from_fork (arch/x86/entry/entry_64.S:294)
[ 2501.746257] ---[ end trace d0ac4fdcacad2849 ]---
[ 2501.750413] ------------[ cut here ]------------
[ 2501.754621] WARNING: CPU: 1 PID: 16 at net/ipv4/af_inet.c:157 inet_sock_destruct (net/ipv4/af_inet.c:157 (discriminator 1))
[ 2501.761994] Modules linked in: sch_netem
[ 2501.765762] CPU: 1 PID: 16 Comm: ksoftirqd/1 Tainted: G        W         5.12.0+ #538
[ 2501.772469] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 2501.780272] RIP: 0010:inet_sock_destruct (net/ipv4/af_inet.c:157 (discriminator 1))
[ 2501.784657] Code: bc 24 30 01 00 00 5b 41 5c e9 c7 e4 f2 ff 41 0f b6 44 24 12 3c 07 74 8f e9 26 fb 21 00 4c 89 e7 e8 10 4d f0 ff e9 71 ff ff ff <0f> 0b eb b6 0f 0b 41 8b 84 24 4c 01 00 00 85 c0 74 90 0f 0b 41 8b
All code
========
   0:   bc 24 30 01 00          mov    $0x13024,%esp
   5:   00 5b 41                add    %bl,0x41(%rbx)
   8:   5c                      pop    %rsp
   9:   e9 c7 e4 f2 ff          jmpq   0xfffffffffff2e4d5
   e:   41 0f b6 44 24 12       movzbl 0x12(%r12),%eax
  14:   3c 07                   cmp    $0x7,%al
  16:   74 8f                   je     0xffffffffffffffa7
  18:   e9 26 fb 21 00          jmpq   0x21fb43
  1d:   4c 89 e7                mov    %r12,%rdi
  20:   e8 10 4d f0 ff          callq  0xfffffffffff04d35
  25:   e9 71 ff ff ff          jmpq   0xffffffffffffff9b
  2a:*  0f 0b                   ud2             <-- trapping instruction
  2c:   eb b6                   jmp    0xffffffffffffffe4
  2e:   0f 0b                   ud2
  30:   41 8b 84 24 4c 01 00    mov    0x14c(%r12),%eax
  37:   00
  38:   85 c0                   test   %eax,%eax
  3a:   74 90                   je     0xffffffffffffffcc
  3c:   0f 0b                   ud2
  3e:   41                      rex.B
  3f:   8b                      .byte 0x8b

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   eb b6                   jmp    0xffffffffffffffba
   4:   0f 0b                   ud2
   6:   41 8b 84 24 4c 01 00    mov    0x14c(%r12),%eax
   d:   00
   e:   85 c0                   test   %eax,%eax
  10:   74 90                   je     0xffffffffffffffa2
  12:   0f 0b                   ud2
  14:   41                      rex.B
  15:   8b                      .byte 0x8b
[ 2501.800279] RSP: 0018:ffff975040093a98 EFLAGS: 00010206
[ 2501.804968] RAX: 0000000000000dbb RBX: ffff8ec3c35b6570 RCX: 0000000000000ba4
[ 2501.811048] RDX: 0000000000000000 RSI: 0000000000000dbb RDI: ffff8ec3c35b6570
[ 2501.817247] RBP: ffff8ec3c35b64c0 R08: ffff8ec3c35b660c R09: ffffffffa4053a00
[ 2501.823364] R10: ffff8ec3c2c80080 R11: 0000000000000001 R12: ffff8ec3c35b64c0
[ 2501.829507] R13: ffff8ec3c46b6cb8 R14: ffff8ec3c46b6cb8 R15: ffff8ec3c46b6c90
[ 2501.835655] FS:  0000000000000000(0000) GS:ffff8ec43dd00000(0000) knlGS:0000000000000000
[ 2501.842610] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2501.847691] CR2: 00007fffcd37aab8 CR3: 000000003c010000 CR4: 00000000003506e0
[ 2501.853857] Call Trace:
[ 2501.856295] __sk_destruct (net/core/sock.c:1808)
[ 2501.859621] subflow_ulp_release (include/net/sock.h:1813)
[ 2501.863278] tcp_cleanup_ulp (net/ipv4/tcp_ulp.c:124)
[ 2501.866718] tcp_v4_destroy_sock (net/ipv4/tcp_ipv4.c:2237)
[ 2501.870491] tcp_v6_destroy_sock (net/ipv6/tcp_ipv6.c:1947)
[ 2501.874078] inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:887 (discriminator 9))
[ 2501.878122] tcp_reset (arch/x86/include/asm/bitops.h:207)
[ 2501.881144] tcp_validate_incoming (net/ipv4/tcp_input.c:5656)
[ 2501.885100] tcp_rcv_state_process (net/ipv4/tcp_input.c:6384)
[ 2501.889137] ? security_sock_rcv_skb (security/security.c:2241 (discriminator 19))
[ 2501.893147] ? sk_filter_trim_cap (net/core/filter.c:156)
[ 2501.897000] tcp_v6_do_rcv (net/ipv6/tcp_ipv6.c:1527)
[ 2501.900387] tcp_v6_rcv (net/ipv6/tcp_ipv6.c:1736)
[ 2501.903545] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:424 (discriminator 8))
[ 2501.907668] ip6_input_finish (include/linux/rcupdate.h:74)
[ 2501.911120] ip6_input (include/linux/netfilter.h:301)
[ 2501.914191] ? ip6_dst_check (net/ipv6/route.c:2625)
[ 2501.917620] ? tcp_v6_early_demux (include/net/dst.h:469)
[ 2501.921554] ipv6_rcv (include/linux/netfilter.h:302)
[ 2501.924479] __netif_receive_skb_one_core (net/core/dev.c:5443)
[ 2501.928817] process_backlog (include/linux/rcupdate.h:74)
[ 2501.932295] __napi_poll (net/core/dev.c:6973)
[ 2501.935462] net_rx_action (net/core/dev.c:7042)
[ 2501.938842] __do_softirq (kernel/softirq.c:559)
[ 2501.943153] run_ksoftirqd (arch/x86/include/asm/irqflags.h:45)
[ 2501.946383] smpboot_thread_fn (kernel/smpboot.c:165)
[ 2501.949982] ? sort_range (kernel/smpboot.c:108)
[ 2501.953131] kthread (kernel/kthread.c:313)
[ 2501.955991] ? kthread_park (kernel/kthread.c:266)
[ 2501.959268] ret_from_fork (arch/x86/entry/entry_64.S:294)
[ 2501.962471] ---[ end trace d0ac4fdcacad284a ]---
(duration   529ms) [ OK ]
# ns2 MPTCP -> ns3 (10.0.3.2:10020      ) MPTCP (duration   183ms) [ OK ]
# ns2 MPTCP -> ns3 (dead:beef:3::2:10021) MPTCP (duration   160ms) [ OK ]
# ns2 MPTCP -> ns4 (10.0.3.1:10022      ) MPTCP (duration   212ms) [ OK ]
# ns2 MPTCP -> ns4 (dead:beef:3::1:10023) MPTCP (duration   270ms) [ OK ]
# ns3 MPTCP -> ns1 (10.0.1.1:10024      ) MPTCP (duration   406ms) [ OK ]
# ns3 MPTCP -> ns1 (dead:beef:1::1:10025) MPTCP (duration   191ms) [ OK ]
# ns3 MPTCP -> ns2 (10.0.1.2:10026      ) MPTCP (duration   220ms) [ OK ]
# ns3 MPTCP -> ns2 (dead:beef:1::2:10027) MPTCP (duration  1551ms) [ OK ]
# ns3 MPTCP -> ns2 (10.0.2.1:10028      ) MPTCP (duration   313ms) [ OK ]
# ns3 MPTCP -> ns2 (dead:beef:2::1:10029) MPTCP (duration   187ms) [ OK ]
# ns3 MPTCP -> ns4 (10.0.3.1:10030      ) MPTCP (duration   139ms) [ OK ]
# ns3 MPTCP -> ns4 (dead:beef:3::1:10031) MPTCP (duration   178ms) [ OK ]
# ns4 MPTCP -> ns1 (10.0.1.1:10032      ) MPTCP (duration   189ms) [ OK ]
# ns4 MPTCP -> ns1 (dead:beef:1::1:10033) MPTCP (duration   214ms) [ OK ]
# ns4 MPTCP -> ns2 (10.0.1.2:10034      ) MPTCP (duration   519ms) [ OK ]
# ns4 MPTCP -> ns2 (dead:beef:1::2:10035) MPTCP (duration   212ms) [ OK ]
# ns4 MPTCP -> ns2 (10.0.2.1:10036      ) MPTCP (duration   324ms) [ OK ]
# ns4 MPTCP -> ns2 (dead:beef:2::1:10037) MPTCP (duration   946ms) [ OK ]
# ns4 MPTCP -> ns3 (10.0.2.2:10038      ) MPTCP (duration   145ms) [ OK ]
# ns4 MPTCP -> ns3 (dead:beef:2::2:10039) MPTCP (duration   165ms) [ OK ]
# ns4 MPTCP -> ns3 (10.0.3.2:10040      ) MPTCP (duration   188ms) [ OK ]
# ns4 MPTCP -> ns3 (dead:beef:3::2:10041) MPTCP (duration   152ms) [ OK ]
# INFO: with peek mode: saveWithPeek
# ns1 MPTCP -> ns1 (10.0.1.1:10042      ) MPTCP (duration   170ms) [ OK ]
# ns1 MPTCP -> ns1 (10.0.1.1:10043      ) TCP   (duration   118ms) [ OK ]
# ns1 TCP   -> ns1 (10.0.1.1:10044      ) MPTCP (duration   116ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10045) MPTCP (duration   186ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10046) TCP   (duration   144ms) [ OK ]
# ns1 TCP   -> ns1 (dead:beef:1::1:10047) MPTCP (duration   114ms) [ OK ]
# INFO: with peek mode: saveAfterPeek
# ns1 MPTCP -> ns1 (10.0.1.1:10048      ) MPTCP (duration   180ms) [ OK ]
# ns1 MPTCP -> ns1 (10.0.1.1:10049      ) TCP   (duration   151ms) [ OK ]
# ns1 TCP   -> ns1 (10.0.1.1:10050      ) MPTCP (duration   123ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10051) MPTCP (duration   167ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10052) TCP   (duration   148ms) [ OK ]
# ns1 TCP   -> ns1 (dead:beef:1::1:10053) MPTCP (duration   133ms) [ OK ]
# Time: 77 seconds
++ rc=0

On top of yesterday's export branch with these 4 patches:

878282030ff0 mptcp: drop skb cache
66f500c3e329 mptcp: update selftest for fallback due to OoO
2eda92d4479d mptcp: do not reset MP_CAPABLE subflow on mapping errors
e03f1c973fc8 mptcp: always parse mptcp options for MPC reqsk

_Originally posted in https://github.com/multipath-tcp/mptcp_net-next/issues/172#issuecomment-844254432_

Also happening without 878282030ff0 mptcp: drop skb cache, see #172

matttbe commented 3 years ago

Should be fixed thanks to @pabeni 's patch: