multipath-tcp / mptcp

⚠️⚠️⚠️ Deprecated 🚫 Out-of-tree Linux Kernel implementation of MultiPath TCP. 👉 Use https://github.com/multipath-tcp/mptcp_net-next repo instead ⚠️⚠️⚠️

[mptcp v0.91.3] RIP [<ffffffff81716cfe>] inet_csk_bind_conflict+0x7e/0x150 #179

Closed ssimmen closed 7 years ago

ssimmen commented 7 years ago

Hi again

We occasionally experience kernel crashes on our Linux virtual machine running mptcp v0.91.3.

The VM on which the crashes occurred acts as a SOCKS proxy. To disable MPTCP on the external side of the proxy, we set net.mptcp.mptcp_enabled = 2, so that MPTCP is used only on sockets that explicitly request it. The proxy application enables MPTCP only on the internal interface, as follows:

      if (setsockopt(l->s, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val)) != 0)
         swarn("%s: setsockopt(SO_REUSEADDR)", function);

      int enable = 1;
      if (setsockopt(l->s, SOL_TCP, MPTCP_ENABLED, &enable, sizeof(enable)) != 0)
         swarn("%s: setsockopt(MPTCP_ENABLED)", function);

      if (listen(l->s, SOCKD_MAXCLIENTQUEUE) == -1) {
         swarn("%s: listen(%d) failed", function, SOCKD_MAXCLIENTQUEUE);
         return -1;
      }

We think the crash occurs when the proxy application tries to establish an external connection with bind() followed by connect(). Unfortunately, we are unable to reproduce the issue, so we don't know exactly which situation triggers this bug.
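For readers who want to exercise the same code path, here is a minimal sketch of the outgoing sequence, modelled on the strace output of the connecting pid further down: bind a fresh TCP socket to a chosen source address, switch it to non-blocking mode, then start the connect. The helper names `make_addr` and `connect_from` are hypothetical and are not taken from the proxy's source; the sketch leaves out the other socket options seen in the trace and is not claimed to reproduce the crash.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill a sockaddr_in from a dotted-quad string.
 * Returns 0 on success, -1 if the address is malformed. */
static int make_addr(struct sockaddr_in *sa, const char *ip, uint16_t port)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}

/* Hypothetical helper: bind to src, then start a non-blocking connect to dst.
 * Returns the socket fd (the connect may still be in progress), or -1. */
int connect_from(const char *src_ip, uint16_t src_port,
                 const char *dst_ip, uint16_t dst_port)
{
    struct sockaddr_in src, dst;
    int fd, flags;

    /* Validate both addresses before creating any socket. */
    if (make_addr(&src, src_ip, src_port) != 0 ||
        make_addr(&dst, dst_ip, dst_port) != 0)
        return -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    /* bind() is the syscall that appears in the crash's call trace. */
    if (bind(fd, (struct sockaddr *)&src, sizeof(src)) != 0)
        goto fail;

    flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        goto fail;

    /* A non-blocking connect normally reports EINPROGRESS. */
    if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) != 0 &&
        errno != EINPROGRESS)
        goto fail;

    return fd;

fail:
    close(fd);
    return -1;
}
```

With the values from the report, the call would look like `connect_from("10.0.0.21", 27749, "10.0.0.200", 22)`; passing 0 as the source port would let the kernel choose one.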

We did an strace of the proxy application's processes:

strace of listening pid

setsockopt(8, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(8, {sa_family=AF_INET, sin_port=htons(1080), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
setsockopt(8, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(8, SOL_TCP, 0x2a /* TCP_??? */, [1], 4) = 0
listen(8, 511) = 0

strace of connecting pid

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 11
bind(11, {sa_family=AF_INET, sin_port=htons(27749), sin_addr=inet_addr("10.0.0.21")}, 16) = 0
getsockname(11, {sa_family=AF_INET, sin_port=htons(27749), sin_addr=inet_addr("10.0.0.21")}, [16]) = 0
getsockname(10, {sa_family=AF_INET, sin_port=htons(1080), sin_addr=inet_addr("10.0.3.21")}, [16]) = 0
setsockopt(11, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(11, SOL_SOCKET, SO_OOBINLINE, [1], 4) = 0
setsockopt(11, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
fcntl(11, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(11, F_SETFL, O_RDWR|O_NONBLOCK) = 0
getsockname(11, {sa_family=AF_INET, sin_port=htons(27749), sin_addr=inet_addr("10.0.0.21")}, [16]) = 0
brk(0x3587000) = 0x3587000
getsockname(11, {sa_family=AF_INET, sin_port=htons(27749), sin_addr=inet_addr("10.0.0.21")}, [16]) = 0
fcntl(11, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
connect(11, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.0.0.200")}, 16) = -1 EINPROGRESS (Operation now in progress)
sendmsg(7, {msg_name(0)=NULL, msg_iov(1)=[{"\0\1\0\0\0\0\0\0\1\0\0\0\1\0\0\0\5\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0"..., 36160}], msg_controllen=24, {cmsg_len=24, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, {10, 11}}, msg_flags=0}, 0) = 36160

VM Setup

Our MPTCP kernel was built from this release: https://github.com/multipath-tcp/mptcp/commit/a40a47b9a03d13609c415ed1599c46b03dfb5744

Sysctl config

fs.file-max = 512000
vm.swappiness = 10
net.ipv4.ip_local_port_range = 2000 65000
net.core.somaxconn = 511
net.ipv4.tcp_fin_timeout = 10
net.ipv4.conf.all.log_martians = 1
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.mptcp.mptcp_checksum = 1
net.mptcp.mptcp_debug = 0
net.mptcp.mptcp_enabled = 2
net.mptcp.mptcp_path_manager = default
net.mptcp.mptcp_scheduler = default
net.mptcp.mptcp_syn_retries = 0
net.mptcp.mptcp_version = 0

Kernel Crash log

[983025.036379] Modules linked in: tcp_diag(E) inet_diag(E) ip6table_filter(E) ip6_tables(E) xt_tcpudp(E) xt_comment(E) xt_multiport(E) iptable_filter(E) ip_tables(E) xt_set(E) ip_set(E) nfnetlink(E) x_tables(E) coretemp(E) crct10dif_pclmul(E) crc32_pclmul(E) aesni_intel(E) ppdev(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) vmw_balloon(E) joydev(E) serio_raw(E) vmwgfx(E) ttm(E) drm_kms_helper(E) 8250_fintek(E) shpchp(E) drm(E) i2c_piix4(E) parport_pc(E) vmw_vmci(E) mac_hid(E) lp(E) parport(E) psmouse(E) mptspi(E) mptscsih(E) mptbase(E) vmxnet3(E) scsi_transport_spi(E) pata_acpi(E) floppy(E)
[983025.042667] CPU: 1 PID: 10601 Comm: sockd Tainted: G            E   4.1.38-mptcp #1
[983025.043678] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
[983025.045441] task: ffff8809cc6c9420 ti: ffff88099eb74000 task.ti: ffff88099eb74000
[983025.046318] RIP: 0010:[<ffffffff81716cfe>]  [<ffffffff81716cfe>] inet_csk_bind_conflict+0x7e/0x150
[983025.048097] RSP: 0018:ffff88099eb77d68  EFLAGS: 00010292
[983025.048971] RAX: 00000000b88d40c5 RBX: 184543afe3a19dab RCX: 0000000084ddca3e
[983025.049814] RDX: 0000000000000000 RSI: 00000000fffffe01 RDI: ffffffff816b63c7
[983025.050689] RBP: ffff88099eb77da8 R08: ffff8800bb293380 R09: ffff880a0bdc99c0
[983025.051544] R10: 00000000fe08409c R11: 0000000000000001 R12: ffff8808fc75b9c0
[983025.052360] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[983025.053160] FS:  00007fd2451df740(0000) GS:ffff880a3fc40000(0000) knlGS:0000000000000000
[983025.054013] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[983025.054795] CR2: 00007fd2451e7018 CR3: 0000000a0d3e4000 CR4: 00000000000006e0
[983025.055693] Stack:
[983025.056466]  ffffffff81abfac0 000003e7fc75bb68 ffff88099eb77d88 ffff8808fc75b9c0
[983025.057264]  ffffffff81ee9c00 000000000000862c 000000000000862c ffffc900066b69c0
[983025.058018]  ffff88099eb77e28 ffffffff817171b6 ffff88099eb77e90 000000059eb77e90
[983025.058777] Call Trace:
[983025.059508]  [<ffffffff817171b6>] inet_csk_get_port+0x3e6/0x570
[983025.060234]  [<ffffffff817506ce>] ? inet_addr_type+0x7e/0x90
[983025.060975]  [<ffffffff817496bc>] inet_bind+0x14c/0x200
[983025.061685]  [<ffffffff81327326>] ? security_file_alloc+0x16/0x20
[983025.062450]  [<ffffffff816b3e40>] SYSC_bind+0xe0/0x120
[983025.063168]  [<ffffffff816b1161>] ? sock_alloc_file+0x91/0x120
[983025.063850]  [<ffffffff81219e8e>] ? __fd_install+0x4e/0x60
[983025.064561]  [<ffffffff81219ec5>] ? fd_install+0x25/0x30
[983025.065187]  [<ffffffff816b4890>] ? SyS_socket+0x90/0xc0
[983025.065819]  [<ffffffff816b4aee>] SyS_bind+0xe/0x10
[983025.066460]  [<ffffffff817f8f72>] system_call_fastpath+0x16/0x75
[983025.067115] Code: 0f 1f 44 00 00 8b 43 14 85 c0 74 2e 39 c1 74 2a 0f 1f 44 00 00 48 8b 5b 18 48 85 db 0f 84 bb 00 00 00 48 83 eb 18 49 39 dc 74 ea <f6> 43 13 20 75 e4 41 8b 4c 24 14 85 c9 75 cb 45 85 f6 74 56 f6
[983025.069047] RIP  [<ffffffff81716cfe>] inet_csk_bind_conflict+0x7e/0x150
[983025.069666]  RSP <ffff88099eb77d68>

cpaasch commented 7 years ago

Thanks for this detailed report. There seems to be memory corruption going on. By any chance, do you have a coredump of this crash?
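One possible reading of the trace that fits this diagnosis (an interpretation added here, not something stated in the thread): the faulting bytes marked in the Code: line, `<f6> 43 13 20`, hand-decode to `testb $0x20,0x13(%rbx)`, and RBX holds 184543afe3a19dab, which does not look like a kernel pointer. The decoding is worth double-checking with the kernel's scripts/decodecode. The sketch below tests whether a value is a canonical x86-64 address, assuming 48-bit virtual addresses (4-level paging); the function name `is_canonical_48` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses (4-level paging).
 * An address is canonical when bits 63..47 are all equal,
 * i.e. all zero (user half) or all one (kernel half). */
bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;   /* the 17 uppermost bits */
    return top == 0 || top == 0x1FFFF;
}
```

Under that assumption, `is_canonical_48(0x184543afe3a19dabULL)` is false for the RBX value, while the R12 value from the same dump, `0xffff8808fc75b9c0ULL`, passes the check. A list pointer that is not even canonical is consistent with the socket list walked by inet_csk_bind_conflict having been overwritten or freed.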

ssimmen commented 7 years ago

Hi Christoph, sorry about the delay. Unfortunately, we currently cannot provide the dump, but we followed these instructions to gather further information.

I attached an archive, kernel_crash.zip, containing the following files from a newer crash:

If we can help in any other way, please let us know. Thanks for your help.


kernel_crash.zip

cpaasch commented 7 years ago

@ssimmen: Is your user-space program using IPV6_ADDRFORM?

matttbe commented 7 years ago

Closing: fix now in mptcp_trunk, mptcp_v0.93 and mptcp_v0.92.

Please reopen this issue if the referenced commit does not fix it!