Closed knxhm closed 5 years ago
Hi Franz,
Thank you for this bug report.
By chance, are you able to reproduce this warning with the latest mptcp_v0.94 based on the kernel v4.14.94 with extra fixes?
Thanks Matt, I will do that but it may take some (longer) time, it is a production system and the messages appear in irregular intervals.
I saw this error on production systems too. Usually a reboot follows.
I have a 4.14.79 kernel with 6ece8f4d (Correctly initialize IPv6 fields) as the latest commit. I'll try to retest with that one when possible.
Kostas
On Tue, 12 Feb 2019, 10:15 knxhm <notifications@github.com wrote:
Thanks Matt, I will do that but it may take some (longer) time, it is a production system and the messages appear in irregular intervals.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/multipath-tcp/mptcp/issues/315#issuecomment-462700453, or mute the thread https://github.com/notifications/unsubscribe-auth/ADgmo7vI6-Hc9cDp2aQ811fiMj4biRRDks5vMpRSgaJpZM4a06i9 .
I doubt that latest v0.94 has a fix for this scenario. We might need more debug-info. What seems to be happening is that after a fallback we still have a lingering subflow whose timer is expiring. Might be worth it to not only print the meta-socket's state but also more info about subflows. That would help us narrow the issue down.
Christoph, would you want to prepare a debug patch on top of the latest 4.14.94+ to get more debug-info ?
I have also seen this issue with a very recent v0.94 kernel (Linux-4.14.105 plus MPTCP commits up to and including 3f01458).
@cpaasch are there any particular fields of struct mptcp_tcp_sock that you would like to see traced in the warning? Will it be too slow to walk the linked list of mptcp_tcp_sock elements and trace a few fields from each?
You can apply the below patch:
I found two more cases of mptcp_sock_def_error_report with syzkaller that I am currently trying to root-cause. Might be the same issue.
From 7f8c657d5d003dce31d162635f91238757541550 Mon Sep 17 00:00:00 2001
From: Christoph Paasch <cpaasch@apple.com>
Date: Wed, 13 Mar 2019 09:37:01 -0700
Subject: [PATCH] DEBUG mptcp_sock_def_error_report
---
include/net/mptcp.h | 6 +++---
net/ipv4/tcp.c | 9 ++++++++-
net/ipv4/tcp_ipv4.c | 2 ++
net/ipv6/tcp_ipv6.c | 2 ++
net/mptcp/mptcp_ctrl.c | 7 ++++++-
net/mptcp/mptcp_input.c | 5 +++--
net/mptcp/mptcp_output.c | 1 +
7 files changed, 25 insertions(+), 7 deletions(-)
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index cf56185687c0..70fea582911a 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -1255,11 +1255,11 @@ static inline bool mptcp_fallback_infinite(struct sock *sk, int flag)
if (mpcb->infinite_mapping_snd)
return false;
- pr_err("%s %#x will fallback - pi %d, src %pI4:%u dst %pI4:%u rcv_nxt %u from %pS\n",
- __func__, mpcb->mptcp_loc_token, tp->mptcp->path_index,
+ pr_err("%s %p %#x will fallback - pi %d, src %pI4:%u dst %pI4:%u rcv_nxt %u from %pS master %u\n",
+ __func__, mpcb, mpcb->mptcp_loc_token, tp->mptcp->path_index,
&inet_sk(sk)->inet_saddr, ntohs(inet_sk(sk)->inet_sport),
&inet_sk(sk)->inet_daddr, ntohs(inet_sk(sk)->inet_dport),
- tp->rcv_nxt, __builtin_return_address(0));
+ tp->rcv_nxt, __builtin_return_address(0), is_master_tp(tp));
if (!is_master_tp(tp)) {
MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_FBACKSUB);
return true;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9b01e689fe34..9828f82e8dba 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2398,8 +2398,11 @@ int tcp_disconnect(struct sock *sk, int flags)
int err = 0;
int old_state = sk->sk_state;
- if (old_state != TCP_CLOSE)
+ if (old_state != TCP_CLOSE) {
+ if (is_meta_sk(sk) && tp->mpcb->infinite_mapping_rcv)
+ pr_err("%s closing %p\n", __func__, tp->mpcb);
tcp_set_state(sk, TCP_CLOSE);
+ }
/* ABORT function of RFC793 */
if (old_state == TCP_LISTEN) {
@@ -3637,11 +3640,15 @@ EXPORT_SYMBOL(tcp_md5_hash_key);
void tcp_done(struct sock *sk)
{
struct request_sock *req = tcp_sk(sk)->fastopen_rsk;
+ struct tcp_sock *tp = tcp_sk(sk);
if (sk->sk_state == TCP_SYN_SENT || sk->sk_state == TCP_SYN_RECV)
TCP_INC_STATS(sock_net(sk), TCP_MIB_ATTEMPTFAILS);
WARN_ON(sk->sk_state == TCP_CLOSE);
+
+ if (is_meta_sk(sk) && tp->mpcb->infinite_mapping_rcv)
+ pr_err("%s closing %p from %pS\n", __func__, tp->mpcb, __builtin_return_address(0));
tcp_set_state(sk, TCP_CLOSE);
tcp_clear_xmit_timers(sk);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 52054cf187ac..f39436e66b4d 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -266,6 +266,8 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
* This unhashes the socket and releases the local port,
* if necessary.
*/
+ if (is_meta_sk(sk) && tp->mpcb->infinite_mapping_rcv)
+ pr_err("%s closing %p\n", __func__, tp->mpcb);
tcp_set_state(sk, TCP_CLOSE);
ip_rt_put(rt);
sk->sk_route_caps = 0;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index af375a34f07b..13904e0f20ec 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -313,6 +313,8 @@ int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
return 0;
late_failure:
+ if (is_meta_sk(sk) && tp->mpcb->infinite_mapping_rcv)
+ pr_err("%s closing %p\n", __func__, tp->mpcb);
tcp_set_state(sk, TCP_CLOSE);
failure:
inet->inet_dport = 0;
diff --git a/net/mptcp/mptcp_ctrl.c b/net/mptcp/mptcp_ctrl.c
index a16a4d6e68a1..d5f3485c7e36 100644
--- a/net/mptcp/mptcp_ctrl.c
+++ b/net/mptcp/mptcp_ctrl.c
@@ -641,7 +641,8 @@ static void mptcp_sock_def_error_report(struct sock *sk)
meta_sk->sk_error_report(meta_sk);
WARN(meta_sk->sk_state == TCP_CLOSE,
- "Meta already closed i_rcv %u i_snd %u send_i %u flags %#lx\n",
+ "Meta %p already closed i_rcv %u i_snd %u send_i %u flags %#lx\n",
+ mpcb,
mpcb->infinite_mapping_rcv, mpcb->infinite_mapping_snd,
mpcb->send_infinite_mapping, meta_sk->sk_flags);
@@ -1755,6 +1756,8 @@ void mptcp_close(struct sock *meta_sk, long timeout)
if (data_was_unread) {
/* Unread data was tossed, zap the connection. */
NET_INC_STATS(sock_net(meta_sk), LINUX_MIB_TCPABORTONCLOSE);
+ if (meta_tp->mpcb->infinite_mapping_rcv)
+ pr_err("%s closing %p\n", __func__, meta_tp->mpcb);
tcp_set_state(meta_sk, TCP_CLOSE);
tcp_sk(meta_sk)->ops->send_active_reset(meta_sk,
meta_sk->sk_allocation);
@@ -1834,6 +1837,8 @@ void mptcp_close(struct sock *meta_sk, long timeout)
if (meta_sk->sk_state == TCP_FIN_WAIT2) {
if (meta_tp->linger2 < 0) {
+ if (meta_tp->mpcb->infinite_mapping_rcv)
+ pr_err("%s closing linger %p\n", __func__, meta_tp->mpcb);
tcp_set_state(meta_sk, TCP_CLOSE);
meta_tp->ops->send_active_reset(meta_sk, GFP_ATOMIC);
__NET_INC_STATS(sock_net(meta_sk),
diff --git a/net/mptcp/mptcp_input.c b/net/mptcp/mptcp_input.c
index aa21ccf6f814..c2301648aaac 100644
--- a/net/mptcp/mptcp_input.c
+++ b/net/mptcp/mptcp_input.c
@@ -541,8 +541,8 @@ static int mptcp_prevalidate_skb(struct sock *sk, struct sk_buff *skb)
*/
if (!tp->mptcp->fully_established && !mptcp_is_data_seq(skb) &&
!tp->mptcp->mapping_present && !mpcb->infinite_mapping_rcv) {
- pr_err("%s %#x will fallback - pi %d from %pS, seq %u\n",
- __func__, mpcb->mptcp_loc_token,
+ pr_err("%s %p %#x will fallback - pi %d from %pS, seq %u\n",
+ __func__, mpcb, mpcb->mptcp_loc_token,
tp->mptcp->path_index, __builtin_return_address(0),
TCP_SKB_CB(skb)->seq);
@@ -690,6 +690,7 @@ static int mptcp_detect_mapping(struct sock *sk, struct sk_buff *skb)
/* Mapping not yet set on this subflow - we set it here! */
if (!data_len) {
+ pr_err("%s %p falling back\n", __func__, mpcb);
mpcb->infinite_mapping_rcv = 1;
mpcb->send_infinite_mapping = 1;
tp->mptcp->fully_established = 1;
diff --git a/net/mptcp/mptcp_output.c b/net/mptcp/mptcp_output.c
index 6f368e935950..621ac3eaa399 100644
--- a/net/mptcp/mptcp_output.c
+++ b/net/mptcp/mptcp_output.c
@@ -471,6 +471,7 @@ static bool mptcp_skb_entail(struct sock *sk, struct sk_buff *skb, int reinject)
!before(tcb->seq, mptcp_meta_tp(tp)->snd_nxt)) {
tp->mptcp->fully_established = 1;
tp->mpcb->infinite_mapping_snd = 1;
+ pr_err("%s %p sending infinite\n", __func__, tp->mpcb);
tp->mptcp->infinite_cutoff_seq = tp->write_seq;
tcb->mptcp_flags |= MPTCPHDR_INF;
}
--
2.16.2
Kernel stability has improved a lot with the latest kernel, thanks. I finally saw one more instance of this warning with a kernel with Christoph's patch.
Mar 19 08:08:50 homeagent4 kernel: [127714.432601] TCP: tcp_ack resetting flow
Mar 19 08:08:57 homeagent4 kernel: [127721.924829] mptcp_prevalidate_skb ffff888191cf3100 0xc0cfbede will fallback - pi 1 from tcp_data_queue+0x35b/0x4b0, seq 3461279010
Mar 19 08:08:57 homeagent4 kernel: [127722.066026] TCP: tcp_done closing ffff888191cf3100 from tcp_ack+0xb63/0x12c0
Mar 19 08:08:58 homeagent4 kernel: [127722.433338] mptcp_prevalidate_skb ffff88818276ea00 0xbd97d288 will fallback - pi 1 from tcp_data_queue+0x35b/0x4b0, seq 4252167839
Mar 19 08:08:58 homeagent4 kernel: [127722.637573] TCP: tcp_done closing ffff88818276ea00 from tcp_ack+0xb63/0x12c0
Mar 19 08:08:58 homeagent4 kernel: [127722.689861] Meta ffff888181101180 already closed i_rcv 1 i_snd 1 send_i 0 flags 0x2000303
Mar 19 08:08:58 homeagent4 kernel: [127722.689884] ------------[ cut here ]------------
Mar 19 08:08:58 homeagent4 kernel: [127722.689890] WARNING: CPU: 0 PID: 0 at net/mptcp/mptcp_ctrl.c:647 mptcp_sock_def_error_report+0xff/0x110
Mar 19 08:08:58 homeagent4 kernel: [127722.689891] Modules linked in: cls_fw ipip cls_u32 sch_htb xt_policy authenc echainiv xfrm6_mode_tunnel xfrm4_mode_tunnel fou ip6_udp_tunnel udp_tunnel ip_tunnel xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key xfrm_algo dummy ip6table_filter ip6_tables iptable_filter xt_statistic xt_connmark xt_conntrack xt_TPROXY nf_defrag_ipv6 xt_addrtype iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ppdev aesni_intel aes_x86_64 crypto_simd glue_helper cryptd snd_pcsp snd_pcm serio_raw parport_pc snd_timer snd soundcore parport mptcp_fullmesh ip_tables autofs4 nvme nvme_core ena i2c_piix4
Mar 19 08:08:58 homeagent4 kernel: [127722.689930] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.105-mptcp-kp2-3f01458b #12
Mar 19 08:08:58 homeagent4 kernel: [127722.689931] Hardware name: Amazon EC2 m5.large/, BIOS 1.0 10/16/2017
Mar 19 08:08:58 homeagent4 kernel: [127722.689932] task: ffffffff822134c0 task.stack: ffffffff82200000
Mar 19 08:08:58 homeagent4 kernel: [127722.689934] RIP: 0010:mptcp_sock_def_error_report+0xff/0x110
Mar 19 08:08:58 homeagent4 kernel: [127722.689936] RSP: 0018:ffff88822d003df0 EFLAGS: 00010286
Mar 19 08:08:58 homeagent4 kernel: [127722.689937] RAX: 000000000000004d RBX: ffff8881d9090000 RCX: 0000000000000000
Mar 19 08:08:58 homeagent4 kernel: [127722.689938] RDX: ffff88822d01e358 RSI: ffff88822d016618 RDI: ffff88822d016618
Mar 19 08:08:58 homeagent4 kernel: [127722.689939] RBP: ffff8881f71f1080 R08: 0000000000000001 R09: 0000000000000b01
Mar 19 08:08:58 homeagent4 kernel: [127722.689940] R10: 0000000000000000 R11: 0000000000000b01 R12: ffff888181101180
Mar 19 08:08:58 homeagent4 kernel: [127722.689941] R13: 0000000000000000 R14: ffff8881d9090000 R15: 0000000000000002
Mar 19 08:08:58 homeagent4 kernel: [127722.689942] FS: 0000000000000000(0000) GS:ffff88822d000000(0000) knlGS:0000000000000000
Mar 19 08:08:58 homeagent4 kernel: [127722.689943] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 19 08:08:58 homeagent4 kernel: [127722.689944] CR2: 00007fdab821f640 CR3: 000000000220a006 CR4: 00000000007606f0
Mar 19 08:08:58 homeagent4 kernel: [127722.689948] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 19 08:08:58 homeagent4 kernel: [127722.689949] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 19 08:08:58 homeagent4 kernel: [127722.689949] PKRU: 55555554
Mar 19 08:08:58 homeagent4 kernel: [127722.689950] Call Trace:
Mar 19 08:08:58 homeagent4 kernel: [127722.689965] <IRQ>
Mar 19 08:08:58 homeagent4 kernel: [127722.689969] tcp_write_err+0x2b/0x50
Mar 19 08:08:58 homeagent4 kernel: [127722.689971] tcp_write_timeout+0x7a/0x430
Mar 19 08:08:58 homeagent4 kernel: [127722.689973] tcp_retransmit_timer+0x2aa/0x4e0
Mar 19 08:08:58 homeagent4 kernel: [127722.689974] ? tcp_write_timer_handler+0x240/0x240
Mar 19 08:08:58 homeagent4 kernel: [127722.689976] mptcp_sub_retransmit_timer+0xe/0xc0
Mar 19 08:08:58 homeagent4 kernel: [127722.689978] tcp_write_timer_handler+0xed/0x240
Mar 19 08:08:58 homeagent4 kernel: [127722.689980] tcp_write_timer+0xaf/0xc0
Mar 19 08:08:58 homeagent4 kernel: [127722.689983] call_timer_fn+0x30/0x130
Mar 19 08:08:58 homeagent4 kernel: [127722.689985] run_timer_softirq+0x1c7/0x400
Mar 19 08:08:58 homeagent4 kernel: [127722.689989] ? timerqueue_add+0x54/0x80
Mar 19 08:08:58 homeagent4 kernel: [127722.689990] ? enqueue_hrtimer+0x38/0x80
Mar 19 08:08:58 homeagent4 kernel: [127722.689994] __do_softirq+0x10d/0x2b4
Mar 19 08:08:58 homeagent4 kernel: [127722.689999] irq_exit+0xc2/0xd0
Mar 19 08:08:58 homeagent4 kernel: [127722.690001] smp_apic_timer_interrupt+0x78/0x140
Mar 19 08:08:58 homeagent4 kernel: [127722.690003] apic_timer_interrupt+0x7d/0x90
Mar 19 08:08:58 homeagent4 kernel: [127722.690004] </IRQ>
Mar 19 08:08:58 homeagent4 kernel: [127722.690006] RIP: 0010:native_safe_halt+0x2/0x10
Mar 19 08:08:58 homeagent4 kernel: [127722.690007] RSP: 0018:ffffffff82203eb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
Mar 19 08:08:58 homeagent4 kernel: [127722.690008] RAX: ffffffff818415a0 RBX: ffffffff822134c0 RCX: 0000000000000000
Mar 19 08:08:58 homeagent4 kernel: [127722.690009] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Mar 19 08:08:58 homeagent4 kernel: [127722.690009] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000003
Mar 19 08:08:58 homeagent4 kernel: [127722.690010] R10: ffffffff82203e28 R11: 0000000000000000 R12: ffffffff822134c0
Mar 19 08:08:58 homeagent4 kernel: [127722.690011] R13: ffffffff822134c0 R14: 0000000000000000 R15: 0000000000000000
Mar 19 08:08:58 homeagent4 kernel: [127722.690013] ? __cpuidle_text_start+0x8/0x8
Mar 19 08:08:58 homeagent4 kernel: [127722.690015] default_idle+0x1a/0xf0
Mar 19 08:08:58 homeagent4 kernel: [127722.690017] do_idle+0x166/0x1d0
Mar 19 08:08:58 homeagent4 kernel: [127722.690020] cpu_startup_entry+0x19/0x20
Mar 19 08:08:58 homeagent4 kernel: [127722.690024] start_kernel+0x466/0x46e
Mar 19 08:08:58 homeagent4 kernel: [127722.690027] secondary_startup_64+0xa5/0xb0
Mar 19 08:08:58 homeagent4 kernel: [127722.690028] Code: 24 20 4c 8b 4d 60 4c 89 e6 48 c7 c7 a0 7a 12 82 44 89 c1 44 89 c2 41 83 e0 01 c0 e9 07 c0 ea 06 0f b6 c9 83 e2 01 e8 8c 21 8d ff <0f> 0b e9 55 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
Mar 19 08:08:58 homeagent4 kernel: [127722.690057] ---[ end trace 35aabb138b1d57ea ]---
I hope this helps.
Mar 19 08:08:58 homeagent4 kernel: [127722.637573] TCP: tcp_done closing ffff88818276ea00 from tcp_ack+0xb63/0x12c0
Interesting! You are coming from tcp_ack
. I just fixed an mptcp_sock_def_error_report
when coming from mptcp_time_wait
. So, there seems to be one more around... I have to see how your box can get in your situation.
Kernel stability has improved a lot with the latest kernel, thanks.
Yes, it has! I have another set of fixes coming soon that should make it even better.
Here is a little context of two more recent warnings (same machine, same kernel).
First one:
Mar 19 12:09:56 homeagent4 kernel: [142180.657861] TCP: tcp_done closing ffff8881baf85c00 from tcp_ack+0xb63/0x12c0
Mar 19 12:10:04 homeagent4 kernel: [142188.054619] TCP: mptcp_fallback_infinite ffff8881b7cf0380 0x910ce999 will fallback - pi 1, src 216.58.206.106:443 dst 192.168.243.17:35185 rcv_nxt 2257741621 from tcp_rcv_state_process+0x24e/0x86d master 1
Mar 19 12:10:04 homeagent4 kernel: [142188.064951] TCP: tcp_done closing ffff8881b7cf0380 from tcp_ack+0xb63/0x12c0
Mar 19 12:10:04 homeagent4 kernel: [142188.070015] TCP: mptcp_fallback_infinite ffff8881b8fdb480 0xa47cd70f will fallback - pi 1, src 216.58.206.106:443 dst 192.168.243.17:52789 rcv_nxt 1784841723 from tcp_rcv_state_process+0x24e/0x86d master 1
Mar 19 12:10:04 homeagent4 kernel: [142188.080158] TCP: tcp_done closing ffff8881b8fdb480 from tcp_ack+0xb63/0x12c0
Mar 19 12:10:20 homeagent4 kernel: [142204.399970] TCP: tcp_done closing ffff8881da031880 from mptcp_close+0x54c/0x5a0
Mar 19 12:10:21 homeagent4 kernel: [142205.948879] TCP: tcp_done closing ffff8881b10f1880 from tcp_write_err+0x3b/0x50
Mar 19 12:12:29 homeagent4 kernel: [142333.771613] TCP: mptcp_fallback_infinite ffff8881ae844a80 0x5c29812b will fallback - pi 3, src 157.240.1.52:443 dst 192.168.246.187:53387 rcv_nxt 2084667492 from tcp_rcv_state_process+0x24e/0x86d master 0
Mar 19 12:12:29 homeagent4 kernel: [142333.782005] TCP: tcp_ack resetting flow
Mar 19 12:12:47 homeagent4 kernel: [142351.354838] Meta ffff8881b10f1880 already closed i_rcv 1 i_snd 1 send_i 0 flags 0x2004303
Mar 19 12:12:47 homeagent4 kernel: [142351.354859] ------------[ cut here ]------------
Mar 19 12:12:47 homeagent4 kernel: [142351.354873] WARNING: CPU: 1 PID: 0 at net/mptcp/mptcp_ctrl.c:647 mptcp_sock_def_error_report+0xff/0x110
And this one:
Mar 19 17:09:40 homeagent4 kernel: [160163.882852] TCP: tcp_ack resetting flow
Mar 19 17:10:07 homeagent4 kernel: [160191.022463] TCP: mptcp_fallback_infinite ffff8881f6715500 0x3952dc1f will fallback - pi 4, src 216.58.206.78:443 dst 192.168.242.223:49045 rcv_nxt 680152616 from tcp_rcv_state_process+0x24e/0x86d master 0
Mar 19 17:10:07 homeagent4 kernel: [160191.032993] TCP: tcp_ack resetting flow
Mar 19 17:10:13 homeagent4 kernel: [160196.928981] TCP: mptcp_fallback_infinite ffff888221da4a80 0x7843dc49 will fallback - pi 6, src 52.213.101.26:443 dst 192.168.243.82:36633 rcv_nxt 1677205542 from tcp_rcv_state_process+0x24e/0x86d master 0
Mar 19 17:10:13 homeagent4 kernel: [160196.939249] TCP: tcp_ack resetting flow
Mar 19 17:10:31 homeagent4 kernel: [160215.560758] TCP: tcp_done closing ffff8881daf19180 from tcp_keepalive_timer+0x209/0x310
Mar 19 17:10:43 homeagent4 kernel: [160226.865877] TCP: mptcp_fallback_infinite ffff8881baa64700 0xa22247f8 will fallback - pi 1, src 78.46.49.23:443 dst 192.168.241.186:36193 rcv_nxt 1419597485 from tcp_rcv_state_process+0x24e/0x86d master 1
Mar 19 17:10:43 homeagent4 kernel: [160226.876156] TCP: tcp_done closing ffff8881baa64700 from mptcp_data_ack+0x9d8/0x9e0
Mar 19 17:10:43 homeagent4 kernel: [160226.883025] TCP: mptcp_fallback_infinite ffff888222d5dc00 0xdaeaa963 will fallback - pi 1, src 31.13.90.6:443 dst 192.168.241.186:59767 rcv_nxt 3743024967 from tcp_rcv_state_process+0x24e/0x86d master 1
Mar 19 17:10:43 homeagent4 kernel: [160227.086559] TCP: tcp_done closing ffff888222d5dc00 from mptcp_sock_def_error_report+0x6b/0x110
Mar 19 17:11:39 homeagent4 kernel: [160283.400217] TCP: tcp_done closing ffff8881d6a60a80 from tcp_keepalive_timer+0x209/0x310
Mar 19 17:11:49 homeagent4 kernel: [160293.639960] Meta ffff8881bcce8380 already closed i_rcv 1 i_snd 1 send_i 0 flags 0x2004303
Mar 19 17:11:49 homeagent4 kernel: [160293.639985] ------------[ cut here ]------------
Mar 19 17:11:49 homeagent4 kernel: [160293.639993] WARNING: CPU: 1 PID: 0 at net/mptcp/mptcp_ctrl.c:647 mptcp_sock_def_error_report+0xff/0x110
Let me know if you need more and I'll keep adding them as I find them.
I get the same after upgrade to kernel 4.14.110+
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.393075] Meta already closed i_rcv 1 i_snd 1 send_i 0 flags 0x2004301
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.395577] ------------[ cut here ]------------
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.397512] WARNING: CPU: 3 PID: 0 at net/mptcp/mptcp_ctrl.c:644 mptcp_sock_def_error_report+0x101/0x110
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.401635] Modules linked in: binfmt_misc xt_multiport tun xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_comment xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c iptable_filter snd_pcm snd_timer sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc snd aesni_intel aes_x86_64 vmw_balloon crypto_simd glue_helper soundcore cryptd evdev serio_raw sg pcspkr shpchp vmwgfx ttm drm_kms_helper drm pcc_cpufreq ac vmw_vsock_vmci_transport vsock button vmw_vmci mptcp_balia ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache jbd2 sr_mod cdrom ata_generic sd_mod ata_piix crc32c_intel psmouse libata vmw_pvscsi vmxnet3 i2c_piix4 scsi_mod floppy
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.420369] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.110+ #2
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.422665] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/03/2018
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.426818] task: ffff888236e5e180 task.stack: ffffc90000cd8000
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.428952] RIP: 0010:mptcp_sock_def_error_report+0x101/0x110
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.431055] RSP: 0018:ffff88823fd83e58 EFLAGS: 00010292
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.432995] RAX: 000000000000003c RBX: ffff8882250dc6c0 RCX: 0000000000000000
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.435302] RDX: ffff88823fd9e3b8 RSI: ffff88823fd966d8 RDI: ffff88823fd966d8
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.437576] RBP: ffff888224c5ec80 R08: 000000000000060c R09: 0000000000aaaaaa
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.440054] R10: ffff88823fd83e68 R11: 00000000ffffffff R12: ffff8882102040c0
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.442490] R13: ffffffff8159d100 R14: ffff8882250dc6c0 R15: 0000000000000000
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.444792] FS: 0000000000000000(0000) GS:ffff88823fd80000(0000) knlGS:0000000000000000
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.447295] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.449221] CR2: 00007f7572c03000 CR3: 000000000200a002 CR4: 00000000001606e0
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.451524] Call Trace:
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.452818] <IRQ>
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.453990] tcp_write_err+0x2b/0x50
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.455418] tcp_write_timer_handler+0x16d/0x230
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.457035] tcp_write_timer+0x98/0xa0
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.458497] call_timer_fn+0x30/0x130
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.459909] run_timer_softirq+0x3cc/0x420
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.461398] ? enqueue_hrtimer+0x37/0x80
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.462857] ? recalibrate_cpu_khz+0x10/0x10
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.464356] __do_softirq+0xe3/0x2c7
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.465677] irq_exit+0xbd/0xd0
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.466873] smp_apic_timer_interrupt+0x78/0x130
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.468440] apic_timer_interrupt+0x7d/0x90
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.469875] </IRQ>
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.470862] RIP: 0010:native_safe_halt+0x2/0x10
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.472342] RSP: 0018:ffffc90000cdbee0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.474459] RAX: ffffffff8162c5f0 RBX: ffffffff8211cc60 RCX: 0000000000000000
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.476437] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.478389] RBP: 0000000000000003 R08: 0000000000000002 R09: 0000000000000001
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.480338] R10: ffffc90000cdbe88 R11: 0000000000000001 R12: 0000000000000000
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.482264] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.484165] ? __cpuidle_text_start+0x8/0x8
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.485713] default_idle+0x1a/0x100
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.486898] do_idle+0x15f/0x1d0
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.488004] cpu_startup_entry+0x5f/0x70
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.489236] start_secondary+0x19e/0x1e0
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.490454] secondary_startup_64+0xa5/0xb0
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.491711] Code: a4 41 0f b6 74 24 20 4c 8b 45 40 48 c7 c7 08 82 dc 81 89 f2 89 f1 40 c0 ee 06 c0 ea 07 83 e1 01 83 e6 01 0f b6 d2 e8 8a 41 ae ff <0f> 0b e9 6b ff ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41
Apr 25 08:48:26 PTSMAR2 kernel: [ 7656.496754] ---[ end trace 5eacbda62d93e66d ]---
The issue at https://github.com/multipath-tcp/mptcp/issues/352 seems to be the same. I have updated the debug-patch on that issue. I need to find out who is calling tcp_write_err
. I am unable at the moment to repro... :-/
If you still hit the issue, can you apply the patch from #352 ?
Closing, I'm pretty sure it's a dupe of https://github.com/multipath-tcp/mptcp/issues/352.
A patch is now in different branches.
@knxhm @kpeletidis do you think it could be possible for you to compile a branch and check if you still have the issue please?
Sorry @matttbe I don't have access to that environment anymore.
Hi I found messages like below in my Kernel log on 2 different machines which are running similar traffic with lots of parallel MPTCP connections. Dont know whats the implications and what is about to happen when the load further increases.
Pls have a look on this.
Regards Franz