multipath-tcp / mptcp

⚠️⚠️⚠️ Deprecated 🚫 Out-of-tree Linux Kernel implementation of MultiPath TCP. 👉 Use https://github.com/multipath-tcp/mptcp_net-next repo instead ⚠️⚠️⚠️

TCP: tcp_ack resetting flow #243

Open berezoka opened 6 years ago

berezoka commented 6 years ago

Hello, on the server side I get the following errors:

[Thu Mar  1 23:19:24 2018] mptcp_verif_dss_csum csum is wrong: 0xc5e9 data_seq 1924841623 dss_csum_added 1 overflowed 0 iterations 1
[Thu Mar  1 23:49:31 2018] TCP: mptcp_fallback_infinite 0x806a1f10 will fallback - pi 2, src 11.111.11.22:1090 dst 185.2.229.214:56279 rcv_nxt 2923808224 from tcp_rcv_state_process+0x1de/0x820
[Thu Mar  1 23:49:31 2018] TCP: tcp_ack resetting flow
[Fri Mar  2 00:27:24 2018] TCP: mptcp_fallback_infinite 0xa29a980d will fallback - pi 2, src 11.111.11.22:1090 dst 185.2.229.214:17413 rcv_nxt 3298219397 from tcp_rcv_state_process+0x1de/0x820
[Fri Mar  2 00:27:24 2018] TCP: tcp_ack resetting flow
[Fri Mar  2 00:56:25 2018] TCP: mptcp_fallback_infinite 0xe5f938ee will fallback - pi 2, src 11.111.11.22:1090 dst 185.2.229.214:19303 rcv_nxt 2875906644 from tcp_rcv_state_process+0x1de/0x820
[Fri Mar  2 00:56:25 2018] TCP: tcp_ack resetting flow
[Fri Mar  2 02:15:31 2018] TCP: mptcp_fallback_infinite 0x54a53d7 will fallback - pi 2, src 11.111.11.22:1090 dst 185.2.229.214:41031 rcv_nxt 2204363645 from tcp_rcv_state_process+0x1de/0x820
[Fri Mar  2 02:15:31 2018] TCP: tcp_ack resetting flow
[Fri Mar  2 03:10:47 2018] TCP: mptcp_fallback_infinite 0x521a016c will fallback - pi 2, src 11.111.11.22:1090 dst 185.2.229.214:44595 rcv_nxt 1751121396 from tcp_rcv_state_process+0x1de/0x820
[Fri Mar  2 03:10:47 2018] TCP: tcp_ack resetting flow
[Fri Mar  2 03:11:33 2018] TCP: mptcp_fallback_infinite 0x83322e3a will fallback - pi 2, src 11.111.11.22:1090 dst 185.2.229.214:34951 rcv_nxt 3543815736 from tcp_rcv_state_process+0x1de/0x820
[Fri Mar  2 03:11:33 2018] TCP: tcp_ack resetting flow
[Fri Mar  2 03:38:25 2018] TCP: mptcp_fallback_infinite 0x8cfa64bd will fallback - pi 2, src 11.111.11.22:1090 dst 185.2.229.214:35017 rcv_nxt 1180093270 from tcp_rcv_state_process+0x1de/0x820
[Fri Mar  2 03:38:25 2018] TCP: tcp_ack resetting flow

The client side uses LAN+LTE and shows no errors. After the "TCP: tcp_ack resetting flow" error, the client's traffic does not go back to LTE; I have to stop and restart the stream before it uses LTE again.

MPTCP version 0.93.1 (https://github.com/multipath-tcp/mptcp/archive/mptcp_v0.93.zip)
net.mptcp.mptcp_checksum = 1
net.mptcp.mptcp_debug = 0
net.mptcp.mptcp_enabled = 1
net.mptcp.mptcp_path_manager = fullmesh
net.mptcp.mptcp_scheduler = default
net.mptcp.mptcp_syn_retries = 3
net.mptcp.mptcp_version = 0

Congestion control:

net.ipv4.tcp_allowed_congestion_control = balia reno
net.ipv4.tcp_available_congestion_control = balia reno cubic lia olia wvegas
net.ipv4.tcp_congestion_control = balia

Where might the problem be?

matttbe commented 6 years ago

Hi @berezoka,

@cpaasch sent some patches on mptcp-dev. They may fix this bug. Can you test them?

Note that they should soon be on GitHub.

berezoka commented 6 years ago

@matttbe are you talking about this patch? DATA_ACK.patch.txt

berezoka commented 6 years ago

Hi, after applying the patch the situation is the same:

[Sat Mar 3 16:23:16 2018] mptcp_verif_dss_csum csum is wrong: 0x63dd data_seq 1369751649 dss_csum_added 1 overflowed 0 iterations 1
[Sat Mar 3 18:53:44 2018] mptcp_verif_dss_csum csum is wrong: 0x800 data_seq 3335471871 dss_csum_added 1 overflowed 0 iterations 1

On the client side I have to run:

ip link set dev lte_link0 multipath off
ip link set dev lte_link0 multipath on

to get the traffic back onto the LTE link.

Is there another solution?

cpaasch commented 6 years ago

Hmmm... Can you double-check that the kernel you compiled has the commit f2a4860dc697 ("mptcp: Update 64-bit receiver indexes after processing ofo-queue") ?

If that's the case, can you capture a packet-trace and wait until you see one of these messages pop up? Thanks!

berezoka commented 6 years ago

About an hour ago I compiled a new kernel with the latest patches, and the situation is the same:

[Sun Mar 4 02:58:15 2018] mptcp_verif_dss_csum csum is wrong: 0x10 data_seq 1890498936 dss_csum_added 1 overflowed 0 iterations 1

f2a4860 is applied (screenshot attached).

I will capture packets tomorrow.

cpaasch commented 6 years ago

Thanks, I'm waiting for your capture!

cpaasch commented 6 years ago

Please, also store the mptcp_verif_dss_csum message you see together with the packet-trace. I need both to correlate the flows.

berezoka commented 6 years ago

I couldn't sleep, so I made a packet capture: 214.log.txt

The error was:

[Sun Mar 4 03:55:13 2018] mptcp_verif_dss_csum csum is wrong: 0x4517 data_seq 1105057857 dss_csum_added 1 overflowed 0 iterations 1

One more:

[Sun Mar 4 21:18:51 2018] mptcp_verif_dss_csum csum is wrong: 0xfbff data_seq 371472309 dss_csum_added 1 overflowed 0 iterations 1

packets_02.zip

cpaasch commented 6 years ago

Hmmm... This is weird. Looking at the pcap, it really seems like commit f2a4860 is not in the kernel that you are booting.

How much data are you transmitting here? (it looks like you are transmitting huge amounts of data) How frequent is the error? Do you also see it happening when transmitting only small amounts on a single connection?

berezoka commented 6 years ago

The booted kernel is: Linux b 4.9.80 #1 SMP Sun Mar 4 17:35:08 EET 2018 x86_64 GNU/Linux, and f2a4860 really is applied. The errors occur randomly; I can't find any pattern. Traffic is small: ~5 Mbps via LAN + ~2 Mbps via LTE. Fullmesh parameters: num_subflows=1, create_on_err=1.

Any ideas?

cpaasch commented 6 years ago

Traffic is small? I see more than 700MB transmitted on a single subflow:

[screenshot]

I will get you a debug-patch that you can apply for testing.

cpaasch commented 6 years ago

Also - just to be sure - the other host is booting the same kernel, right?

cpaasch commented 6 years ago

And, which scheduler are you using?

cpaasch commented 6 years ago

Here is the debug-patch that would be good to apply:

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index c642b683abc7..9ac1254ef555 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -1131,6 +1131,8 @@ static inline void mptcp_check_sndseq_wrap(struct tcp_sock *meta_tp, int inc)
        struct mptcp_cb *mpcb = meta_tp->mpcb;
        mpcb->snd_hiseq_index = mpcb->snd_hiseq_index ? 0 : 1;
        mpcb->snd_high_order[mpcb->snd_hiseq_index] += 2;
+
+       pr_err("%s %#x wrapped around at %u\n", __func__, meta_tp->mpcb->mptcp_loc_token, meta_tp->snd_nxt);
    }
 }

@@ -1141,6 +1143,8 @@ static inline void mptcp_check_rcvseq_wrap(struct tcp_sock *meta_tp,
        struct mptcp_cb *mpcb = meta_tp->mpcb;
        mpcb->rcv_high_order[mpcb->rcv_hiseq_index] += 2;
        mpcb->rcv_hiseq_index = mpcb->rcv_hiseq_index ? 0 : 1;
+
+       pr_err("%s %#x wrapped around at %u\n", __func__, meta_tp->mpcb->mptcp_loc_token, meta_tp->rcv_nxt);
    }
 }

diff --git a/net/mptcp/mptcp_input.c b/net/mptcp/mptcp_input.c
index fb3d99379e11..0442ad051cec 100644
--- a/net/mptcp/mptcp_input.c
+++ b/net/mptcp/mptcp_input.c
@@ -349,8 +349,8 @@ static int mptcp_verif_dss_csum(struct sock *sk)

    /* Now, checksum must be 0 */
    if (unlikely(csum_fold(csum_tcp))) {
-       pr_err("%s csum is wrong: %#x data_seq %u dss_csum_added %d overflowed %d iterations %d\n",
-              __func__, csum_fold(csum_tcp), TCP_SKB_CB(last)->seq,
+       pr_err("%s %#x csum is wrong: %#x TCP-seq %u dss_csum_added %d overflowed %d iterations %d\n",
+              __func__, tp->mpcb->mptcp_loc_token, csum_fold(csum_tcp), TCP_SKB_CB(last)->seq,
               dss_csum_added, overflowed, iter);

        MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_CSUMFAIL);
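
For readers following the checksum discussion: the check in mptcp_verif_dss_csum relies on the one's-complement property that summing the covered bytes together with the transmitted checksum folds to zero. Below is a minimal userspace sketch of that arithmetic, assuming the RFC 6824 DSS pseudo-header layout (64-bit data sequence number, 32-bit relative subflow sequence number, 16-bit data-level length, 16-bit checksum) followed by the payload. All function names are hypothetical, not kernel API. One consequence worth noting: because the pseudo-header carries the full 64-bit data sequence number, a receiver that reconstructs the wrong upper 32 bits would report "csum is wrong" even for an intact payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the low `nbytes` bytes of v into p in network (big-endian) order. */
static void put_be(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* RFC 1071 style accumulation of 16-bit big-endian words.
 * Only the final chunk passed in may have an odd length. */
static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *buf, size_t len)
{
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8; /* pad trailing byte with zero */
    return sum;
}

/* Fold carries back in and complement, like the kernel's csum_fold(). */
static uint16_t csum_fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Sum over the 16-byte DSS pseudo-header followed by the payload. */
static uint32_t dss_sum(uint64_t dsn, uint32_t ssn, uint16_t dlen,
                        uint16_t csum_field, const uint8_t *payload, size_t len)
{
    uint8_t ph[16];

    assert(len <= 65535); /* keeps the 32-bit accumulator from overflowing */
    put_be(ph, dsn, 8);
    put_be(ph + 8, ssn, 4);
    put_be(ph + 12, dlen, 2);
    put_be(ph + 14, csum_field, 2);
    return csum_add_bytes(csum_add_bytes(0, ph, sizeof ph), payload, len);
}

/* Sender side: computed with the checksum field set to zero. */
uint16_t dss_csum_compute(uint64_t dsn, uint32_t ssn, uint16_t dlen,
                          const uint8_t *payload, size_t len)
{
    return csum_fold32(dss_sum(dsn, ssn, dlen, 0, payload, len));
}

/* Receiver side: including the received checksum must fold to 0. */
int dss_csum_ok(uint64_t dsn, uint32_t ssn, uint16_t dlen, uint16_t csum,
                const uint8_t *payload, size_t len)
{
    return csum_fold32(dss_sum(dsn, ssn, dlen, csum, payload, len)) == 0;
}
```

Verifying with the same payload but a data sequence number that is off by exactly 2^32 fails, just as a flipped payload bit does.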

berezoka commented 6 years ago

The traffic runs 24/7 :) and about 240 GB per day is sent via LAN+LTE. Both sides boot the same kernel :). For congestion control I currently use "balia". I have tried both the "default" and "redundant" schedulers, but the error is always the same :(. As for kernel panics with "redundant": there were no problems like those in #214. I'm now compiling the kernel with the debug patch and got one warning: [screenshot]
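To put these traffic figures in perspective relative to the 32-bit data-sequence wrap-around that the debug patch logs, here is a back-of-the-envelope sketch. It estimates how long one connection takes to consume 2^32 bytes of sequence space at a constant rate. It assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bit/s). For the daily figure, it also assumes that all traffic rides a single long-lived connection, which probably overstates the per-connection rate.

```c
#include <assert.h>

/* Size of a 32-bit sequence space in bytes. */
#define SEQ_SPACE_32 4294967296.0

/* Average rate in bytes/s for a given volume per day. */
double bytes_per_sec_from_daily(double bytes_per_day)
{
    return bytes_per_day / 86400.0;
}

/* Rate in bytes/s for a link speed in Mbit/s (decimal units). */
double bytes_per_sec_from_mbps(double mbps)
{
    return mbps * 1e6 / 8.0;
}

/* Seconds until 2^32 bytes have been sent at a constant rate. */
double seconds_to_wrap(double bytes_per_sec)
{
    assert(bytes_per_sec > 0.0);
    return SEQ_SPACE_32 / bytes_per_sec;
}
```

At the reported ~7 Mbps (5 + 2), a single connection would wrap roughly every 4909 s (about 82 minutes). At the ~240 GB/day average, it would wrap about every 1546 s (about 26 minutes). Either way, a 24/7 stream would go through the wrap-around path many times a day.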

berezoka commented 6 years ago

[Tue Mar 6 11:38:45 2018] mptcp_verif_dss_csum 0x8e8828af csum is wrong: 0x7ae7 TCP-seq 4059809018 dss_csum_added 1 overflowed 0 iterations 1

berezoka commented 6 years ago

Another one:

[Tue Mar 6 16:19:23 2018] mptcp_detect_mapping Mappings do not match!
[Tue Mar 6 16:19:23 2018] mptcp_detect_mapping dseq 3709291268 mdseq 3709267468, sseq 4233560286 msseq 4233558885 dlen 1400 mdlen 25976 dfin 0 mdfin 0

packets_02.pcap.zip
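The mptcp_detect_mapping values above can be checked by hand. A segment whose subflow sequence falls inside the current mapping must translate to the same data sequence number that the new DSS option claims. Below is a simplified sketch of that consistency check. It is not the exact condition used by the kernel's mptcp_detect_mapping, and the struct and function names are hypothetical. It uses unsigned 32-bit arithmetic so that offsets stay correct across sequence wrap-around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified DSS mapping: data-level start,
 * subflow-level start, and length in bytes. */
struct dss_map {
    uint32_t data_seq;
    uint32_t sub_seq;
    uint32_t len;
};

/* Data sequence number that sub_seq maps to under m (wrap-safe). */
uint32_t map_to_data(const struct dss_map *m, uint32_t sub_seq)
{
    return m->data_seq + (sub_seq - m->sub_seq);
}

/* True if sub_seq lies in [m->sub_seq, m->sub_seq + m->len). */
bool map_covers(const struct dss_map *m, uint32_t sub_seq)
{
    return (uint32_t)(sub_seq - m->sub_seq) < m->len;
}

/* Simplification: only the start of the new mapping is compared. If it
 * lies inside the current mapping, both must agree on its data seq. */
bool maps_consistent(const struct dss_map *cur, const struct dss_map *nxt)
{
    assert(cur != NULL && nxt != NULL);
    if (!map_covers(cur, nxt->sub_seq))
        return true;
    return map_to_data(cur, nxt->sub_seq) == nxt->data_seq;
}
```

With the logged values, the current mapping (mdseq 3709267468, msseq 4233558885, mdlen 25976) maps sseq 4233560286 to data sequence 3709268869. The new option, however, says 3709291268. That is a difference of 22399 bytes, so the check fails, matching what the kernel reported.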

berezoka commented 6 years ago

Errors from the server side:

[Tue Mar 6 21:57:19 2018] mptcp_check_rcvseq_wrap 0x5f5e5aa0 wrapped around at 214
[Tue Mar 6 22:30:46 2018] mptcp_verif_dss_csum 0x5f5e5aa0 csum is wrong: 0x76ff TCP-seq 1797150116 dss_csum_added 1 overflowed 0 iterations 1

server_03.pcap.zip

There were no errors on the client side; here are just the packets: client_03.pcap.zip

berezoka commented 6 years ago

I ran some tests with version 0.94 but get the same error:

[Sun Mar 11 01:38:24 2018] mptcp_verif_dss_csum csum is wrong: 0x77d8 data_seq 4005789404 dss_csum_added 1 overflowed 0 iterations 1

One more thing: I can't load the balia module:

~# modprobe mptcp_balia
modprobe: ERROR: could not insert 'mptcp_balia': Invalid argument
[Sun Mar 11 10:14:05 2018] TCP: balia does not implement required ops

The kernel was installed from the Debian repo (v4.14.24.mptcp).
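Here is a plausible but unconfirmed explanation for the Balia error. In upstream Linux, tcp_register_congestion_control() refuses a module whose struct tcp_congestion_ops lacks a mandatory callback. It prints "<name> does not implement required ops" and returns -EINVAL, which modprobe shows as "Invalid argument". To my knowledge, kernels from roughly 4.13 onward require undo_cwnd in addition to ssthresh and one of cong_avoid/cong_control. A Balia module written against an older API could therefore fail this way on 4.14. The userspace model below mirrors only that presence check. The struct is a simplified stand-in, not the kernel definition.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Simplified stand-in for struct tcp_congestion_ops: only whether each
 * callback is present matters here, so all hooks share one type. */
typedef void (*cc_hook)(void);

struct cc_ops_model {
    const char *name;
    cc_hook ssthresh;
    cc_hook undo_cwnd;
    cc_hook cong_avoid;
    cc_hook cong_control;
};

/* Models the mandatory-callback check at registration time: returns 0,
 * or -EINVAL (reported by modprobe as "Invalid argument"). */
int register_cc_model(const struct cc_ops_model *ca)
{
    assert(ca != NULL);
    if (!ca->ssthresh || !ca->undo_cwnd ||
        !(ca->cong_avoid || ca->cong_control)) {
        fprintf(stderr, "%s does not implement required ops\n", ca->name);
        return -EINVAL;
    }
    return 0;
}

/* Placeholder used to mark a hook as implemented. */
void cc_stub(void) {}
```

If this hypothesis holds, the fix would be on the module side (providing the missing callback), not in the sysctl configuration.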

cpaasch commented 6 years ago

Hmmm... I have an idea as to what might be going wrong. I wonder if we handle wrap-around of the data-sequence number correctly on the client-side. I have to do some digging here.
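This sketch illustrates the kind of wrap-around handling being questioned here. The DSS option in this setup appears to carry only the lower 32 bits of the data sequence number (as the 32-bit data_seq values in the logs suggest), so the receiver must infer the upper 32 bits. A common, generic way to do that is to pick the 64-bit value closest to the one it expects next. This is not the out-of-tree code, which tracks the upper bits with the two-slot rcv_high_order[] array seen in the debug patch. The function names are hypothetical, and the cast to int32_t assumes the usual two's-complement behaviour of GCC/Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Naive: always reuse the upper half of the expected value. Wrong for
 * old (retransmitted) data that arrives just after the 32-bit wrap. */
uint64_t dsn_expand_naive(uint64_t expected64, uint32_t dsn32)
{
    return (expected64 & 0xFFFFFFFF00000000ULL) | dsn32;
}

/* Closest candidate: treat the 32-bit difference as signed, so values
 * within +/- 2^31 of the expected point land on the correct side. */
uint64_t dsn_expand(uint64_t expected64, uint32_t dsn32)
{
    int32_t delta = (int32_t)(dsn32 - (uint32_t)expected64);
    uint64_t r = expected64 + (uint64_t)(int64_t)delta;

    assert((uint32_t)r == dsn32); /* low half must match what was received */
    return r;
}
```

For a retransmission of data sent just before the wrap, the naive version is off by exactly 2^32. Because the DSS checksum pseudo-header includes the full 64-bit sequence number, an error like that would show up as a checksum failure after a wrap rather than as silently corrupted data.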

As for the mptcp_balia-issue, this congestion control is not very well maintained and was more of a research-project. I would suggest you use the more proven-out congestion controls like Cubic, BBR,...

berezoka commented 6 years ago

Strangely, with Cubic this error occurs only after 3-4 hours of an active session.

cpaasch commented 6 years ago

So, you mean that with Cubic it happens less often? What was the frequency with Balia?

berezoka commented 6 years ago

Yes, it happens less often with Cubic than with Balia. With Balia it happened 2-3 times per hour.

berezoka commented 6 years ago

Hello, is there perhaps a solution for this error: "mptcp_verif_dss_csum 0x7a6e6987 csum is wrong ..."?

berezoka commented 6 years ago

Here are some logs:

Apr 25 17:26:28 b0 kernel: [ 5385.614813] mptcp_check_rcvseq_wrap 0x7ddc4547 wrapped around at 7064
Apr 25 17:26:28 b1 kernel: [ 8947.190313] mptcp_check_sndseq_wrap 0x5978af0e wrapped around at 4294966032
Apr 25 18:03:03 b0 kernel: [ 7580.169429] mptcp_verif_dss_csum 0x504d92ac csum is wrong: 0xf7b3 TCP-seq 2881437209 dss_csum_added 1 overflowed 0 iterations 1
Apr 25 18:08:07 b1 kernel: [11447.359862] mptcp_check_sndseq_wrap 0xa53b2d21 wrapped around at 4294966263
Apr 25 18:08:08 b0 kernel: [ 7884.860242] mptcp_check_rcvseq_wrap 0xeac5c8e wrapped around at 355
Apr 25 18:24:45 b1 kernel: [12445.481329] mptcp_check_sndseq_wrap 0x36be8f7d wrapped around at 4294967123
Apr 25 18:24:46 b0 kernel: [ 8883.103881] mptcp_check_rcvseq_wrap 0x7956b5e4 wrapped around at 1215
Apr 25 18:31:22 b0 kernel: [ 9279.222064] mptcp_verif_dss_csum 0x7956b5e4 csum is wrong: 0x8603 TCP-seq 947887266 dss_csum_added 1 overflowed 0 iterations 1
Apr 25 19:03:02 b0 kernel: [11179.662805] mptcp_verif_dss_csum 0x262dcfc8 csum is wrong: 0x93bb TCP-seq 2031246860 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 08:49:35 b0 kernel: [60771.282519] mptcp_check_rcvseq_wrap 0x7a6e6987 wrapped around at 979
Apr 26 08:49:35 b1 kernel: [ 4455.636286] mptcp_check_sndseq_wrap 0x30754331 wrapped around at 4294966887
Apr 26 09:02:07 b0 kernel: [61523.620784] mptcp_verif_dss_csum 0x7a6e6987 csum is wrong: 0xcfff TCP-seq 1601541607 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 09:07:28 b0 kernel: [61843.893513] mptcp_verif_dss_csum 0x7a6e6987 csum is wrong: 0xf754 TCP-seq 1921988663 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 09:09:00 b1 kernel: [ 5620.810096] mptcp_check_sndseq_wrap 0x4eff5b8e wrapped around at 4294967165
Apr 26 09:09:00 b0 kernel: [61936.236414] mptcp_check_rcvseq_wrap 0x1311fcdc wrapped around at 1257
Apr 26 09:53:08 b0 kernel: [64584.210484] mptcp_verif_dss_csum 0x1311fcdc csum is wrong: 0x7fd6 TCP-seq 1822176941 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 10:21:29 b1 kernel: [ 9970.842549] mptcp_check_sndseq_wrap 0x6de99c18 wrapped around at 4294966892
Apr 26 10:21:29 b0 kernel: [66285.500103] mptcp_check_rcvseq_wrap 0x7cffcbf4 wrapped around at 984
Apr 26 10:24:42 b0 kernel: [66477.884652] mptcp_verif_dss_csum 0x7cffcbf4 csum is wrong: 0x87 TCP-seq 2654382263 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 10:51:43 b0 kernel: [68099.464041] mptcp_verif_dss_csum 0x24355f43 csum is wrong: 0xdfff TCP-seq 1614789752 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 10:57:54 b0 kernel: [68470.180733] mptcp_check_rcvseq_wrap 0x24355f43 wrapped around at 168
Apr 26 10:57:54 b1 kernel: [12155.987283] mptcp_check_sndseq_wrap 0x89f8804c wrapped around at 4294966076
Apr 26 11:39:14 b0 kernel: [70949.807786] mptcp_check_rcvseq_wrap 0x88daffc2 wrapped around at 1128
Apr 26 11:39:14 b1 kernel: [14636.069070] mptcp_check_sndseq_wrap 0xdc965297 wrapped around at 4294967036
Apr 26 12:04:16 b0 kernel: [72451.819008] mptcp_verif_dss_csum 0x88daffc2 csum is wrong: 0x8411 TCP-seq 3327185381 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 12:20:06 b0 kernel: [73402.127478] mptcp_check_rcvseq_wrap 0xc256549e wrapped around at 951
Apr 26 12:20:06 b1 kernel: [17088.863686] mptcp_check_sndseq_wrap 0x6ea198ad wrapped around at 4294966859
Apr 26 13:48:59 b0 kernel: [78734.337969] mptcp_check_rcvseq_wrap 0xc256549e wrapped around at 1317
Apr 26 13:48:59 b1 kernel: [22422.080083] mptcp_check_sndseq_wrap 0x6ea198ad wrapped around at 4294967225
Apr 26 14:04:23 b0 kernel: [79659.148000] mptcp_verif_dss_csum 0xd3c5671e csum is wrong: 0xc10 TCP-seq 422445808 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 14:25:47 b1 kernel: [24631.323512] mptcp_check_sndseq_wrap 0xa494c1cb wrapped around at 4294966697
Apr 26 14:25:48 b0 kernel: [80943.223952] mptcp_check_rcvseq_wrap 0xee581318 wrapped around at 4953
Apr 26 16:04:24 b0 kernel: [86859.398228] mptcp_verif_dss_csum 0x31ec85b5 csum is wrong: 0xffef TCP-seq 2230254616 dss_csum_added 1 overflowed 1 iterations 1
Apr 26 17:57:25 b0 kernel: [93640.042730] mptcp_verif_dss_csum 0x5910e9ca csum is wrong: 0xec7e TCP-seq 2441195065 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 18:05:51 b0 kernel: [94146.346244] mptcp_verif_dss_csum 0x5910e9ca csum is wrong: 0x4079 TCP-seq 1227387143 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 19:12:27 b0 kernel: [98142.133871] mptcp_check_rcvseq_wrap 0xc4ba4f09 wrapped around at 733
Apr 26 20:13:54 b0 kernel: [101829.308346] mptcp_verif_dss_csum 0x57312bda csum is wrong: 0x4bf8 TCP-seq 1497321443 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 20:40:06 b1 kernel: [ 8841.292427] mptcp_check_sndseq_wrap 0x58bc2176 wrapped around at 4294967104
Apr 26 20:40:06 b0 kernel: [103401.275533] mptcp_check_rcvseq_wrap 0x7b150f76 wrapped around at 1196

b0 = server, b1 = client
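Earlier in the thread, the checksum messages were requested in order to correlate flows. With the debug patch, every b0 message carries the local token, so one can check mechanically whether each checksum failure hits a connection whose receive-side data sequence has already wrapped. Below is a small sketch of such a filter. It assumes lines are fed in chronological order and in the debug-patch format. It matches on the token only and ignores timestamps and host names; the b1 sndseq_wrap lines use b1's own token and are skipped. The names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 256

enum line_kind { LINE_OTHER, CSUM_AFTER_WRAP, CSUM_NO_WRAP_SEEN };

/* Tokens for which an mptcp_check_rcvseq_wrap line has been seen. */
static unsigned int wrapped[MAX_TOKENS];
static size_t n_wrapped;

static int seen_wrap(unsigned int tok)
{
    for (size_t i = 0; i < n_wrapped; i++)
        if (wrapped[i] == tok)
            return 1;
    return 0;
}

/* Only the debug-patch format (token printed as 0x... right after the
 * function name) is recognised; older lines are classified as other. */
enum line_kind classify_line(const char *line)
{
    const char *p;
    unsigned int tok;

    assert(line != NULL);
    p = strstr(line, "mptcp_check_rcvseq_wrap ");
    if (p && sscanf(p, "mptcp_check_rcvseq_wrap 0x%x", &tok) == 1) {
        if (!seen_wrap(tok) && n_wrapped < MAX_TOKENS)
            wrapped[n_wrapped++] = tok;
        return LINE_OTHER;
    }
    p = strstr(line, "mptcp_verif_dss_csum ");
    if (p && sscanf(p, "mptcp_verif_dss_csum 0x%x", &tok) == 1)
        return seen_wrap(tok) ? CSUM_AFTER_WRAP : CSUM_NO_WRAP_SEEN;
    return LINE_OTHER;
}
```

Tallied by hand over the excerpt above, 6 of the 14 checksum failures (0x7956b5e4, 0x7a6e6987 twice, 0x1311fcdc, 0x7cffcbf4, 0x88daffc2) follow a logged receive-side wrap on the same token, each within an hour. The other failures have no wrap in the excerpt. That may only mean the wrap happened before the log starts. Note, however, that 0x24355f43 fails about six minutes before its wrap is logged.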

lenormf commented 5 years ago

Hi,

I'm seeing the same type of errors on a trivial setup with 2 VLANs:

[…]
[ 2373.544290] TCP: mptcp_fallback_infinite 0x2e0ed3e4 will fallback - pi 1, src 192.168.2.20:51454 dst 192.168.2.1:5201 rcv_nxt 3490125391 from tcp_rcv_established+0x564/0x818
[ 2383.424398] TCP: mptcp_fallback_infinite 0x17435639 will fallback - pi 2, src 192.168.3.20:39195 dst 192.168.3.1:5201 rcv_nxt 1385930753 from tcp_rcv_state_process+0x240/0x8ec
[ 2383.438635] TCP: tcp_ack resetting flow
[ 2404.467775] TCP: mptcp_fallback_infinite 0xe518971 will fallback - pi 1, src 192.168.2.20:51494 dst 192.168.2.1:5201 rcv_nxt 1353534289 from tcp_rcv_established+0x564/0x818
[ 2404.482715] TCP: mptcp_fallback_infinite 0xfe6974a will fallback - pi 1, src 192.168.2.20:51498 dst 192.168.2.1:5201 rcv_nxt 3353088233 from tcp_rcv_established+0x564/0x818
[ 2404.497820] TCP: mptcp_fallback_infinite 0xb8dff011 will fallback - pi 1, src 192.168.2.20:51502 dst 192.168.2.1:5201 rcv_nxt 3432046503 from tcp_rcv_established+0x564/0x818
[ 2443.949321] TCP: mptcp_fallback_infinite 0x5f0ba4b9 will fallback - pi 2, src 192.168.3.20:42939 dst 192.168.3.1:5201 rcv_nxt 1032229633 from tcp_rcv_state_process+0x240/0x8ec
[ 2443.963526] TCP: tcp_ack resetting flow
[…]

What do these error messages mean? Should I patch my kernel with debug prints and send you the output, to help?

lenormf commented 5 years ago

I'm also getting weird traces from time to time:

[ 7012.855984] ------------[ cut here ]------------
[ 7012.859199] WARNING: CPU: 2 PID: 15 at net/mptcp/mptcp_ctrl.c:705 mptcp_sock_def_error_report+0xec/0x13c
[ 7012.868667] Meta already closed i_rcv 1 i_snd 1 send_i 0 flags 0x2004301
[ 7012.875306] Modules linked in: DECT_paging cosic drv_timer drv_vmmc pppoe ppp_async mac_violation_mirror ltq_mpe_hal_drv ltq_directpath_datapath l2tp_ppp iptable_nat cdc_mbim pppox ppp_generic ppa_api_tmplbuf ppa_api_sw_accel_mod ppa_api nf_nat_pptp nf_nat_ipv4 nf_nat_amanda nf_conntrack_pptp nf_conntrack_ipv6 nf_conntrack_ipv4 nf_conntrack_amanda ltq_tmu_hal_drv ltq_pae_hal ltq_eth_drv_xrx500 ipt_TRIGGER ipt_REJECT ipt_MASQUERADE ebtable_nat ebtable_filter ebtable_broute dc_mode0_xrx500 cdc_ncm cdc_ether xt_time xt_tcpmss xt_statistic xt_state xt_socket xt_recent xt_policy xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_extmark xt_esp xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TPROXY xt_TCPMSS xt_REDIRECT xt_NFQUEUE xt_NETMAP xt_LOG xt_HL xt_DSCP xt_CLASSIFY xrx500_phy_fw usbnet usblp ts_kmp ts_fsm ts_bm slhc ppa_drv_stack_al phy_grx500_usb pecostat_noIRQ nfnetlink_queue nf_reject_ipv4 nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_rtsp nf_nat_redirect nf_nat_proto_gre nf_nat_masquerade_ipv4 nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_tftp nf_conntrack_snmp nf_conntrack_sip nf_conntrack_rtsp nf_conntrack_rtcache nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack_broadcast nf_conntrack macvlan ltq_voip_timer_driver ltq_crypto iptable_mangle iptable_filter ipt_ah ipt_ECN ip_tables ebtables ebt_vlan ebt_stp ebt_snat ebt_redirect ebt_pkttype ebt_mark_m ebt_mark ebt_limit ebt_ip ebt_extmark_m ebt_extmark ebt_dnat ebt_arpreply ebt_arp ebt_among ebt_802_3 dwc3_grx500 directconnect_datapath crc_ccitt cpuload cdc_wdm cdc_acm br_netfilter fuse sch_teql em_nbyte sch_prio sch_dsmark sch_pie em_meta sch_gred cls_basic act_ipt em_text sch_codel sch_red sch_fq sch_sfq act_police em_cmp act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress configs 
xt_set ip_set_list_set ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables msdos ip6_gre ip_gre gre sit l2tp_netlink l2tp_core udp_tunnel ip6_udp_tunnel ipcomp6 xfrm6_tunnel xfrm6_mode_tunnel xfrm6_mode_transport xfrm6_mode_beet esp6 ah6 ipcomp xfrm4_tunnel xfrm4_mode_tunnel xfrm4_mode_transport xfrm4_mode_beet esp4 ah4 ip6_tunnel tunnel6 tunnel4 ip_tunnel veth af_key xfrm_user xfrm_ipcomp xfrm_algo vfat fat hfsplus[ 7013.113392] cdc_ether 2-1:2.0 wwan0: kevent 11 may have been dropped
[ 7013.119934]  hfs autofs4 vrx518 drv_kpi2udp nls_utf8 nls_iso8859_1 nls_cp437 drv_sdd_mbx drv_tapi drv_ifxos sha512_generic sha1_generic md5 echainiv des_generic cmac cbc authenc usb_storage xhci_plat_hcd xhci_pci xhci_hcd dwc3 sd_mod scsi_mod ext4 jbd2 mbcache exfat usbcore nls_base usb_common mii crc32c_generic
[ 7013.147533] CPU: 2 PID: 15 Comm: ksoftirqd/2 Tainted: G        W       4.9.109+ #0
[ 7013.155077] Stack : 00000006 00000000 00000000 00000000 00000000 00000000 60927e9a 00000046
[ 7013.163407]         00000000 00000000 00000000 60920000 607a0000 6079ee07 606e5e3c 00000002
[ 7013.171740]         0000000f 60923a44 607d6200 726d55c0 00600000[ 7013.177370] cdc_ether 2-1:2.0 wwan0: kevent 11 may have been dropped
[ 7013.183895]  6008fd18 00000001 60920000
[ 7013.187713]         607a0000 607a58d4 606eabac 7dc998ac 60923a44 600df91c 607d6200 726d55c0
[ 7013.196046]         68475d48 602f6d40 7dc998ac 00040900 00000000 00000000 00000000 00000000
[ 7013.204380]         ...
[ 7013.206813] Call Trace:
[ 7013.209280] [<6002d450>] show_stack+0x88/0xb8
[ 7013.213612] [<6024c39c>] dump_stack+0x8c/0xc0
[ 7013.217931] [<600470c4>] __warn+0x110/0x118
[ 7013.222097] [<6004710c>] warn_slowpath_fmt+0x40/0x64
[ 7013.227056] [<60552cb8>] mptcp_sock_def_error_report+0xec/0x13c
[ 7013.232949] [<604afd6c>] tcp_reset+0x60/0x84
[ 7013.237201] [<604b00cc>] tcp_validate_incoming+0x33c/0x4b8
[ 7013.241370] cdc_ether 2-1:2.0 wwan0: kevent 11 may have been dropped
[ 7013.249008] [<604b2ee0>] tcp_rcv_state_process+0x22c/0x8ec
[ 7013.254490] [<604bec68>] tcp_v4_do_rcv+0x2b4/0x2c4
[ 7013.259251] [<604bfa28>] tcp_v4_rcv+0xc4c/0x11a0
[ 7013.263856] [<60493678>] ip_local_deliver_finish+0x388/0x398
[ 7013.269493] [<60493de4>] ip_local_deliver+0x68/0x10c
[ 7013.274441] [<60494374>] ip_rcv+0x4ec/0x688
[ 7013.278626] [<6044c808>] __netif_receive_skb_core+0x760/0x980
[ 7013.284342] [<6044d354>] netif_receive_skb_internal+0xcc/0xe4
[ 7013.290090] [<60574e28>] br_pass_frame_up+0xf4/0x17c
[ 7013.295017] [<60575008>] br_handle_frame_finish+0x100/0x534
[ 7013.300572] [<605757b8>] br_handle_frame+0x37c/0x414
[ 7013.305369] cdc_ether 2-1:2.0 wwan0: kevent 11 may have been dropped
[ 7013.311857] [<6044c6ec>] __netif_receive_skb_core+0x644/0x980
[ 7013.317587] [<6044e640>] process_backlog+0x9c/0x164
[ 7013.322447] [<6044e3e4>] net_rx_action+0x158/0x318
[ 7013.327232] [<6004bd38>] __do_softirq+0x194/0x2d0
[ 7013.331909] [<6004beac>] run_ksoftirqd+0x38/0x6c
[ 7013.336523] [<60069794>] smpboot_thread_fn+0x1b4/0x1dc
[ 7013.341645] [<600657f4>] kthread+0xf8/0x100
[ 7013.345800] [<600274b8>] ret_from_kernel_thread+0x14/0x1c
[ 7013.351214] ---[ end trace 953e5dd640ff0e93 ]---

cpaasch commented 5 years ago

@lenormf - do you also have this error with either mptcp_v0.94, or the latest branch of mptcp_v0.93 ?

Having sporadic warnings à la mptcp_fallback_infinite or tcp_ack resetting flow is fine, because these are just indicating that the MPTCP-connection is falling back to regular TCP.

The bigger warning that you are getting shouldn't appear though.
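As background on why the fallback messages can be benign: RFC 6824 (Section 3.6) lets an MPTCP endpoint recover from a DSS checksum failure in one of two ways, depending on how many subflows remain. Below is a loose sketch of that decision with hypothetical names. It is an interpretation of the RFC, not a description of the out-of-tree code paths behind mptcp_fallback_infinite or "tcp_ack resetting flow".

```c
#include <assert.h>

enum csum_fail_action {
    RESET_SUBFLOW_WITH_MP_FAIL, /* other subflows remain: close only this one */
    FALLBACK_INFINITE_MAPPING   /* sole subflow: continue as plain TCP */
};

/* Hypothetical helper: pick the recovery path after a checksum failure
 * on one subflow, given how many subflows are currently established. */
enum csum_fail_action on_dss_csum_failure(unsigned int established_subflows)
{
    assert(established_subflows >= 1); /* the failing subflow itself */
    return established_subflows > 1 ? RESET_SUBFLOW_WITH_MP_FAIL
                                    : FALLBACK_INFINITE_MAPPING;
}
```

Under this reading, a reset subflow only comes back if the path manager opens a new one. That would be consistent with the earlier report that traffic stays off LTE until multipath is toggled on the interface.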

lenormf commented 5 years ago

I get these warnings with v0.93.1. I've also cherry-picked some bug-fixes over from v0.94.

matttbe commented 5 years ago

> I've also cherry-picked some bug-fixes over from v0.94.

Are there fixes missing from the mptcp_v0.93 branch? I was planning to create a new release for this branch, which should contain all the needed fixes, so please tell me if that is not the case!

Do you have these warnings with the latest version of the mptcp_v0.93 branch as well?

lenormf commented 5 years ago

I misspoke: I cherry-picked commits from the development branch that seemed important:

However, I haven't picked f2632fa4ee58f2d375e38119633b5739b6d43b2e ("mptcp: Use tcp_abort correctly for MPTCP"), so I should try with this commit first and get back to you.

matttbe commented 5 years ago

@lenormf Could you try the mptcp_v0.93 branch (https://github.com/multipath-tcp/mptcp/tree/mptcp_v0.93)? It contains all these fixes, e.g. the last one you mention (https://github.com/multipath-tcp/mptcp/commit/678ebe0a81f997338a42b0a380bda88fae7682dc), as well as fixes from Linux upstream.

lenormf commented 5 years ago

I'm using v0.93.1 already.

matttbe commented 5 years ago

@lenormf yes but v0.93.1 is a tag created in January: https://github.com/multipath-tcp/mptcp/releases/tag/v0.93.1

A tag (v0.93.1) is never modified, whereas the branch (mptcp_v0.93) is not frozen and has been updated since then: https://github.com/multipath-tcp/mptcp/compare/v0.93.1...mptcp_v0.93

git checkout mptcp_v0.93
git pull
make (...)