Open matttbe opened 7 months ago
I don't know if this one is linked to it, but one MPTCP Join is also "regularly" failing:
# 112 userspace pm add & remove address
# syn [ ok ]
# synack [ ok ]
# ack [ ok ]
# add [ ok ]
# echo [ ok ]
# mptcp_info subflows=2:2 [ ok ]
# subflows_total 3:3 [ ok ]
# mptcp_info add_addr_signal=2:2 [ ok ]
# netlink error -3 (No such process)
# ./pm_nl_ctl: bailing out due to netlink error[s]
# rm [ ok ]
# rmsf [fail] got 1 RM_SUBFLOW[s] expected 2
# Server ns stats
# TcpPassiveOpens 3 0.0
# TcpEstabResets 1 0.0
# TcpInSegs 249 0.0
# TcpOutSegs 200 0.0
# TcpExtDelayedACKs 39 0.0
# TcpExtTCPPureAcks 116 0.0
# TcpExtTCPOrigDataSent 130 0.0
# TcpExtTCPDelivered 129 0.0
# MPTcpExtMPCapableSYNRX 1 0.0
# MPTcpExtMPCapableACKRX 1 0.0
# MPTcpExtMPJoinSynRx 2 0.0
# MPTcpExtMPJoinAckRx 2 0.0
# MPTcpExtEchoAdd 2 0.0
# MPTcpExtRmSubflow 1 0.0
# Client ns stats
# TcpActiveOpens 3 0.0
# TcpInSegs 206 0.0
# TcpOutSegs 253 0.0
# TcpOutRsts 1 0.0
# TcpExtTW 2 0.0
# TcpExtDelayedACKs 49 0.0
# TcpExtTCPPureAcks 69 0.0
# TcpExtTCPOrigDataSent 132 0.0
# TcpExtTCPDelivered 135 0.0
# MPTcpExtMPCapableSYNTX 1 0.0
# MPTcpExtMPCapableSYNACKRX 1 0.0
# MPTcpExtMPJoinSynAckRx 2 0.0
# MPTcpExtDuplicateData 1 0.0
# MPTcpExtAddAddr 2 0.0
# MPTcpExtRmAddr 2 0.0
# MPTcpExtRcvWndShared 2 0.0
# Info: invert
# mptcp_info subflows=0:0 [ ok ]
# subflows_total 1:1 [ ok ]
The counter TcpEstabResets
is also incremented here in case of error, maybe a link?
The pm_nl_ctl
failure is linked to this:
mptcp_info add_addr_signal=2:2 [ ok ]
+ ip netns exec ns1-660c230c-HNYZEv ./pm_nl_ctl rem token 420034353 id 10
+ set +x
+ ip netns exec ns1-660c230c-HNYZEv ./pm_nl_ctl dsf lip ::ffff:10.0.2.1 lport 10111 rip ::ffff:10.0.2.2 rport 47945 token 420034353
+ set +x
+ ip netns exec ns1-660c230c-HNYZEv ./pm_nl_ctl rem token 420034353 id 20
+ set +x
+ ip netns exec ns1-660c230c-HNYZEv ./pm_nl_ctl dsf lip 10.0.3.1 lport 10111 rip ::ffff:10.0.3.2 rport 56869 token 420034353
netlink error -3 (No such process)
./pm_nl_ctl: bailing out due to netlink error[s]
+ set +x
rm [ ok ]
rmsf [fail] got 1 RM_SUBFLOW[s] expected 2
So only the 4th command on the server side failed, because it didn't find the subflow for this connection (-3 error). Probably because the subflow has been reset?
If we look at the events, the subflow has been removed (MPTCP_EVENT_SUB_CLOSED = 11
):
+ ip netns exec ns1-660c34a1-ebiad5 ./pm_nl_ctl dsf lip ::ffff:10.0.2.1 lport 10111 rip ::ffff:10.0.2.2 rport 50643 token 226034464
type:11,token:226034464,family:10,saddr6:::ffff:10.0.2.1,daddr6:::ffff:10.0.2.2,sport:10111,dport:50643,loc_id:10,rem_id:1,backup:0
(...)
+ ip netns exec ns1-660c34a1-ebiad5 ./pm_nl_ctl dsf lip 10.0.3.1 lport 10111 rip ::ffff:10.0.3.2 rport 36365 token 226034464
netlink error -3 (No such process)
./pm_nl_ctl: bailing out due to netlink error[s]
type:11,token:226034464,family:10,saddr6:::ffff:10.0.3.1,daddr6:::ffff:10.0.3.2,sport:10111,dport:36365,loc_id:20,rem_id:2,backup:0
Here are two traces when having the last issue with mptcp_join
:
It is interesting to look at the 3rd stream (tcp.stream eq 2
):
# tcpdump -nr mp_join-112-ns1-ok-660c371d-EHjLjO.pcap tcp and port 57515
reading from file mp_join-112-ns1-ok-660c371d-EHjLjO.pcap, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 65535
Warning: interface names might be incorrect
18:49:36.661287 docker0 In IP 10.0.3.2.57515 > 10.0.3.1.10111: Flags [S], seq 2867335566, win 64256, options [mss 1460,sackOK,TS val 740440611 ecr 0,nop,wscale 7,mptcp 12 join id 2 token 0x4483f5d0 nonce 0x54a0aa8b], length 0
18:49:36.661298 docker0 Out IP 10.0.3.1.10111 > 10.0.3.2.57515: Flags [S.], seq 4277192773, ack 2867335567, win 65160, options [mss 1460,sackOK,TS val 1897770162 ecr 740440611,nop,wscale 7,mptcp 16 join id 20 hmac 0xfa7ce90421245e94 nonce 0xfdf38d4e], length 0
18:49:36.661313 docker0 In IP 10.0.3.2.57515 > 10.0.3.1.10111: Flags [.], ack 1, win 502, options [nop,nop,TS val 740440611 ecr 1897770162,mptcp 24 join hmac 0xb601d40e5ec37a2f295fbfbc4989cd13d39a07fa], length 0
18:49:36.661331 docker0 Out IP 10.0.3.1.10111 > 10.0.3.2.57515: Flags [.], ack 1, win 510, options [nop,nop,TS val 1897770162 ecr 740440611,mptcp 12 dss ack 13289830844751537108], length 0
18:49:39.333176 docker0 In IP 10.0.3.2.57515 > 10.0.3.1.10111: Flags [.], ack 1, win 502, options [nop,nop,TS val 740440611 ecr 1897770162,mptcp 12 dss ack 3640936710867625463], length 0
18:49:39.333197 docker0 In IP 10.0.3.2.57515 > 10.0.3.1.10111: Flags [F.], seq 1, ack 1, win 502, options [nop,nop,TS val 740443283 ecr 1897770162,mptcp 12 dss ack 3640936710867625463], length 0
18:49:39.334593 docker0 Out IP 10.0.3.1.10111 > 10.0.3.2.57515: Flags [.], ack 2, win 510, options [nop,nop,TS val 1897772836 ecr 740443283,mptcp 12 dss ack 13289830844751537173], length 0
18:49:40.666604 docker0 Out IP 10.0.3.1.10111 > 10.0.3.2.57515: Flags [.], ack 2, win 510, options [nop,nop,TS val 1897772836 ecr 740443283,mptcp 12 dss ack 13289830844751537208], length 0
18:49:40.666631 docker0 Out IP 10.0.3.1.10111 > 10.0.3.2.57515: Flags [F.], seq 1, ack 2, win 510, options [nop,nop,TS val 1897774168 ecr 740443283,mptcp 12 dss ack 13289830844751537208], length 0
18:49:40.666640 docker0 In IP 10.0.3.2.57515 > 10.0.3.1.10111: Flags [.], ack 2, win 502, options [nop,nop,TS val 740444617 ecr 1897774168], length 0
# tcpdump -nr mp_join-112-ns1-nok-660c372c-KD5wmI.pcap tcp and port 52717
reading from file mp_join-112-ns1-nok-660c372c-KD5wmI.pcap, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 65535
Warning: interface names might be incorrect
18:49:50.933196 docker0 In IP 10.0.3.2.52717 > 10.0.3.1.10111: Flags [S], seq 1030980353, win 64256, options [mss 1460,sackOK,TS val 740454883 ecr 0,nop,wscale 7,mptcp 12 join id 2 token 0x3d144ed6 nonce 0x7743f7a4], length 0
18:49:50.933219 docker0 Out IP 10.0.3.1.10111 > 10.0.3.2.52717: Flags [S.], seq 2375628578, ack 1030980354, win 65160, options [mss 1460,sackOK,TS val 1897784434 ecr 740454883,nop,wscale 7,mptcp 16 join id 20 hmac 0x7fe4b626c5cd422c nonce 0xca2ce698], length 0
18:49:50.933245 docker0 In IP 10.0.3.2.52717 > 10.0.3.1.10111: Flags [.], ack 1, win 502, options [nop,nop,TS val 740454883 ecr 1897784434,mptcp 24 join hmac 0xc7a0c08aeed8af905201939a58a0b39356907dc2], length 0
18:49:50.933272 docker0 Out IP 10.0.3.1.10111 > 10.0.3.2.52717: Flags [.], ack 1, win 510, options [nop,nop,TS val 1897784434 ecr 740454883,mptcp 12 dss ack 3014908832125989087], length 0
18:49:53.548318 docker0 Out IP 10.0.3.1.10111 > 10.0.3.2.52717: Flags [P.], seq 1:6, ack 1, win 510, options [nop,nop,TS val 1897787049 ecr 740454883,mptcp 26 dss ack 3014908832125989152 seq 7370911507231076069 subseq 1 len 5,nop,nop], length 5
18:49:53.548354 docker0 In IP 10.0.3.2.52717 > 10.0.3.1.10111: Flags [.], ack 6, win 502, options [nop,nop,TS val 740457498 ecr 1897787049,mptcp 12 dss ack 7370911507231076074], length 0
18:49:53.617220 docker0 In IP 10.0.3.2.52717 > 10.0.3.1.10111: Flags [.], ack 6, win 502, options [nop,nop,TS val 740457498 ecr 1897787049,mptcp 12 dss ack 7370911507231076074], length 0
18:49:53.617238 docker0 In IP 10.0.3.2.52717 > 10.0.3.1.10111: Flags [F.], seq 1, ack 6, win 502, options [nop,nop,TS val 740457567 ecr 1897787049,mptcp 12 dss ack 7370911507231076074], length 0
18:49:53.617620 docker0 Out IP 10.0.3.1.10111 > 10.0.3.2.52717: Flags [.], ack 2, win 510, options [nop,nop,TS val 1897787119 ecr 740457567,mptcp 12 dss ack 3014908832125989152], length 0
18:49:53.718637 docker0 Out IP 10.0.3.1.10111 > 10.0.3.2.52717: Flags [P.], seq 6:11, ack 2, win 510, options [nop,nop,TS val 1897787220 ecr 740457567,mptcp 26 dss ack 3014908832125989152 seq 7370911507231076074 subseq 6 len 5,nop,nop], length 5
18:49:53.718673 docker0 In IP 10.0.3.2.52717 > 10.0.3.1.10111: Flags [R], seq 1030980355, win 0, length 0
When there is no issue, there is no data, just the 3WHS and the closure. When there is an issue, one end sends a packet with data just after having received the FIN
→ consequence: RST
then.
nkambo from LKFT reported that
mptcp_connect.sh
subtests are, e.g.From: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.1.y/build/v6.1.83-273-gaa2042702765/testrun/23261389/suite/kselftest-net-mptcp/test/net_mptcp_mptcp_connect_sh_-_mptcp_connect_peek_mode_saveAfterPeek_ns1_MPTCP__ns1_dead_beef_1_1_10049_MPTCP/log
Or
From: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.1.y/build/v6.1.83-273-gaa2042702765/testrun/23259658/suite/kselftest-net-mptcp/test/net_mptcp_mptcp_connect_sh_-_mptcp_connect_full_disconnect_ns1_MPTCP__ns1_dead_beef_1_1_20004_MPTCP/log
Looking at the history, the
mptcp_connect.sh
test seems quite unstable: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.1.y/build/v6.1.83-273-gaa2042702765/testrun/23259658/suite/kselftest-net-mptcp/test/net_mptcp_mptcp_connect_sh/history/