matttbe opened this issue 8 months ago
I did some tests on my side. I had to use the non-debug mode (otherwise the tolerance is already increased), without KVM.
It looks like the patch from our tree (selftests: mptcp: simult flows: re-adapt BW) helps. When I run the test 100 times:
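(As an aside, a rough sketch of how such a 100-run check can be repeated; the path and invocation are assumptions about the usual kernel selftest layout, not the exact setup used for the numbers above:)

```sh
# From a kernel tree with the MPTCP selftests built, run as root:
cd tools/testing/selftests/net/mptcp
fail=0
for i in $(seq 1 100); do
	./simult_flows.sh >/dev/null 2>&1 || fail=$((fail + 1))
done
echo "failures: ${fail}/100"
```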
I tried to get more info in case of issues (without the patch). When there is an issue:
unbalanced bwidth - reverse direction  transfer slower than expected! runtime 12033 ms, expected 11921 ms  max 11921 [ fail ]

MPTcpExtDuplicateData on the client side is 8, and MPTcpExtMPTCPRetrans is at 8 on the server side.
If it can be normal to have these counters > 0, maybe we can increase the max value? A technical issue in this test is that we give the timeout to mptcp_connect (the binary).
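For context, a minimal sketch of how the expected time and the limit can be derived from the transfer size and the configured bandwidth; variable names, values and the formula are illustrative assumptions, not the actual simult_flows.sh code:

```sh
size=$((16 * 1024 * 1024))            # bytes to transfer
rate_kbps=11000                       # aggregate bandwidth of both links (kbit/s)
time_ms=$((size * 8 / rate_kbps))     # ideal transfer time in ms
tolerance=5                           # extra slack, in percent
max_time_ms=$((time_ms + time_ms * tolerance / 100))

# Per the note above, this limit ends up being handed to the transfer binary
# itself, so only raising the comparison threshold afterwards is not enough.
echo "expected ${time_ms} ms, max ${max_time_ms} ms"
```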
If not (i.e. if these counters should stay at 0), here are the packet captures:
Here are the MIB counters (a sketch of how to collect such per-namespace snapshots follows the two tables):
Counters | OK | Err | Diff |
---|---|---|---|
IpInReceives | 1585 | 867 | -718 |
IpInDelivers | 1585 | 867 | -718 |
IpOutRequests | 1654 | 933 | -721 |
IpOutTransmits | 1654 | 933 | -721 |
TcpActiveOpens | 2 | 2 | 0 |
TcpInSegs | 1585 | 867 | -718 |
TcpOutSegs | 12151 | 12027 | -124 |
Ip6OutRequests | 1 | 1 | 0 |
Ip6OutMcastPkts | 1 | 1 | 0 |
Ip6OutOctets | 56 | 56 | 0 |
Ip6OutMcastOctets | 56 | 56 | 0 |
Ip6OutTransmits | 1 | 1 | 0 |
Icmp6OutMsgs | 1 | 1 | 0 |
Icmp6OutRouterSolicits | 1 | 1 | 0 |
Icmp6OutType133 | 1 | 1 | 0 |
TcpExtTW | 2 | 2 | 0 |
TcpExtDelayedACKLost | 1 | 0 | -1 |
TcpExtTCPPureAcks | 1556 | 838 | -718 |
TcpExtTCPLossProbes | 2 | 2 | 0 |
TcpExtTCPDSACKOldSent | 1 | 0 | -1 |
TcpExtTCPSpuriousRtxHostQueues | 1 | 1 | 0 |
TcpExtTCPOrigDataSent | 12128 | 12004 | -124 |
TcpExtTCPHystartTrainDetect | 1 | 0 | -1 |
TcpExtTCPHystartTrainCwnd | 171 | 0 | -171 |
TcpExtTCPHystartDelayDetect | 1 | 2 | 1 |
TcpExtTCPHystartDelayCwnd | 222 | 724 | 502 |
TcpExtTCPDelivered | 12130 | 12006 | -124 |
IpExtInOctets | 198548 | 137920 | -60628 |
IpExtOutOctets | 16909204 | 16947976 | 38772 |
IpExtInNoECTPkts | 1634 | 906 | -728 |
MPTcpExtMPCapableSYNTX | 1 | 1 | 0 |
MPTcpExtMPCapableSYNACKRX | 1 | 1 | 0 |
MPTcpExtMPTCPRetrans | 0 | 8 | 8 |
MPTcpExtMPJoinSynAckRx | 1 | 1 | 0 |
MPTcpExtOFOQueueTail | 9 | 8 | -1 |
MPTcpExtOFOQueue | 11 | 9 | -2 |
MPTcpExtOFOMerge | 8 | 7 | -1 |
MPTcpExtDuplicateData | 1 | 0 | -1 |
MPTcpExtSndWndShared | 1 | 0 | -1 |
MPTcpExtRcvWndShared | 11 | 7 | -4 |
Time (ms) (limit : 11921) | 11344 | 12033 | 689 |
Counters | Success | Error | Diff |
---|---|---|---|
IpInReceives | 1654 | 933 | -721 |
IpInDelivers | 1654 | 933 | -721 |
IpOutRequests | 1585 | 867 | -718 |
IpOutTransmits | 1585 | 867 | -718 |
TcpPassiveOpens | 2 | 2 | 0 |
TcpInSegs | 1654 | 933 | -721 |
TcpOutSegs | 1633 | 906 | -727 |
TcpRetransSegs | 1 | 0 | -1 |
Ip6OutRequests | 1 | 0 | -1 |
Ip6OutMcastPkts | 1 | 0 | -1 |
Ip6OutOctets | 56 | 0 | -56 |
Ip6OutMcastOctets | 56 | 0 | -56 |
Ip6OutTransmits | 1 | 0 | -1 |
Icmp6OutMsgs | 1 | 0 | -1 |
Icmp6OutRouterSolicits | 1 | 0 | -1 |
Icmp6OutType133 | 1 | 0 | -1 |
TcpExtPAWSEstab | 1 | 0 | -1 |
TcpExtTW | 0 | 1 | 1 |
TcpExtDelayedACKs | 7 | 6 | -1 |
TcpExtTCPPureAcks | 20 | 21 | 1 |
TcpExtTCPLossProbes | 3 | 4 | 1 |
TcpExtTCPDSACKRecv | 1 | 0 | -1 |
TcpExtTCPOrigDataSent | 75 | 66 | -9 |
TcpExtTCPDelivered | 76 | 66 | -10 |
TcpExtTCPDSACKRecvSegs | 1 | 0 | -1 |
IpExtInOctets | 16909204 | 16947976 | 38772 |
IpExtOutOctets | 198548 | 137920 | -60628 |
IpExtInNoECTPkts | 12151 | 12027 | -124 |
MPTcpExtMPCapableSYNRX | 1 | 1 | 0 |
MPTcpExtMPCapableACKRX | 1 | 1 | 0 |
MPTcpExtMPTCPRetrans | 1 | 0 | -1 |
MPTcpExtMPJoinSynRx | 1 | 1 | 0 |
MPTcpExtMPJoinAckRx | 1 | 1 | 0 |
MPTcpExtOFOQueueTail | 849 | 492 | -357 |
MPTcpExtOFOQueue | 868 | 504 | -364 |
MPTcpExtOFOMerge | 503 | 405 | -98 |
MPTcpExtDuplicateData | 8 | 8 | 0 |
MPTcpExtRcvWndShared | 568 | 362 | -206 |
MPTcpExtRcvWndConflictUpdate | 1 | 0 | -1 |
Time (ms) (limit : 11921) | 11344 | 12033 | 689 |
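For reference, a minimal sketch of how such per-namespace counter deltas can be collected with nstat; "$ns1"/"$ns3" stand for the client and server namespaces created by the test, and the exact hook points in the script are an assumption:

```sh
# Update the nstat baseline in both namespaces without printing anything:
ip netns exec "$ns1" nstat -n
ip netns exec "$ns3" nstat -n

# ... run the transfer ...

# Print the counters that changed since the baseline:
ip netns exec "$ns1" nstat
ip netns exec "$ns3" nstat
```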
(the TcpInSegs/TcpOutSegs counters are interesting: the client and the server do not report the same numbers: issue? GRO?)
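GRO on the receive side would coalesce several wire segments into one before TCP accounts for them, so TcpInSegs on the receiver can legitimately be lower than the peer's TcpOutSegs without any loss. A quick way to check this hypothesis (interface and namespace names are assumptions, adjust to the ones the script creates):

```sh
# Is GRO enabled on the receiving veth?
ip netns exec "$ns3" ethtool -k ns3eth1 | grep generic-receive-offload

# If it is, turn it off and re-run to see whether the counters line up:
ip netns exec "$ns3" ethtool -K ns3eth1 gro off
```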
Maybe a first step is to send the patch we have in our tree, because it looks like it already helps? Or do we want a better fix with the new BW limits? cc: @pabeni
Note: the same graph as last time, but for the "success" run: (cc: @mjmartineau )
And all the captures I had: simult_flows_analyses.zip (cc: @pabeni )
Thanks for the success chart. In the failure graph it's interesting that the segment lengths settle into three main groups around the 6 second mark, until the throughput drops around 6.75s:
Not sure what it means yet. The success graph maintains the more randomized pattern without the "ceiling" around 34000 for segment length.
FYI, we recently had an issue with the unbalanced bwidth with opposed, unbalanced delay test as well, not just with unbalanced bwidth:
# 05 unbalanced bwidth 11365 max 11921 [ OK ]
# 06 unbalanced bwidth - reverse direction 11275 max 11921 [ OK ]
# 07 unbalanced bwidth with unbalanced delay 11357 max 11921 [ OK ]
# 08 unbalanced bwidth with unbalanced delay - reverse direction 11358 max 11921 [ OK ]
# 09 unbalanced bwidth with opposed, unbalanced delay transfer slower than expected! runtime 11936 ms, expected 11921 ms max 11921 [FAIL]
# client exit code 1, server 0
#
# netns ns3-66599cb3-pXhHQv socket stat for 10009:
# State Recv-Q Send-Q Local Address:Port Peer Address:Port
# TIME-WAIT 0 0 10.0.3.3:10009 10.0.1.1:43328 timer:(timewait,59sec,0)
#
# TIME-WAIT 0 0 10.0.3.3:10009 10.0.2.1:54577 timer:(timewait,59sec,0)
#
#
# netns ns1-66599cb3-pXhHQv socket stat for 10009:
# State Recv-Q Send-Q Local Address:Port Peer Address:Port
# TIME-WAIT 0 0 10.0.2.1%ns1eth2:54577 10.0.3.3:10009 timer:(timewait,59sec,0)
#
# TIME-WAIT 0 0 10.0.1.1:43328 10.0.3.3:10009 timer:(timewait,59sec,0)
#
# -rw------- 1 root root 16777216 May 31 09:49 /tmp/tmp.Kgg7TsylGy
# -rw------- 1 root root 16777216 May 31 09:47 /tmp/tmp.uLdbZsVBoE
# -rw------- 1 root root 81920 May 31 09:47 /tmp/tmp.GFhlAAvlgr
# -rw------- 1 root root 81920 May 31 09:48 /tmp/tmp.yewEKG7Mia
# 10 unbalanced bwidth with opposed, unbalanced delay - reverse direction 11682 max 11921 [ OK ]
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/9316053877
The netdev CI also got a similar issue recently (in non-debug):
# 09 unbalanced bwidth with opposed, unbalanced delay transfer slower than expected! runtime 11933 ms, expected 11921 ms max 11921 [FAIL]
# client exit code 1, server 0
#
# netns ns3-VPWDdZ socket stat for 10009:
# State Recv-Q Send-Q Local Address:Port Peer Address:Port
# TIME-WAIT 0 0 10.0.3.3:10009 10.0.2.1:49301 timer:(timewait,59sec,0)
#
# TIME-WAIT 0 0 10.0.3.3:10009 10.0.1.1:60652 timer:(timewait,59sec,0)
#
#
# netns ns1-msD7GR socket stat for 10009:
# State Recv-Q Send-Q Local Address:Port Peer Address:Port
# TIME-WAIT 0 0 10.0.2.1%ns1eth2:49301 10.0.3.3:10009 timer:(timewait,59sec,0)
#
# TIME-WAIT 0 0 10.0.1.1:60652 10.0.3.3:10009 timer:(timewait,59sec,0)
#
# -rw------- 1 root root 16777216 Jul 7 11:30 /tmp/tmp.FbUJhacz2Q
# -rw------- 1 root root 16777216 Jul 7 11:28 /tmp/tmp.wwsjKhZcCg
# -rw------- 1 root root 81920 Jul 7 11:30 /tmp/tmp.UktnqZBu4h
# -rw------- 1 root root 81920 Jul 7 11:28 /tmp/tmp.yWNVMvOkTI
# 10 unbalanced bwidth with opposed, unbalanced delay - reverse direction 11712 max 11921 [ OK ]
https://netdev-3.bots.linux.dev/vmksft-mptcp/results/672921/3-simult-flows-sh/stdout
According to NIPA (history: normal - debug), this test is unstable. It looks like the instability can be seen both with and without unbalanced delay, but mainly with it.
In NIPA, many tests are executed in parallel. In theory, this should be fine: most selftests don't use all the available resources. Still, even if the issue also happens in non-debug mode, this parallelism could explain some of the instability.
The selftests: mptcp: simult flows: re-adapt BW patch in our tree (https://github.com/multipath-tcp/mptcp_net-next/commit/101192474e073b4059f98ddcb81b08d762b8be5e) could improve the situation, but its effect has not been properly measured. It would be good to: