multipath-tcp / mptcp_net-next

Development version of the Upstream MultiPath TCP Linux kernel 🐧
https://mptcp.dev

mptcp vs net.ipv4.tcp_shrink_window #439

Open daire-byrne opened 1 year ago

daire-byrne commented 1 year ago

Splitting this out of #430 to be its own thing. This is also another case like #437 where I tested a Cloudflare patch to see if it would help in our environment, but it does not seem to play nice with mptcp. The patch is now part of v6.5 but not enabled by default.

I compiled v6.5.3 + mptcp fixes and set net.ipv4.tcp_shrink_window=1 and not only saw "hanging" rsync transfers but could also see extreme performance degradation with iperfs over the long fat network setup described in #437.

If I disable it again (the default), everything looks stable again. I'm not sure the connection is completely hanging per se; it just randomly goes very slow:
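For reference, the knob can be inspected (and toggled, with root) at runtime via the standard procfs sysctl path; this is just a sketch that falls back gracefully on kernels without the option:

```shell
# Standard procfs location for net.ipv4.tcp_shrink_window (kernels >= v6.5)
knob=/proc/sys/net/ipv4/tcp_shrink_window
if [ -e "$knob" ]; then
  state="current value: $(cat "$knob")"
  # To disable it again (the default), as root:
  #   echo 0 > "$knob"        # or: sysctl -w net.ipv4.tcp_shrink_window=0
else
  state="tcp_shrink_window not available on this kernel"
fi
echo "$state"
```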

net.ipv4.tcp_shrink_window = 1
serverA # for x in {1..10}; do mptcpize run iperf3 -O10 -t20 -c serverB -P1 | grep receiver;done
[  5]   0.00-20.07  sec  9.00 GBytes  3.85 Gbits/sec                  receiver
[  5]   0.00-20.08  sec  11.6 GBytes  4.98 Gbits/sec                  receiver
[  5]   0.00-20.09  sec  3.31 MBytes  1.38 Mbits/sec                  receiver  <<<< very bad
[  5]   0.00-20.07  sec  11.6 GBytes  4.95 Gbits/sec                  receiver
[  5]   0.00-20.07  sec  5.10 GBytes  2.18 Gbits/sec                  receiver
[  5]   0.00-20.08  sec  1.71 GBytes   733 Mbits/sec                  receiver  <<<< pretty bad
[  5]   0.00-20.08  sec  3.03 GBytes  1.30 Gbits/sec                  receiver  <<<< not great
[  5]   0.00-20.07  sec  7.33 GBytes  3.14 Gbits/sec                  receiver
[  5]   0.00-20.07  sec  11.4 GBytes  4.88 Gbits/sec                  receiver
[  5]   0.00-20.08  sec  10.7 GBytes  4.60 Gbits/sec                  receiver

net.ipv4.tcp_shrink_window = 0
serverA # for x in {1..100}; do mptcpize run iperf3 -O10 -t20 -c serverB -P1 | grep receiver;done
[  5]   0.00-20.07  sec  9.20 GBytes  3.94 Gbits/sec                  receiver
[  5]   0.00-20.07  sec  8.79 GBytes  3.76 Gbits/sec                  receiver
[  5]   0.00-20.07  sec  9.40 GBytes  4.02 Gbits/sec                  receiver
[  5]   0.00-20.07  sec  7.83 GBytes  3.35 Gbits/sec                  receiver
[  5]   0.00-20.08  sec  7.30 GBytes  3.12 Gbits/sec                  receiver
[  5]   0.00-20.07  sec  8.68 GBytes  3.71 Gbits/sec                  receiver
[  5]   0.00-20.08  sec  8.44 GBytes  3.61 Gbits/sec                  receiver
... <snip> all similar results <snip>
[  5]   0.00-20.07  sec  8.69 GBytes  3.72 Gbits/sec                  receiver
[  5]   0.00-20.07  sec  8.59 GBytes  3.68 Gbits/sec                  receiver
[  5]   0.00-20.07  sec  8.63 GBytes  3.70 Gbits/sec                  receiver
[  5]   0.00-20.07  sec  9.32 GBytes  3.99 Gbits/sec                  receiver

Basically, there is roughly a 1-in-10 chance that an iperf run will stall or go slow when tcp_shrink_window = 1. This frequency is what kicked off my flurry of tickets, starting with the mistaken assumption that it was a "v6.4 regression" (#427).
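To put a number on the spread, the "receiver" lines can be piped through a small awk summary (a sketch: the two sample lines are copied from the transcripts above, and Mbits/sec rates are normalised to Gbits/sec so the stalled runs stand out; in practice, pipe the loop's output in instead of the heredoc):

```shell
# Summarise iperf3 "receiver" lines: min/max/mean throughput in Gbits/sec.
summary=$(awk '{
  rate = $7                             # throughput value
  if ($8 == "Mbits/sec") rate /= 1000   # normalise Mbits/sec to Gbits/sec
  if (NR == 1 || rate < min) min = rate
  if (NR == 1 || rate > max) max = rate
  sum += rate; n++
} END {
  printf "runs=%d min=%.3f max=%.3f mean=%.3f Gbits/sec", n, min, max, sum / n
}' <<'EOF'
[  5]   0.00-20.07  sec  9.00 GBytes  3.85 Gbits/sec                  receiver
[  5]   0.00-20.09  sec  3.31 MBytes  1.38 Mbits/sec                  receiver
EOF
)
echo "$summary"
```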

And because I was then looking much more closely (and had quickly removed this tcp_shrink_window patch), we realised there were other low-frequency, long-standing mptcp "hangs" that had not been noticed before (now fixed).

With tcp_shrink_window=0, we see very consistent iperf results in the multi gbit range and never dropping down to the mbit range.

Also, if I run the same iperf test over a single link using plain TCP with tcp_shrink_window=1, we don't see any performance degradation.

net.ipv4.tcp_shrink_window = 1
serverA # for x in {1..50}; do iperf3 -O10 -t20 -c serverB-ens256 -P1 | grep receiver; sleep 1; done 
[  5]   0.00-20.09  sec  5.54 GBytes  2.37 Gbits/sec                  receiver
[  5]   0.00-20.09  sec  6.73 GBytes  2.88 Gbits/sec                  receiver
[  5]   0.00-20.09  sec  5.95 GBytes  2.54 Gbits/sec                  receiver
[  5]   0.00-20.09  sec  6.77 GBytes  2.90 Gbits/sec                  receiver
..
..
[  5]   0.00-20.08  sec  6.71 GBytes  2.87 Gbits/sec                  receiver
[  5]   0.00-20.08  sec  6.14 GBytes  2.63 Gbits/sec                  receiver

We are fine running with tcp_shrink_window=0 for our mptcp connections from now on; this ticket is mostly just an FYI.
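For anyone else hitting this, keeping the pre-6.5 default persistent across reboots can be done with a sysctl drop-in (the file name below is just an example):

```
# /etc/sysctl.d/90-tcp-shrink-window.conf (example file name)
# Keep the pre-v6.5 default; =1 triggered the mptcp stalls described above.
net.ipv4.tcp_shrink_window = 0
```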