Splitting this out of #430 to be its own thing. This is also another case like #437 where I tested a Cloudflare patch to see if it would help in our environment, but it does not seem to play nice with mptcp. The patch is now part of v6.5 but not enabled by default.
I compiled v6.5.3 + the mptcp fixes and set net.ipv4.tcp_shrink_window=1, and not only saw "hanging" rsync transfers but also extreme performance degradation with iperf over the long fat network setup described in #437.
If I disable it again (the default), everything looks stable again. I'm not sure the connection is completely hanging per se; it just randomly goes very slow:
Basically, there is almost a 1 in 10 chance that an iperf run will stall or go slow when tcp_shrink_window=1. This frequency is what kicked off my flurry of tickets, starting with the mistaken assumption that it was a "v6.4 regression" (#427).
And then, because I was now looking more closely and had quickly removed this tcp_shrink_window patch, we realised there were other low-frequency, long-standing mptcp "hangs" that had not been noticed before (now fixed).
With tcp_shrink_window=0, we see very consistent iperf results in the multi-gbit range, never dropping down to the mbit range.
Also, if I run the iperf test over a single link using plain TCP with tcp_shrink_window=1, we don't see any performance degradation.
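For reference, the toggle-and-test loop described above can be sketched roughly as follows. This is a hedged reconstruction, not my actual test script: the `mptcpize` wrapper (from mptcpd, used here to force iperf3 onto MPTCP) and the `SERVER_IP` placeholder are assumptions about the setup, so adjust to taste.

```shell
# Enable the Cloudflare window-shrinking behaviour (default in v6.5 is 0)
sysctl -w net.ipv4.tcp_shrink_window=1

# Run an MPTCP iperf3 test over the long fat network path;
# with the sysctl set to 1, roughly 1 in 10 runs stalls or goes very slow.
# SERVER_IP is a placeholder for the remote endpoint.
mptcpize run iperf3 -c SERVER_IP -t 30

# Disable again (the default) to restore consistent multi-gbit results
sysctl -w net.ipv4.tcp_shrink_window=0
```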
We are fine to run with tcp_shrink_window=0 for mptcp connections from now on; this ticket is just an FYI.