multipath-tcp / mptcp

⚠️⚠️⚠️ Deprecated 🚫 Out-of-tree Linux Kernel implementation of MultiPath TCP. 👉 Use https://github.com/multipath-tcp/mptcp_net-next repo instead ⚠️⚠️⚠️

MPTCP with 1 subflow throughput much lower than Linux TCP #378

Closed · alex1230608 closed this issue 4 years ago

alex1230608 commented 4 years ago

I am using MPTCP v0.90. If this issue is expected to disappear in newer versions, please close it. I am currently installing v0.95 and will report back if the problem below persists.

I have noticed several ways to increase MPTCP performance in other threads, such as disabling the checksum and trying the cubic congestion control. However, the throughput of MPTCP with a single subflow is still substantially lower than plain Linux TCP (see results below). If the iperf output is trustworthy, it seems that with MPTCP enabled the cwnd cannot grow beyond 14 KB, and there are no retransmissions (i.e., no network congestion). Is MPTCP using different send/receive buffer sizes that limit the maximum congestion window? Or does anyone know of other possible reasons?
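(For reference, a minimal way to check whether socket-buffer limits are what caps the window, assuming the out-of-tree MPTCP stack shares the standard TCP buffer sysctls; the values in the last commands are placeholders, not a recommendation:)

```sh
# Autotuning limits for receive/send buffers: min, default, max (bytes).
# The window cannot grow past what these maxima allow.
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem
sysctl net.core.rmem_max net.core.wmem_max

# Illustrative example of raising the maxima for a high-bandwidth test
sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864"
sysctl -w net.ipv4.tcp_wmem="4096 65536 67108864"
sysctl -w net.core.rmem_max=67108864
sysctl -w net.core.wmem_max=67108864
```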

With MPTCP disabled, net.ipv4.tcp_congestion_control=cubic:

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  1.65 GBytes  14.2 Gbits/sec   49    372 KBytes       
[  4]   1.00-2.00   sec  1.71 GBytes  14.7 Gbits/sec   20    359 KBytes       
[  4]   2.00-3.00   sec  1.69 GBytes  14.5 Gbits/sec   36    358 KBytes       
[  4]   3.00-4.00   sec  2.05 GBytes  17.6 Gbits/sec    0    471 KBytes       
[  4]   4.00-5.00   sec  2.06 GBytes  17.7 Gbits/sec   58    427 KBytes       
[  4]   5.00-6.00   sec  1.93 GBytes  16.5 Gbits/sec   14    403 KBytes       
[  4]   6.00-7.00   sec  1.84 GBytes  15.8 Gbits/sec   37    383 KBytes       
[  4]   7.00-8.00   sec  1.76 GBytes  15.1 Gbits/sec   47    373 KBytes       
[  4]   8.00-9.00   sec  1.73 GBytes  14.9 Gbits/sec   32    366 KBytes       
[  4]   9.00-10.00  sec  1.70 GBytes  14.6 Gbits/sec   37    362 KBytes       
[  4]  10.00-11.00  sec  1.68 GBytes  14.4 Gbits/sec   45    358 KBytes

With MPTCP enabled and the following settings:

net.mptcp.mptcp_checksum=0
net.mptcp.mptcp_syn_retries=3
net.ipv4.tcp_congestion_control=cubic
net.mptcp.mptcp_path_manager=ndiffports
net.mptcp.mptcp_scheduler=default
echo 1 > /sys/module/mptcp_ndiffports/parameters/num_subflows
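(For reproducibility, a sketch of applying and verifying these settings on the out-of-tree kernel, run as root; the `net.mptcp.mptcp_enabled` check is assumed from that implementation's documentation:)

```sh
# Apply the settings listed above
sysctl -w net.mptcp.mptcp_checksum=0 net.mptcp.mptcp_syn_retries=3
sysctl -w net.ipv4.tcp_congestion_control=cubic
sysctl -w net.mptcp.mptcp_path_manager=ndiffports net.mptcp.mptcp_scheduler=default
echo 1 > /sys/module/mptcp_ndiffports/parameters/num_subflows

# Sanity checks: MPTCP globally enabled, single subflow configured
sysctl net.mptcp.mptcp_enabled
cat /sys/module/mptcp_ndiffports/parameters/num_subflows
```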

Result:

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   297 MBytes  2.49 Gbits/sec    0   14.1 KBytes       
[  4]   1.00-2.00   sec   297 MBytes  2.49 Gbits/sec    0   14.1 KBytes       
[  4]   2.00-3.00   sec   286 MBytes  2.40 Gbits/sec    0   14.1 KBytes       
[  4]   3.00-4.00   sec   280 MBytes  2.35 Gbits/sec    0   14.1 KBytes       
[  4]   4.00-5.00   sec   296 MBytes  2.49 Gbits/sec    0   14.1 KBytes       
[  4]   5.00-6.00   sec   292 MBytes  2.45 Gbits/sec    0   14.1 KBytes       
[  4]   6.00-7.00   sec   313 MBytes  2.63 Gbits/sec    0   14.1 KBytes       
[  4]   7.00-8.00   sec   318 MBytes  2.67 Gbits/sec    0   14.1 KBytes       
[  4]   8.00-9.00   sec   303 MBytes  2.55 Gbits/sec    0   14.1 KBytes       
[  4]   9.00-10.00  sec   319 MBytes  2.67 Gbits/sec    0   14.1 KBytes       
[  4]  10.00-11.00  sec   301 MBytes  2.52 Gbits/sec    0   14.1 KBytes       
[  4]  11.00-12.00  sec   314 MBytes  2.63 Gbits/sec    0   14.1 KBytes       
[  4]  12.00-13.00  sec   314 MBytes  2.63 Gbits/sec    0   14.1 KBytes       
[  4]  13.00-14.00  sec   306 MBytes  2.57 Gbits/sec    0   14.1 KBytes       
[  4]  14.00-15.00  sec   302 MBytes  2.53 Gbits/sec    0   14.1 KBytes       
[  4]  15.00-16.00  sec   296 MBytes  2.48 Gbits/sec    0   14.1 KBytes       
[  4]  16.00-17.00  sec   303 MBytes  2.54 Gbits/sec    0   14.1 KBytes
cpaasch commented 4 years ago

I assume that this is with v0.90, right? Please try with v0.95 instead.

alex1230608 commented 4 years ago

I just tried v0.95; the results are at the end. (The network conditions changed a little, so please don't compare the results from the first post with these.)

The throughput of MPTCP seems fine when cubic is used. However, I wonder whether it is normal for lia to have lower throughput, and I want to ask a high-level question: how should one choose between these congestion controls, especially in a data-center network (DCN) scenario? As far as I understand from the paper, lia preserves fairness across subflows, while cubic behaves almost like opening multiple independent sockets, which should still be fine if all end hosts in the DCN use MPTCP + cubic. Am I right?

I also ran experiments measuring throughput versus time. It turns out MPTCP + lia needs a very long time (> 4 s, even though the RTT is < 200 us) to climb to its maximum throughput, and that maximum is well below the link bandwidth (15 Gbps vs. 25 Gbps). With MPTCP + reno, it reaches its maximum in milliseconds, but that maximum is still far below the link bandwidth (18 Gbps vs. 25 Gbps). As a baseline, plain Linux TCP reaches 23 Gbps in milliseconds.
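(For context, a rough sketch of one way to make such a ramp-up measurement; `10.0.0.2` is a placeholder server address, and `-C` requires the corresponding congestion-control module, e.g. `mptcp_coupled` for lia, to be loaded:)

```sh
# Server side
iperf3 -s

# Client side: 0.1 s reporting interval to watch how quickly the
# throughput ramps up; -C selects the congestion control per test
iperf3 -c 10.0.0.2 -t 15 -i 0.1 -C lia
iperf3 -c 10.0.0.2 -t 15 -i 0.1 -C reno
iperf3 -c 10.0.0.2 -t 15 -i 0.1 -C cubic
```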

With MPTCP disabled:

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  2.69 GBytes  23.1 Gbits/sec  310   1.04 MBytes       
[  4]   1.00-2.00   sec  2.73 GBytes  23.4 Gbits/sec   71   1.12 MBytes       
[  4]   2.00-3.00   sec  2.72 GBytes  23.3 Gbits/sec  188   1.13 MBytes       
[  4]   3.00-4.00   sec  2.72 GBytes  23.3 Gbits/sec  202   1.11 MBytes       
[  4]   4.00-5.00   sec  2.72 GBytes  23.3 Gbits/sec  103   1.10 MBytes       
[  4]   5.00-6.00   sec  2.71 GBytes  23.3 Gbits/sec  245   1.15 MBytes       
[  4]   6.00-7.00   sec  2.71 GBytes  23.3 Gbits/sec   91   1.11 MBytes       
[  4]   7.00-8.00   sec  2.71 GBytes  23.2 Gbits/sec  242   1.09 MBytes       
[  4]   8.00-9.00   sec  2.72 GBytes  23.4 Gbits/sec  168   1.10 MBytes       
[  4]   9.00-10.00  sec  2.73 GBytes  23.4 Gbits/sec   62   1.15 MBytes       
[  4]  10.00-11.00  sec  2.72 GBytes  23.3 Gbits/sec   97    861 KBytes

With MPTCP enabled, using the lia congestion control:

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   532 MBytes  4.46 Gbits/sec    0   14.1 KBytes       
[  4]   1.00-2.00   sec  1.40 GBytes  12.0 Gbits/sec    0   14.1 KBytes       
[  4]   2.00-3.00   sec  1.77 GBytes  15.2 Gbits/sec    0   14.1 KBytes       
[  4]   3.00-4.00   sec  1.50 GBytes  12.9 Gbits/sec    0   14.1 KBytes       
[  4]   4.00-5.00   sec  1.55 GBytes  13.3 Gbits/sec    0   14.1 KBytes       
[  4]   5.00-6.00   sec  1.86 GBytes  16.0 Gbits/sec    0   14.1 KBytes       
[  4]   6.00-7.00   sec  1.55 GBytes  13.3 Gbits/sec    0   14.1 KBytes       
[  4]   7.00-8.00   sec  1.35 GBytes  11.6 Gbits/sec    0   14.1 KBytes       
[  4]   8.00-9.00   sec  1.45 GBytes  12.5 Gbits/sec    0   14.1 KBytes       
[  4]   9.00-10.00  sec  1.81 GBytes  15.6 Gbits/sec    0   14.1 KBytes       
[  4]  10.00-11.00  sec  1.49 GBytes  12.8 Gbits/sec    0   14.1 KBytes       
[  4]  11.00-12.00  sec  1.30 GBytes  11.2 Gbits/sec    0   14.1 KBytes       
[  4]  12.00-13.00  sec  1.21 GBytes  10.4 Gbits/sec    0   14.1 KBytes       
[  4]  13.00-14.00  sec  1.34 GBytes  11.5 Gbits/sec    0   14.1 KBytes       
[  4]  14.00-15.00  sec  1.72 GBytes  14.8 Gbits/sec    0   14.1 KBytes

With MPTCP enabled, using the cubic congestion control:

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  2.63 GBytes  22.6 Gbits/sec    0   14.1 KBytes       
[  4]   1.00-2.00   sec  2.64 GBytes  22.7 Gbits/sec    0   14.1 KBytes       
[  4]   2.00-3.00   sec  2.65 GBytes  22.8 Gbits/sec    0   14.1 KBytes       
[  4]   3.00-4.00   sec  2.63 GBytes  22.6 Gbits/sec    0   14.1 KBytes       
[  4]   4.00-5.00   sec  2.66 GBytes  22.9 Gbits/sec    0   14.1 KBytes       
[  4]   5.00-6.00   sec  2.65 GBytes  22.8 Gbits/sec    0   14.1 KBytes       
[  4]   6.00-7.00   sec  2.67 GBytes  22.9 Gbits/sec    0   14.1 KBytes       
[  4]   7.00-8.00   sec  2.65 GBytes  22.8 Gbits/sec    0   14.1 KBytes       
[  4]   8.00-9.00   sec  2.66 GBytes  22.9 Gbits/sec    0   14.1 KBytes       
[  4]   9.00-10.00  sec  2.68 GBytes  23.1 Gbits/sec    0   14.1 KBytes       
[  4]  10.00-11.00  sec  2.64 GBytes  22.7 Gbits/sec    0   14.1 KBytes       
[  4]  11.00-12.00  sec  2.68 GBytes  23.0 Gbits/sec    0   14.1 KBytes
alex1230608 commented 4 years ago

Sorry, I accidentally closed the issue.

alex1230608 commented 4 years ago

After applying core-affinity settings similar to those in the link below, I can achieve 25 Gbps with a single connection, with an arbitrary number of subflows and any congestion control. Thank you! The key is to assign all IRQs and RFS for the NICs in use to cores on the same CPU socket, in order to avoid cross-socket cache misses.

http://multipath-tcp.org/pmwiki.php?n=Main.50Gbps
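(The gist of that recipe, as a rough sketch only; `eth0`, the CPU list `0-7`, and the RFS table sizes are placeholders for your own setup:)

```sh
# Which NUMA node the NIC is attached to (-1 means no NUMA affinity)
cat /sys/class/net/eth0/device/numa_node

# Pin each of the NIC's IRQs to cores on that node; stop irqbalance
# first so it does not rewrite the affinities behind your back
systemctl stop irqbalance
for irq in $(awk -F: '/eth0/ {gsub(/ /, "", $1); print $1}' /proc/interrupts); do
    echo 0-7 > /proc/irq/$irq/smp_affinity_list
done

# Enable RFS so received flows are steered to the consuming cores
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
for q in /sys/class/net/eth0/queues/rx-*; do
    echo 2048 > "$q/rps_flow_cnt"
done
```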

cpaasch commented 4 years ago

Oh, you have several NUMA nodes? Yeah, if the traffic ends up being processed on a different NUMA node than the NIC, the performance will go down the drain...
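(One way to keep the sending/receiving process on the NIC's node as well, so that application, IRQs, and memory stay local; node `0` and the iperf3 command line are placeholders:)

```sh
# Run the traffic generator bound to the NIC's NUMA node
numactl --cpunodebind=0 --membind=0 iperf3 -c 10.0.0.2 -t 15
```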