multipath-tcp / mptcp

⚠️⚠️⚠️ Deprecated 🚫 Out-of-tree Linux Kernel implementation of MultiPath TCP. 👉 Use https://github.com/multipath-tcp/mptcp_net-next repo instead ⚠️⚠️⚠️

aggregation with shadowsocks doesn't work with 'single' connection #502

Open SriramScorp opened 1 year ago

SriramScorp commented 1 year ago

Trying to use shadowsocks-libev v3.3.5 for aggregating multiple WANs. Created per-interface entries in the 'ip rule' and 'ip route' tables. ss-server is running on a 64-bit Debian 10 VPS; ss-redir is running on 32-bit Raspberry Pi OS, configured to work as a router.
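
For reference, the per-interface entries were created roughly like this (the eth1 case is shown; the other WANs follow the same pattern, and the gateway address is the one visible in the routing output below):

# table names (eth1..eth4) are assumed to be declared in /etc/iproute2/rt_tables
$ sudo ip rule add from 192.168.1.101 table eth1
$ sudo ip route add 192.168.1.0/24 dev eth1 scope link table eth1
$ sudo ip route add default via 192.168.1.1 dev eth1 table eth1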

While running an Ookla speedtest or uploading/downloading files from a LAN system behind the router, all WANs are used only occasionally. An Ookla speedtest with a 'Multi' connection uses all WANs, whereas a 'Single' connection uses only one of the 4 WANs, picked seemingly at random. Downloading or uploading videos to YouTube always seems to use only a single WAN.

I cannot figure out whether the issue comes from the MPTCP-capable kernel not doing its job correctly or from the shadowsocks side.

Client-side info:

kernel.osrelease = 5.4.83-MPTCP+
net.mptcp.mptcp_checksum = 0
net.mptcp.mptcp_debug = 0
net.mptcp.mptcp_enabled = 1
net.mptcp.mptcp_path_manager = fullmesh
net.mptcp.mptcp_scheduler = default
net.mptcp.mptcp_syn_retries = 3
net.mptcp.mptcp_version = 0
net.ipv4.tcp_congestion_control = cubic

$ cat /sys/module/mptcp_fullmesh/parameters/num_subflows
1
$ cat /sys/module/mptcp_fullmesh/parameters/create_on_err
0
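
(For reference, if these parameters ever need to be changed, something like the following should work at runtime, assuming the module exposes them as writable; the values here are only an example.)

$ echo 2 | sudo tee /sys/module/mptcp_fullmesh/parameters/num_subflows
$ echo 1 | sudo tee /sys/module/mptcp_fullmesh/parameters/create_on_err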

$ ip rule list
0:      from all lookup local
32758:  from 192.168.4.104 lookup eth4
32760:  from 192.168.3.103 lookup eth3
32762:  from 192.168.2.102 lookup eth2
32764:  from 192.168.1.101 lookup eth1
32766:  from all lookup main
32767:  from all lookup default

$ ip route list table all
default via 192.168.1.1 dev eth1 table eth1 
192.168.1.0/24 dev eth1 table eth1 scope link 
default via 192.168.2.1 dev eth2 table eth2 
192.168.2.0/24 dev eth2 table eth2 scope link 
default via 192.168.3.1 dev eth3 table eth3 
192.168.3.0/24 dev eth3 table eth3 scope link 
default via 192.168.4.1 dev eth4 table eth4 
192.168.4.0/24 dev eth4 table eth4 scope link 
...

[ 6868.657193] mptcp_alloc_mpcb: created mpcb with token 0x13cd5029
[ 6868.657214] mptcp_add_sock: token 0x13cd5029 pi 1, src_addr:192.168.3.103:39696 dst_addr:1.2.3.4:8388
[ 6868.714911] mptcp_add_sock: token 0x13cd5029 pi 2, src_addr:0.0.0.0:0 dst_addr:0.0.0.0:0
[ 6868.714951] __mptcp_init4_subsockets: token 0x13cd5029 pi 2 src_addr:192.168.2.102:0 dst_addr:1.2.3.4:8388 ifidx: 4
[ 6868.715236] mptcp_add_sock: token 0x13cd5029 pi 3, src_addr:0.0.0.0:0 dst_addr:0.0.0.0:0
[ 6868.715270] __mptcp_init4_subsockets: token 0x13cd5029 pi 3 src_addr:192.168.1.101:0 dst_addr:1.2.3.4:8388 ifidx: 6
[ 6868.715436] mptcp_add_sock: token 0x13cd5029 pi 4, src_addr:0.0.0.0:0 dst_addr:0.0.0.0:0
[ 6868.715465] __mptcp_init4_subsockets: token 0x13cd5029 pi 4 src_addr:192.168.4.104:0 dst_addr:1.2.3.4:8388 ifidx: 5
[ 6869.595243] mptcp_close: Close of meta_sk with tok 0x13cd5029
[ 6869.651383] mptcp_del_sock: Removing subsock tok 0x13cd5029 pi:3 state 7 is_meta? 0
[ 6869.696388] mptcp_del_sock: Removing subsock tok 0x13cd5029 pi:1 state 7 is_meta? 0
[ 6869.733413] mptcp_del_sock: Removing subsock tok 0x13cd5029 pi:2 state 7 is_meta? 0
[ 6869.811350] mptcp_del_sock: Removing subsock tok 0x13cd5029 pi:4 state 7 is_meta? 0
[ 6869.811375] mptcp_sock_destruct destroying meta-sk token 0x13cd5029
matttbe commented 1 year ago

Hello,

kernel.osrelease = 5.4.83-MPTCP+

That's quite old. Did you get all the recent fixes from MPTCP (and from elsewhere in the kernel)? The latest version is based on v5.4.230.

Other than that, with the info you provided, everything seems OK. It would be good to get info from /proc/net/mptcp_net/mptcp (and snmp) when a transfer is in progress.

Possibly also from /proc/net/mptcp_fullmesh, to check that the path manager (PM) is OK (I guess it is).

Packet traces can help too.

If you do have multiple subflows but no traffic on some of them when there should be (and you didn't set the "backup" flag on the interfaces), it is important to see the view of the sender. Many reasons can explain why the transfer is limited, e.g. CPU resources, network buffers, bufferbloat, middleboxes, bugs, etc. Monitoring resources and capturing packet traces can help.
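
For example, something like this on the client while a single-connection transfer is running (port 8388 taken from your logs; adapt the interface names and repeat the capture for each WAN):

$ watch -n1 'cat /proc/net/mptcp_net/mptcp'
$ watch -n1 'cat /proc/net/mptcp_net/snmp'
$ cat /proc/net/mptcp_fullmesh
$ sudo tcpdump -i eth1 -s 100 -w client-eth1.pcap 'tcp port 8388'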

SriramScorp commented 1 year ago

That's quite old.

Well, the image was built a couple years ago based on Raspbian Buster. I believe that should not be the reason I'm facing this issue.

$ cat /proc/net/mptcp_net/mptcp (decoded)
sl   loc_tok   rem_tok   v6  local_address        remote_address  st  ns  tx_queue:rx_queue  inode
0:   FFFFFFFF  FFFFFFFF  0   192.168.2.102:40112  1.2.3.4:8388    01  04  00000000:00000000  3632263
1:   FFFFFFFF  FFFFFFFF  0   192.168.2.102:9150   1.2.3.4:8388    01  04  00000000:00000000  3463736
2:   FFFFFFFF  FFFFFFFF  0   192.168.2.102:40048  1.2.3.4:8388    01  03  00000000:00000000  3630169
3:   FFFFFFFF  FFFFFFFF  0   192.168.2.102:40044  1.2.3.4:8388    01  04  00000000:00000000  3630119
4:   FFFFFFFF  FFFFFFFF  0   192.168.2.102:40104  1.2.3.4:8388    01  04  00000000:00000000  3632260
5:   FFFFFFFF  FFFFFFFF  0   192.168.2.102:40042  1.2.3.4:8388    01  04  00000000:00000000  3630117
6:   FFFFFFFF  FFFFFFFF  0   192.168.2.102:40036  1.2.3.4:8388    01  04  00000000:00000000  3630084
7:   FFFFFFFF  FFFFFFFF  0   192.168.2.102:39752  1.2.3.4:8388    01  04  00000000:00000000  3612157
8:   FFFFFFFF  FFFFFFFF  0   192.168.2.102:40130  1.2.3.4:8388    01  04  00000000:00000000  3638341
9:   FFFFFFFF  FFFFFFFF  0   192.168.2.102:40074  1.2.3.4:8388    01  04  00000000:00000000  3629942
10:  FFFFFFFF  FFFFFFFF  0   192.168.2.102:40052  1.2.3.4:8388    01  04  00000000:00000000  3630205
11:  FFFFFFFF  FFFFFFFF  0   192.168.2.102:40038  1.2.3.4:8388    01  04  00000000:00000000  3630113
12:  FFFFFFFF  FFFFFFFF  0   192.168.2.102:40050  1.2.3.4:8388    01  04  00000000:00000000  3630172
13:  FFFFFFFF  FFFFFFFF  0   192.168.2.102:40040  1.2.3.4:8388    01  04  00000000:00000000  3630115
14:  FFFFFFFF  FFFFFFFF  0   192.168.2.102:40106  1.2.3.4:8388    01  04  00000000:00000000  3632261

$ cat /proc/net/mptcp_net/snmp
MPCapableSYNRX                          3719
MPCapableSYNTX                          14762
MPCapableSYNACKRX                       5432
MPCapableACKRX                          3719
MPCapableFallbackACK                    0
MPCapableFallbackSYNACK                 8102
MPCapableRetransFallback                0
MPTCPCsumEnabled                        0
MPTCPRetrans                            25
MPFailRX                                0
MPCsumFail                              0
MPFastcloseRX                           27
MPFastcloseTX                           88
MPFallbackAckSub                        0
MPFallbackAckInit                       0
MPFallbackDataSub                       0
MPFallbackDataInit                      0
MPRemoveAddrSubDelete                   0
MPJoinNoTokenFound                      0
MPJoinAlreadyFallenback                 0
MPJoinSynTx                             3816
MPJoinSynRx                             0
MPJoinSynAckRx                          3070
MPJoinSynAckHMacFailure                 0
MPJoinAckRx                             0
MPJoinAckHMacFailure                    0
MPJoinAckMissing                        0
MPJoinAckRTO                            0
MPJoinAckRexmit                         19
NoDSSInWindow                           0
DSSNotMatching                          0
InfiniteMapRx                           0
DSSNoMatchTCP                           0
DSSTrimHead                             0
DSSSplitTail                            0
DSSPurgeOldSubSegs                      0
AddAddrRx                               160
AddAddrTx                               3823
RemAddrRx                               0
RemAddrTx                               0
MPJoinAlternatePort                     0
MPCurrEstab                             16

$ cat /proc/net/mptcp_fullmesh 
Index, Address-ID, Backup, IP-address, if-idx
IPv4, next v4-index: 7
2, 3, 0, 192.168.2.102, 5
4, 5, 0, 192.168.4.104, 6
5, 6, 0, 192.168.1.101, 7
6, 7, 0, 192.168.3.103, 9
IPv6, next v6-index: 0

$ ls -d /sys/class/net/eth*
/sys/class/net/eth1  /sys/class/net/eth2  /sys/class/net/eth3  /sys/class/net/eth4

$ cat /sys/class/net/eth*/flags
0x1003
0x1003
0x1003
0x1003

My concern is not just that the transfer speeds are low, but that the traffic seems to flow through only 1 of the 4 available WANs. And it is not pinned to a single WAN either: it flows through a random WAN for a few seconds, say 10-15 s, then switches to another WAN, and this keeps happening continuously. I have also confirmed via tracebox that none of the WANs filter the MPTCP options.

matttbe commented 1 year ago

That's quite old.

Well, the image was built a couple years ago based on Raspbian Buster. I believe that should not be the reason I'm facing this issue.

More than 10k commits have been added on the Linux stable and MPTCP sides since then, so it's hard to tell...

Do you see the problem when you start an upload and/or a download? The MPTCP scheduler on the sender side is the one in charge of picking the paths to send data to.

You will probably need to take traces and analyse why no more data can be pushed. (CPU, buffers, limitations on the receiver (0-windows?) / sender side, etc.)
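
As a side note, the out-of-tree kernel also ships alternative schedulers ('roundrobin', 'redundant') you could experiment with on the sender side, if they are built into your kernel:

$ sysctl net.mptcp.mptcp_scheduler
$ sudo sysctl -w net.mptcp.mptcp_scheduler=roundrobin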

Downloading or uploading videos to YouTube always seems to use only a single WAN.

Are you sure TCP traffic is generated by the client/server in this case? Or is it UDP (QUIC)?

MPCapableFallbackSYNACK 8102

I guess this number is high because you receive (plain) TCP traffic from the client on an MPTCP-enabled socket.

SriramScorp commented 1 year ago

Do you see the problem when you start an upload and/or a download?

I am seeing the problem during both downloads & uploads. 90-95% of the throughput seems to flow through 1 WAN whereas the rest flows through a 2nd WAN. And the WAN being used keeps changing frequently.

Downloading or uploading videos to YouTube always seems to use only a single WAN.

Are you sure TCP traffic is generated by the client/server in this case? Or is it UDP (QUIC)?

I'm using youtube-dl for downloading and the web interface for uploads. To the best of my knowledge, they both use TCP.

MPCapableFallbackSYNACK 8102

I guess this number is high because you receive (plain) TCP traffic from the client on an MPTCP-enabled socket.

I'm not following you here. By client, do you mean the shadowsocks client (i.e. the Raspberry Pi device) or the Windows system on the LAN network?

matttbe commented 1 year ago

Do you see the problem when you start an upload and/or a download?

I am seeing the problem during both downloads & uploads. 90-95% of the throughput seems to flow through 1 WAN whereas the rest flows through a 2nd WAN. And the WAN being used keeps changing frequently.

I advise you to take and look at packet traces. I don't know what bandwidth you have available, but there can be various limitations (including the shared bus for all the interfaces on a RPi).

Downloading or uploading videos to YouTube always seems to use only a single WAN.

Are you sure TCP traffic is generated by the client/server in this case? Or is it UDP (QUIC)?

I'm using youtube-dl for downloading and the web interface for uploads. To the best of my knowledge, they both use TCP.

Same here for the packet traces. By default, YouTube uses QUIC if the client supports it.
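
If you want to verify, you can look for QUIC (UDP port 443) on the router, or temporarily block it so that the clients fall back to HTTPS over TCP (the LAN interface name here is just an example):

$ sudo tcpdump -ni eth0 -c 20 'udp port 443'
$ sudo iptables -I FORWARD -p udp --dport 443 -j REJECT   # temporary, remove with -D afterwards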

MPCapableFallbackSYNACK 8102

I guess this number is high because you receive (plain) TCP traffic from the client on an MPTCP-enabled socket.

I'm not following you here. By client, do you mean the shadowsocks client (i.e. the Raspberry Pi device) or the Windows system on the LAN network?

The connection on the LAN side: your end client (Windows?) likely does not support MPTCP, while it looks like you configured the shadowsocks client to create MPTCP connections. What I mean is that if there were MPTCP fallbacks to TCP on the shadowsocks connection, that might explain why only one interface is used. But I guess that's not the case and the fallbacks happen only on the LAN side.

SriramScorp commented 1 year ago

Just to clarify a couple things.

  1. Per #337, shadowsocks+mptcp cannot be used to aggregate UDP traffic. Is that right? I guess I made the same assumption as @jeetu28: since shadowsocks-libev has an option to enable both TCP & UDP (as in '-u' or ' "mode":"tcp_and_udp" '), and the ss-redir scripts here & here configure packet redirection rules for UDP too, I was misled into believing that enabling UDP aggregation is only a matter of configuration. So, does shadowsocks only proxy UDP traffic via one of the WANs without creating MPTCP subflows?
  2. If the issue I'm facing with YouTube is due to the uploading/downloading happening via UDP, then other applications using TCP alone should not be restricted to a single WAN. But we are facing the single-WAN usage issue even while publishing a stream via RTMP, which is TCP. Though not quite as bad as uploading videos to YouTube, it still seems to use only 2 of the 4 WANs at any given time. The bandwidth used on the two interfaces is occasionally comparable, but most of the time 60-75% of the traffic flows through one WAN and the rest through the second.

What packet traces can I provide you with so that you can point me in the right direction? Would tcpdump on all client-side WAN interfaces and on the primary interface of the server suffice? And how could I use MPTCP to achieve UDP aggregation?

matttbe commented 1 year ago

Just to clarify a couple things.

  1. Per Transport the UDP traffic using MPTCP #337, shadowsocks+mptcp cannot be used to aggregate UDP traffic. Is that right? I guess I made the same assumption as @jeetu28: since shadowsocks-libev has an option to enable both TCP & UDP (as in '-u' or ' "mode":"tcp_and_udp" '), and the ss-redir scripts here & here configure packet redirection rules for UDP too, I was misled into believing that enabling UDP aggregation is only a matter of configuration. So, does shadowsocks only proxy UDP traffic via one of the WANs without creating MPTCP subflows?

Sorry, I don't know what Shadowsocks does with the UDP traffic. If it encapsulates it in a TCP connection/tunnel, then you can force it to use MPTCP. But encapsulating UDP in TCP sounds like a bad idea: if UDP was picked to minimise latency, then using TCP on top is not recommended.

  1. If the issue I'm facing with YouTube is due to the uploading/downloading happening via UDP, then other applications using TCP alone should not be restricted to a single WAN.

Indeed.

What packet traces can I provide you with so that you can point me in the right direction?

Analysing packet traces is time-consuming and, I'm sorry, I don't think I can do the whole analysis myself, but I can give some pointers. Also, as I said before, analysing the traces is just one part of identifying where the problem comes from: the sender or the receiver. After that, more analysis will be needed to understand what the limitation is: CPU, network, I/O, etc.

Does tcpdump on all client-side WAN interfaces and on the primary interface in the server suffice?

Yes, we need the view of the client and of the server, on each of the different interfaces, saved in a .pcap file. We don't need the payload, just the headers: you can use -s 100 (or -s 150 with IPv6). Please minimise the number of connections, e.g. only run iperf3 in upload with one connection (-P 1 -Z -t 5), then in download (-R). Please also check whether the performance is the same when using multiple connections in parallel (-P 10).
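
For example (1.2.3.4 standing for your server as in your logs, iperf3 listening on its default port):

# on the server:
$ iperf3 -s
# on the LAN client:
$ iperf3 -c 1.2.3.4 -P 1 -Z -t 5        # upload, single connection
$ iperf3 -c 1.2.3.4 -P 1 -Z -t 5 -R     # download
$ iperf3 -c 1.2.3.4 -P 10 -t 5          # multiple connections in parallel
# meanwhile, one capture per WAN interface on the shadowsocks client:
$ sudo tcpdump -i eth1 -s 100 -w client-eth1.pcap 'tcp port 8388'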

And, how could I use mptcp to achieve UDP aggregation?

I don't recommend it but you can use a VPN (e.g. OpenVPN) using TCP tunnels.
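
For example, forcing a TCP tunnel on the OpenVPN client would look something like this (server address/port are placeholders, and the server side needs a matching "proto tcp-server"):

$ sudo openvpn --config client.conf --dev tun --proto tcp-client --remote 1.2.3.4 1194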

SriramScorp commented 1 year ago

Following your suggestion, I ran the iperf3 test in both sender and receiver modes from a Windows system on the LAN side of the shadowsocks client. Below are the results from the iperf3 server.

# server -> receiver, client -> sender (upload test)
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  1.12 MBytes  9.38 Mbits/sec                  
[  5]   1.00-2.00   sec  3.75 MBytes  31.5 Mbits/sec                  
[  5]   2.00-3.00   sec  5.31 MBytes  44.6 Mbits/sec                  
[  5]   3.00-4.00   sec  4.69 MBytes  39.3 Mbits/sec                  
[  5]   4.00-5.00   sec  4.31 MBytes  36.2 Mbits/sec                  
[  5]   5.00-5.28   sec  1008 KBytes  29.9 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-5.28   sec  20.2 MBytes  32.1 Mbits/sec                  receiver

# server -> sender, client -> receiver (download test)
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  9.10 MBytes  76.4 Mbits/sec    0   13.8 KBytes       
[  5]   1.00-2.00   sec  16.1 MBytes   135 Mbits/sec    0   13.8 KBytes       
[  5]   2.00-3.00   sec  16.2 MBytes   136 Mbits/sec    0   13.8 KBytes       
[  5]   3.00-4.00   sec  6.25 MBytes  52.4 Mbits/sec    0   13.8 KBytes       
[  5]   4.00-5.00   sec  2.50 MBytes  21.0 Mbits/sec    0   13.8 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-5.05   sec  50.2 MBytes  83.5 Mbits/sec    0             sender

# server -> receiver, client -> sender (upload test)
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-5.22   sec  1.82 MBytes  2.93 Mbits/sec                  receiver
[  8]   0.00-5.22   sec  2.98 MBytes  4.79 Mbits/sec                  receiver
[ 10]   0.00-5.22   sec  1.92 MBytes  3.08 Mbits/sec                  receiver
[ 12]   0.00-5.22   sec  2.14 MBytes  3.43 Mbits/sec                  receiver
[ 14]   0.00-5.22   sec  1.73 MBytes  2.78 Mbits/sec                  receiver
[ 16]   0.00-5.22   sec  2.58 MBytes  4.14 Mbits/sec                  receiver
[ 18]   0.00-5.22   sec  1.65 MBytes  2.65 Mbits/sec                  receiver
[ 20]   0.00-5.22   sec  1.12 MBytes  1.79 Mbits/sec                  receiver
[ 22]   0.00-5.22   sec  1.00 MBytes  1.61 Mbits/sec                  receiver
[ 24]   0.00-5.22   sec  1.88 MBytes  3.02 Mbits/sec                  receiver
[SUM]   0.00-5.22   sec  18.8 MBytes  30.2 Mbits/sec                  receiver

# server -> sender, client -> receiver (download test)
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-5.06   sec  10.6 MBytes  17.6 Mbits/sec    0             sender
[  8]   0.00-5.06   sec  26.4 MBytes  43.7 Mbits/sec    0             sender
[ 10]   0.00-5.06   sec  11.3 MBytes  18.7 Mbits/sec    0             sender
[ 12]   0.00-5.06   sec  32.6 MBytes  54.1 Mbits/sec    0             sender
[ 14]   0.00-5.06   sec  41.5 MBytes  68.8 Mbits/sec    0             sender
[ 16]   0.00-5.06   sec  12.4 MBytes  20.6 Mbits/sec    0             sender
[ 18]   0.00-5.06   sec  19.6 MBytes  32.5 Mbits/sec    0             sender
[ 20]   0.00-5.06   sec  10.8 MBytes  17.9 Mbits/sec    0             sender
[ 22]   0.00-5.06   sec  9.08 MBytes  15.0 Mbits/sec    0             sender
[ 24]   0.00-5.06   sec  10.8 MBytes  17.9 Mbits/sec    0             sender
[SUM]   0.00-5.06   sec   185 MBytes   307 Mbits/sec    0             sender

During both single connection (-P 1) and multiple connection (-P 10) tests, the traffic seems to flow through all available WANs. But the issue of only one WAN being used still persists when publishing a stream via RTMP from obs-studio.

matttbe commented 1 year ago

During both single connection (-P 1) and multiple connection (-P 10) tests, the traffic seems to flow through all available WANs

The differences between the downloads with one and with 10 flows seem to suggest you need to adapt net.ipv4.tcp_[rw]mem on both sides (at least wmem for the server and rmem for the client).
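
For example, something like this (the values are only illustrative and should be sized to your aggregate bandwidth-delay product):

# on the server (sender for the download test):
$ sudo sysctl -w net.ipv4.tcp_wmem="4096 131072 16777216"
# on the client (receiver for the download test):
$ sudo sysctl -w net.ipv4.tcp_rmem="4096 131072 16777216"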

But the issue of only one WAN being used still persists when publishing a stream via RTMP from obs-studio.

From what I see, RTMP can be on top of TCP or UDP (RTMFP). Are you sure TCP is used?
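
You can quickly check on the router: RTMP normally runs over TCP port 1935, while RTMFP uses UDP (the interface name is just an example):

$ sudo tcpdump -ni eth0 -c 20 'port 1935'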

Also, some TCP congestion controls (or maybe some clients/servers) can be very sensitive to latency changes: in short, they may decide not to push faster when multiple paths are being used because the latency increased a bit. For example, some video-conferencing services running on top of TCP need to decide how much data to push and which video "quality" to stream while still keeping the latency low: in this case, it is possible the application doesn't try to send more so as not to increase the latency. If there is a noticeable latency difference between the links, you could have this kind of issue. You could maybe trick some algorithms by introducing artificial latency with netem, for example, but that's a workaround (IF this is the source of your issue).
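
For reference, the netem workaround could look like this, adding a bit of delay on the faster link so the paths look more alike (interface and delay are just an example):

$ sudo tc qdisc add dev eth1 root netem delay 20ms
$ sudo tc qdisc del dev eth1 root    # remove it afterwards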