SriramScorp opened this issue 1 year ago (status: Open)
Hello,
kernel.osrelease = 5.4.83-MPTCP+
That's quite old. Did you get all the recent fixes from MPTCP (and elsewhere in the kernel)? The latest version is v5.4.230.
Other than that, with the info you provided, everything seems OK.
It would be good to get info from /proc/net/mptcp_net/mptcp (and snmp) when a transfer is in progress.
Possibly also from /proc/net/mptcp_fullmesh to check if the path manager (PM) is OK (I guess it is).
Packet traces can help too.
If you do have multiple subflows but no traffic on some of them while you should (and you didn't set the "backup" flag on the interfaces), it is important to see the view of the sender. Many reasons can explain why the transfer is limited, e.g. CPU resources, network buffers, bufferbloat, middleboxes, bugs, etc. Monitoring resources and capturing packet traces can help.
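For reference, a rough sketch of how that info could be captured while a transfer is running (the interface names eth1..eth4 are taken from the output further below; the output file names are just examples):
# snapshot the connection list, counters and path-manager view
$ cat /proc/net/mptcp_net/mptcp > mptcp_connections.txt
$ cat /proc/net/mptcp_net/snmp > mptcp_counters.txt
$ cat /proc/net/mptcp_fullmesh > mptcp_fullmesh.txt
# capture headers only, one tcpdump per WAN interface (repeat for eth2..eth4)
$ tcpdump -i eth1 -s 100 -w trace_eth1.pcap &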
That's quite old.
Well, the image was built a couple years ago based on Raspbian Buster. I believe that should not be the reason I'm facing this issue.
$ cat /proc/net/mptcp_net/mptcp (decoded)
sl loc_tok rem_tok v6 local_address remote_address st ns tx_queue:rx_queue inode
0: FFFFFFFF FFFFFFFF 0 192.168.2.102:40112 1.2.3.4:8388 01 04 00000000:00000000 3632263
1: FFFFFFFF FFFFFFFF 0 192.168.2.102:9150 1.2.3.4:8388 01 04 00000000:00000000 3463736
2: FFFFFFFF FFFFFFFF 0 192.168.2.102:40048 1.2.3.4:8388 01 03 00000000:00000000 3630169
3: FFFFFFFF FFFFFFFF 0 192.168.2.102:40044 1.2.3.4:8388 01 04 00000000:00000000 3630119
4: FFFFFFFF FFFFFFFF 0 192.168.2.102:40104 1.2.3.4:8388 01 04 00000000:00000000 3632260
5: FFFFFFFF FFFFFFFF 0 192.168.2.102:40042 1.2.3.4:8388 01 04 00000000:00000000 3630117
6: FFFFFFFF FFFFFFFF 0 192.168.2.102:40036 1.2.3.4:8388 01 04 00000000:00000000 3630084
7: FFFFFFFF FFFFFFFF 0 192.168.2.102:39752 1.2.3.4:8388 01 04 00000000:00000000 3612157
8: FFFFFFFF FFFFFFFF 0 192.168.2.102:40130 1.2.3.4:8388 01 04 00000000:00000000 3638341
9: FFFFFFFF FFFFFFFF 0 192.168.2.102:40074 1.2.3.4:8388 01 04 00000000:00000000 3629942
10: FFFFFFFF FFFFFFFF 0 192.168.2.102:40052 1.2.3.4:8388 01 04 00000000:00000000 3630205
11: FFFFFFFF FFFFFFFF 0 192.168.2.102:40038 1.2.3.4:8388 01 04 00000000:00000000 3630113
12: FFFFFFFF FFFFFFFF 0 192.168.2.102:40050 1.2.3.4:8388 01 04 00000000:00000000 3630172
13: FFFFFFFF FFFFFFFF 0 192.168.2.102:40040 1.2.3.4:8388 01 04 00000000:00000000 3630115
14: FFFFFFFF FFFFFFFF 0 192.168.2.102:40106 1.2.3.4:8388 01 04 00000000:00000000 3632261
$ cat /proc/net/mptcp_net/snmp
MPCapableSYNRX 3719
MPCapableSYNTX 14762
MPCapableSYNACKRX 5432
MPCapableACKRX 3719
MPCapableFallbackACK 0
MPCapableFallbackSYNACK 8102
MPCapableRetransFallback 0
MPTCPCsumEnabled 0
MPTCPRetrans 25
MPFailRX 0
MPCsumFail 0
MPFastcloseRX 27
MPFastcloseTX 88
MPFallbackAckSub 0
MPFallbackAckInit 0
MPFallbackDataSub 0
MPFallbackDataInit 0
MPRemoveAddrSubDelete 0
MPJoinNoTokenFound 0
MPJoinAlreadyFallenback 0
MPJoinSynTx 3816
MPJoinSynRx 0
MPJoinSynAckRx 3070
MPJoinSynAckHMacFailure 0
MPJoinAckRx 0
MPJoinAckHMacFailure 0
MPJoinAckMissing 0
MPJoinAckRTO 0
MPJoinAckRexmit 19
NoDSSInWindow 0
DSSNotMatching 0
InfiniteMapRx 0
DSSNoMatchTCP 0
DSSTrimHead 0
DSSSplitTail 0
DSSPurgeOldSubSegs 0
AddAddrRx 160
AddAddrTx 3823
RemAddrRx 0
RemAddrTx 0
MPJoinAlternatePort 0
MPCurrEstab 16
$ cat /proc/net/mptcp_fullmesh
Index, Address-ID, Backup, IP-address, if-idx
IPv4, next v4-index: 7
2, 3, 0, 192.168.2.102, 5
4, 5, 0, 192.168.4.104, 6
5, 6, 0, 192.168.1.101, 7
6, 7, 0, 192.168.3.103, 9
IPv6, next v6-index: 0
$ ls -d /sys/class/net/eth*
/sys/class/net/eth1 /sys/class/net/eth2 /sys/class/net/eth3 /sys/class/net/eth4
$ cat /sys/class/net/eth*/flags
0x1003
0x1003
0x1003
0x1003
My concern is not just that the transfer speeds are low, but that the traffic seems to flow through only 1 of the 4 available WANs. It is not even restricted to a single WAN: traffic flows through a random WAN for a few seconds, say 10-15 s, then switches to another WAN, and this keeps happening continuously. I have also confirmed via tracebox that none of the WANs filter MPTCP flags.
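For completeness, the MPTCP options can also be checked directly with tcpdump on each WAN while a connection is being opened (a sketch; it assumes a tcpdump recent enough to decode MPTCP options and uses the ss-server port 8388 from the output above):
# show SYN/SYN-ACK packets with their TCP options on one WAN
$ tcpdump -i eth1 -n -v 'tcp port 8388 and (tcp[tcpflags] & tcp-syn != 0)'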
That's quite old.
Well, the image was built a couple years ago based on Raspbian Buster. I believe that should not be the reason I'm facing this issue.
More than 10k commits have been added since then, from the Linux stable and MPTCP sides, so it is hard to tell...
Do you see the problem when you start an upload and/or a download? The MPTCP scheduler on the sender side is the one in charge of picking the paths to send data to.
You will probably need to take traces and analyse why no more data can be pushed (CPU, buffers, limitations on the receiver (0-windows?) or sender side, etc.).
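As one possible starting point for that analysis (assuming tshark is available; the trace file name is only an example):
# receive-window limitations (zero windows) seen in a capture
$ tshark -r trace_eth1.pcap -Y 'tcp.analysis.zero_window'
# quick count of retransmissions in the same capture
$ tshark -r trace_eth1.pcap -Y 'tcp.analysis.retransmission' | wc -l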
Downloading or uploading videos to YouTube always seem to use only a single WAN.
Are you sure the client/server generates TCP traffic in this case? Or UDP (QUIC)?
MPCapableFallbackSYNACK 8102
I guess this number is high because you receive (plain) TCP traffic from the client on an MPTCP-enabled socket.
Do you see the problem when you start an upload and/or a download?
I am seeing the problem during both downloads and uploads. 90-95% of the throughput seems to flow through one WAN while the rest flows through a second WAN, and the WAN in use keeps changing frequently.
Downloading or uploading videos to YouTube always seem to use only a single WAN.
Are you sure the client/server generates TCP traffic in this case? Or UDP (QUIC)?
I'm using youtube-dl for downloading and the web interface for uploads. To the best of my knowledge, they both use TCP.
MPCapableFallbackSYNACK 8102
I guess this number is high because you receive (plain) TCP traffic from the client on an MPTCP-enabled socket.
I'm not following you here. By client, do you mean the shadowsocks client (i.e. the Raspberry Pi device) or the Windows system on the LAN network?
Do you see the problem when you start an upload and/or a download?
I am seeing the problem during both downloads and uploads. 90-95% of the throughput seems to flow through one WAN while the rest flows through a second WAN, and the WAN in use keeps changing frequently.
I advise you to take and look at packet traces. I don't know what bandwidth is available to you, but there can be various limitations (including the shared bus for all interfaces on a RPi).
Downloading or uploading videos to YouTube always seem to use only a single WAN.
Are you sure the client/server generates TCP traffic in this case? Or UDP (QUIC)?
I'm using youtube-dl for downloading and the web interface for uploads. To the best of my knowledge, they both use TCP.
Same here for the packet traces. By default, YouTube uses QUIC if the client supports it.
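A quick way to check is to watch for QUIC (UDP on port 443) on the LAN-facing interface while YouTube traffic is running; the interface name below is an assumption, replace it with the actual LAN interface:
# if the YouTube traffic shows up here, it is QUIC/UDP and never enters the MPTCP path
$ tcpdump -i eth0 -n 'udp port 443'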
MPCapableFallbackSYNACK 8102
I guess this number is high because you receive (plain) TCP traffic from the client on an MPTCP-enabled socket.
I'm not following you here. By client, do you mean the shadowsocks client (i.e. the Raspberry Pi device) or the Windows system on the LAN network?
The connection on the LAN side: your end client (Windows?) likely does not support MPTCP, while it looks like you configured the shadowsocks client to create MPTCP connections. What I mean is that if there were MPTCP fallbacks to TCP for the shadowsocks connection, that might explain why only one interface is used. But I guess that's not the case and the fallback only happens on the LAN side.
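One rough way to check where the fallbacks happen is to compare the fallback counters just before and just after opening a single connection through the proxy, e.g.:
$ grep Fallback /proc/net/mptcp_net/snmp > before.txt
# ... open one connection via the shadowsocks client ...
$ grep Fallback /proc/net/mptcp_net/snmp > after.txt
$ diff before.txt after.txt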
Just to clarify a couple things.
What packet traces can I provide you with so that you can point me in the right direction? Does tcpdump on all client-side WAN interfaces and on the primary interface in the server suffice? And, how could I use mptcp to achieve UDP aggregation?
Just to clarify a couple things.
- Per Transport the UDP traffic using MPTCP #337, shadowsocks+mptcp cannot be used to aggregate UDP traffic. Is that right? I guess I made the same assumption as @jeetu28: since shadowsocks-libev has an option to enable both tcp & udp (as in '-u' or ' "mode":"tcp_and_udp" '), and since the ss-redir scripts here & here configure packet redirection rules for udp too, I was misled into believing that enabling UDP aggregation is only a matter of configuration. So, does shadowsocks just proxy udp traffic via one of the WANs without creating mptcp subflows?
Sorry, I don't know what Shadowsocks does with the UDP traffic. If it encapsulates it in a TCP connection/tunnel, then you can force it to use MPTCP. But encapsulating UDP into TCP sounds like a bad idea: if UDP was picked to minimise latency, then using TCP on top is not recommended.
- If the issue I'm facing with YouTube is due to the uploading/downloading happening via udp, then other applications using tcp alone should not be restricted to a single WAN.
Indeed.
What packet traces can I provide you with so that you can point me in the right direction?
Analysing packet traces is time consuming and I'm sorry, but I don't think I can do the whole analysis myself; I can give some pointers though. Also, as I said before, analysing the traces is just one part of identifying where the problem comes from: sender or receiver. After that, more analysis will be needed to understand what the limitation is: CPU, network, I/O, etc.
Does tcpdump on all client-side WAN interfaces and on the primary interface in the server suffice?
Yes, we need the view of the client and the server, on each of the different interfaces, saved in a .pcap file. We don't need the data/payload, just the headers: you can use -s 100 (or -s 150 with IPv6).
Please minimise the number of connections, e.g. only run iperf3 in upload with one connection (-P 1 -Z -t 5), then in download (-R). Please also check if the performance is the same when using multiple connections in parallel (-P 10).
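Put together, the test could look roughly like this (the server address is a placeholder; start one headers-only tcpdump per interface involved, on both sides):
# on the server (VPS)
$ iperf3 -s
# on each machine, one capture per interface (repeat for the other interfaces)
$ tcpdump -i eth1 -s 100 -w client_eth1.pcap &
# on the LAN-side client: upload then download, single connection
$ iperf3 -c <server-ip> -P 1 -Z -t 5
$ iperf3 -c <server-ip> -P 1 -Z -t 5 -R
# same with 10 connections in parallel
$ iperf3 -c <server-ip> -P 10 -t 5
$ iperf3 -c <server-ip> -P 10 -t 5 -R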
And, how could I use mptcp to achieve UDP aggregation?
I don't recommend it, but you can use a VPN (e.g. OpenVPN) with TCP tunnels.
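As a sketch only (host and port are placeholders, and as said above this trades latency for aggregation), forcing an OpenVPN tunnel over TCP would look like:
# the TCP carrier connection can then be handled by the MPTCP kernel
$ openvpn --client --dev tun --proto tcp-client --remote vpn.example.org 1194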
Following your suggestion, I ran the iperf3 test in both sender and receiver mode from a Windows system on the LAN side of the shadowsocks client. Below are the results from the iperf3 server.
# server -> receiver, client -> sender (upload test)
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 1.12 MBytes 9.38 Mbits/sec
[ 5] 1.00-2.00 sec 3.75 MBytes 31.5 Mbits/sec
[ 5] 2.00-3.00 sec 5.31 MBytes 44.6 Mbits/sec
[ 5] 3.00-4.00 sec 4.69 MBytes 39.3 Mbits/sec
[ 5] 4.00-5.00 sec 4.31 MBytes 36.2 Mbits/sec
[ 5] 5.00-5.28 sec 1008 KBytes 29.9 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-5.28 sec 20.2 MBytes 32.1 Mbits/sec receiver
# server -> sender, client -> receiver (download test)
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 9.10 MBytes 76.4 Mbits/sec 0 13.8 KBytes
[ 5] 1.00-2.00 sec 16.1 MBytes 135 Mbits/sec 0 13.8 KBytes
[ 5] 2.00-3.00 sec 16.2 MBytes 136 Mbits/sec 0 13.8 KBytes
[ 5] 3.00-4.00 sec 6.25 MBytes 52.4 Mbits/sec 0 13.8 KBytes
[ 5] 4.00-5.00 sec 2.50 MBytes 21.0 Mbits/sec 0 13.8 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-5.05 sec 50.2 MBytes 83.5 Mbits/sec 0 sender
# server -> receiver, client -> sender (upload test)
[ ID] Interval Transfer Bitrate
[ 5] 0.00-5.22 sec 1.82 MBytes 2.93 Mbits/sec receiver
[ 8] 0.00-5.22 sec 2.98 MBytes 4.79 Mbits/sec receiver
[ 10] 0.00-5.22 sec 1.92 MBytes 3.08 Mbits/sec receiver
[ 12] 0.00-5.22 sec 2.14 MBytes 3.43 Mbits/sec receiver
[ 14] 0.00-5.22 sec 1.73 MBytes 2.78 Mbits/sec receiver
[ 16] 0.00-5.22 sec 2.58 MBytes 4.14 Mbits/sec receiver
[ 18] 0.00-5.22 sec 1.65 MBytes 2.65 Mbits/sec receiver
[ 20] 0.00-5.22 sec 1.12 MBytes 1.79 Mbits/sec receiver
[ 22] 0.00-5.22 sec 1.00 MBytes 1.61 Mbits/sec receiver
[ 24] 0.00-5.22 sec 1.88 MBytes 3.02 Mbits/sec receiver
[SUM] 0.00-5.22 sec 18.8 MBytes 30.2 Mbits/sec receiver
# server -> sender, client -> receiver (download test)
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-5.06 sec 10.6 MBytes 17.6 Mbits/sec 0 sender
[ 8] 0.00-5.06 sec 26.4 MBytes 43.7 Mbits/sec 0 sender
[ 10] 0.00-5.06 sec 11.3 MBytes 18.7 Mbits/sec 0 sender
[ 12] 0.00-5.06 sec 32.6 MBytes 54.1 Mbits/sec 0 sender
[ 14] 0.00-5.06 sec 41.5 MBytes 68.8 Mbits/sec 0 sender
[ 16] 0.00-5.06 sec 12.4 MBytes 20.6 Mbits/sec 0 sender
[ 18] 0.00-5.06 sec 19.6 MBytes 32.5 Mbits/sec 0 sender
[ 20] 0.00-5.06 sec 10.8 MBytes 17.9 Mbits/sec 0 sender
[ 22] 0.00-5.06 sec 9.08 MBytes 15.0 Mbits/sec 0 sender
[ 24] 0.00-5.06 sec 10.8 MBytes 17.9 Mbits/sec 0 sender
[SUM] 0.00-5.06 sec 185 MBytes 307 Mbits/sec 0 sender
During both single connection (-P 1) and multiple connection (-P 10) tests, the traffic seems to flow through all available WANs. But the issue of only one WAN being used still persists when publishing a stream via RTMP from obs-studio.
During both single connection (-P 1) and multiple connection (-P 10) tests, the traffic seems to flow through all available WANs
The differences between the download with one and with 10 flows seem to suggest you need to adapt net.ipv4.tcp_[rw]mem on both sides (at least wmem for the server and rmem for the client).
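As an illustration only (the values below are examples, not a recommendation, and should be tuned to the bandwidth-delay product of your links):
# min / default / max buffer sizes in bytes; wmem matters on the sender, rmem on the receiver
$ sysctl -w net.ipv4.tcp_wmem='4096 16384 16777216'
$ sysctl -w net.ipv4.tcp_rmem='4096 131072 16777216'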
But the issue of only one WAN being used still persists when publishing a stream via RTMP from obs-studio.
From what I see, RTMP can be on top of TCP or UDP (RTMFP). Are you sure TCP is used?
Also, some TCP congestion controls (or maybe some clients/servers) can be very sensitive to latency changes: in short, they may decide not to push faster when multiple paths are being used because the latency increased a bit. For example, some video-conference services running on top of TCP need a way to decide how much data to push and which video "quality" to stream while still keeping a low latency: in this case, it is possible the application doesn't try to send more so as not to increase the latency. If there is a noticeable latency difference between the links, you could have this kind of issue. You can maybe trick some algorithms by introducing artificial latency with netem for example, but that's a workaround (IF this is the source of your issue).
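If you do want to try that workaround, a netem sketch could be (eth1 and 10ms are examples only, to be applied on the fastest link):
# add ~10ms of artificial delay on one interface
$ tc qdisc add dev eth1 root netem delay 10ms
# and remove it again afterwards
$ tc qdisc del dev eth1 root netem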
Trying to use shadowsocks-libev v3.3.5 for aggregating multiple WANs. Created per-interface entries in the 'ip rule' and 'ip route' tables. ss-server is running on a 64-bit Debian 10 VPS. ss-redir is running on 32-bit Raspberry Pi OS, configured to work as a router.
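For reference, the per-interface entries follow the usual MPTCP source-routing pattern; a sketch for one WAN (addresses, gateway, and table number are examples only, not the exact config used):
# route traffic sourced from this WAN's address via its own table and gateway
$ ip rule add from 192.168.2.102 table 1
$ ip route add 192.168.2.0/24 dev eth1 scope link table 1
$ ip route add default via 192.168.2.1 dev eth1 table 1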
While running the Ookla speedtest or uploading/downloading files from a LAN-side system behind the router, all WANs are used only occasionally. The Ookla speedtest with 'Multi' connection mode uses all WANs, whereas 'Single' connection mode uses only one of the 4 WANs, chosen at random. Downloading or uploading videos to YouTube always seems to use only a single WAN.
I cannot figure out if the issue comes from the MPTCP-capable kernel not doing its job correctly or if it's something on the shadowsocks side.
Client-side info: