multipath-tcp / mptcp_net-next

Development version of the Upstream MultiPath TCP Linux kernel 🐧
https://mptcp.dev
Other
278 stars 39 forks source link

Could you please let me know how to use "ip mptcp end points backup"? #191

Closed Kenoko closed 3 years ago

Kenoko commented 3 years ago

I try to make backup-interface, but I cannot do it, image1 Server and Client OS: Ubuntu21.04(kernel:5.11.0-16-generic) I confirmed I could send mptcp packets on both paths from client to server. so I want to make an active/standby structure in a next step. I executed a following command on client side, however the interface didn't become backup mode.The mptcp traffic kept running. ip mptcp add endpoint 192.168.20.2 dev enp0s9 backup and I created signal endpoint at server as below. ip mptcp add endpoint 192.168.40.2 dev enp0s9 signal The limit setting isip mptcp limits set subflow 1 add_addr_accepted 1 on both client and server.

So Could you please let me know how to make backup interface? I could make it by using mptcp v0 while I referred a following site, so I want to make it with mptcpv1 also. http://multipath-tcp.org/pmwiki.php/Users/Tools

matttbe commented 3 years ago

Hello,

The configuration you did seems good. Did you also configure IP rules and routes? http://multipath-tcp.org/pmwiki.php/Users/ConfigureRouting

Could you share some captures, one for each interface of the client (or server) to see if the behaviour seems correct? I suspect other subflows to be used, not the one you expect.

# tcpdump -i <IFACE> -n "tcp[tcpflags] & tcp-syn != 0"
Kenoko commented 3 years ago

Thanks for your support! I referred the configuration of the website. I configured routing on both servers as below. and There is no cross connection between each links on router.

route on client

192.168.10.0/24 dev enp0s8 proto kernel scope link src 192.168.10.2 192.168.20.0/24 dev enp0s9 proto kernel scope link src 192.168.20.2 192.168.30.0/24 via 192.168.10.1 dev enp0s8 proto static 192.168.40.0/24 via 192.168.20.1 dev enp0s9 proto static

routing policy

table 1 192.168.10.0/24 dev enp0s8 scope link 192.168.30.0/24 via 192.168.10.1 dev enp0s8 table 2 192.168.20.0/24 dev enp0s9 scope link 192.168.40.0/24 via 192.168.20.1 dev enp0s9

route on server

192.168.10.0/24 via 192.168.30.1 dev enp0s8 proto static 192.168.20.0/24 via 192.168.40.1 dev enp0s9 proto static 192.168.30.0/24 dev enp0s8 proto kernel scope link src 192.168.30.2 192.168.40.0/24 dev enp0s9 proto kernel scope link src 192.168.40.2

I got a following tcpdump at only enp0s9 on client ,but I couldn't get capture at enp0s8. The result of tcpdump

12:15:00.291864 IP 192.168.20.2.33673 > 192.168.40.2.12345: Flags [S], seq 2340819369, win 64240, options [mss 1460,sackOK,TS val 861298340 ecr 0,nop,wscale 7,mptcp join id 0 token 0xc39ab12 nonce 0x6e02b9d7], length 0 12:15:00.293999 IP 192.168.40.2.12345 > 192.168.20.2.33673: Flags [S.], seq 3600655604, ack 2340819370, win 65160, options [mss 1460,sackOK,TS val 1744158086 ecr 861298340,nop,wscale 7,mptcp join id 1 hmac 0xd7249e0f982095b8 nonce 0xc1f3a999], length 0

I send a file(200M) from client to server and I set 192.168.30.2 as target IP and I use only this socket(res->ai_family, res->ai_socktype, IPPROTO_MPTCP) in my source code.

Do I need other configuration? or Is there some incorrect configuration?

Kenoko commented 3 years ago

@matttbe Sorry for asking you many time. Could you please let me know how to make backup path?

VenkateswaranJ commented 3 years ago

@Kenoko

Have you tried executing the command ip mptcp add endpoint 192.168.20.2 dev enp0s9 backup before starting the client & server? I mean based on your question I understand that you are executing the command on the client-side when the application is running?

Kenoko commented 3 years ago

@VenkateswaranJ Thank you for your support! I executed this command before starting the application, however I couldn't make backup path. I would like to know how to make active/standby structure.

@matttbe Sorry, I shared incorrect tcpdump results. following is correct. so please check these tcpdump result. The result of tcpdump

enp0s9 on client

listening on enp0s9, link-type EN10MB (Ethernet), snapshot length 262144 bytes 13:35:37.516946 IP 192.168.20.2.57007 > 192.168.40.2.12345: Flags [S], seq 3392821094, win 64240, options [mss 1460,sackOK,TS val 1777751700 ecr 0,nop,wscale 7,mptcp join id 0 token 0x23e6561c nonce 0x2f51d687], length 0 13:35:37.519138 IP 192.168.40.2.12345 > 192.168.20.2.57007: Flags [S.], seq 4154909268, ack 3392821095, win 65160, options [mss 1460,sackOK,TS val 1895143772 ecr 1777751700,nop,wscale 7,mptcp join id 2 hmac 0x1ae0cbb5d05b5681 nonce 0x9bf08b96], length 0

enp0s8 on clinet

listening on enp0s8, link-type EN10MB (Ethernet), snapshot length 262144 bytes 13:35:37.510478 IP 192.168.10.2.54874 > 192.168.30.2.12345: Flags [S], seq 4073350759, win 64240, options [mss 1460,sackOK,TS val 632161022 ecr 0,nop,wscale 7,mptcp capable v1], length 0 13:35:37.512400 IP 192.168.30.2.12345 > 192.168.10.2.54874: Flags [S.], seq 3568740971, ack 4073350760, win 65160, options [mss 1460,sackOK,TS val 796316579 ecr 632161022,nop,wscale 7,mptcp capable v1 {0x7928cf9fef53264}], length 0

Do you have any idea to make a backup path?

matttbe commented 3 years ago

Hi @Kenoko

Sorry, I was not available these last days (and I still have a lot of stuff to catch up).

Routes seem good. From the traces you sent, there is no backup flag with the MP_JOIN on enp0s9. So indeed, there is an issue somewhere, either in the configuration or in the interpretation of the configuration.

I noticed that the command you sent was wrong: ip mptcp add endpoint 192.168.20.2 dev enp0s9 backup It should be endpoint add and not add endpoint. Is it just a wrong copy-paste or is it really the command you typed?

sudo ip mptcp endpoint add 192.168.20.2 dev enp0s9 backup

Just to be sure, can you give me the output of this command: ip mptcp endpoint show

If you see the backup flag, can you also try to set the backup flag on the server side as well and take traces again?

sudo ip mptcp endpoint add 192.168.40.2 dev enp0s9 backup signal

Just to check if there is an issue only with the client or also with server.

Kenoko commented 3 years ago

@matttbe Thank you for your reply despite your busy schedule. Sorry I had a wrong copy-paste, and I paste the result ip mptcp endpoint show&ip mptcp limits show on client/server Clinet

$ip mptcp endpoint show 192.168.20.2 id 1 backup dev enp0s9 $ip mptcp limits show add_addr_accepted 1 subflows 1

Server

$ip mptcp endpoint show 192.168.40.2 id 1 signal dev enp0s9 $ip mptcp limits show add_addr_accepted 1 subflows 1

However I couldn't see backup flag. After this, I set on server side by using your commandsudo ip mptcp endpoint add 192.168.40.2 dev enp0s9 backup signal Clinet

$ip mptcp endpoint show 192.168.20.2 id 1 backup dev enp0s9 $ip mptcp limits show add_addr_accepted 1 subflows 1

Server

$ip mptcp endpoint show 192.168.40.2 id 2 signal backup dev enp0s9 $ip mptcp limits show add_addr_accepted 1 subflows 1

tcp dump result is as below. There is no backup flag also.

enp0s8 on clinet listening on enp0s8, link-type EN10MB (Ethernet), snapshot length 262144 bytes 04:01:09.748405 IP 192.168.10.2.43522 > 192.168.30.2.12345: Flags [S], seq 1176472289, win 64240, options [mss 1460,sackOK,TS val 279891026 ecr 0,nop,wscale 7,mptcp capable v1], length 0 04:01:09.749948 IP 192.168.30.2.12345 > 192.168.10.2.43522: Flags [S.], seq 2868751690, ack 1176472290, win 65160, options [mss 1460,sackOK,TS val 3243649719 ecr 279891026,nop,wscale 7,mptcp capable v1 {0x4bc1db0df00581b7}], length 0 enp0s9 on server listening on enp0s9, link-type EN10MB (Ethernet), snapshot length 262144 bytes 04:01:09.753538 IP 192.168.20.2.38653 > 192.168.40.2.12345: Flags [S], seq 1739756007, win 64240, options [mss 1460,sackOK,TS val 642625727 ecr 0,nop,wscale 7,mptcp join id 0 token 0xa03fbfbe nonce 0x3830cef8], length 0 04:01:09.755973 IP 192.168.40.2.12345 > 192.168.20.2.38653: Flags [S.], seq 2759061971, ack 1739756008, win 65160, options [mss 1460,sackOK,TS val 1826250580 ecr 642625727,nop,wscale 7,mptcp join id 2 hmac 0xdbafbbdb84aa8530 nonce 0x4d21616a], length 0

Do I need to change something on configure of my servers?

matttbe commented 3 years ago

Strange, I will try to investigate more.

In theory, we should configure the client with:

sudo ip mptcp endpoint add 192.168.20.2 dev enp0s9 subflow backup

(subflow is required)

And the server:

sudo ip mptcp endpoint add 192.168.40.2 dev enp0s9 signal backup

I guess the order of the flags there should not change anything because all flags are sent to the kernel and read when needed but I will investigate.

Kenoko commented 3 years ago

@matttbe Thank you for your checking! I tried to make backup path on another diagram as below,but I couldn't do it. Could you please check following situation also? image

I executed sudo ip mptcp endpoint add 192.168.20.2 dev enp0s9 subflow backup on client as you mentioned. I could see 'backup flag' on enp0s9, however the traffic was still running on the interfaces. I attached my client/server settings and tcpdump result of each interface.

client_setting.txt server_setting.txt enp0s8.log enp0s9.log

According to "RFC8684",It is written as follows. so I think there is no traffic on enp0s9 while enp0s8 is available. Could you please check why the traffic go through enp0s9?

Hosts can indicate at initial subflow setup whether they wish the subflow to be used as a regular or backup path -- a backup path only being used if there are no regular paths available.

matttbe commented 3 years ago

@Kenoko when discussing about this bug at the last meeting, we were wondering if this issue has not been fixed in a more recent kernel. We remembered having fixed something around there.

By chance, could you try with a more recent kernel, e.g. 5.12? e.g. built from https://kernel.ubuntu.com/~kernel-ppa/mainline/

Kenoko commented 3 years ago

@matttbe

Thank you for your advice and support.I installed following kernels on client/server.

I matched the kernel version between client and server and tried to make backup path. However the result was same as before.

the result is as below.

sudo ip mptcp endpoint add 192.168.20.2 dev enp0s9 subflow backup on client I could see 'backup flag' on enp0s9, however the traffic was still running on the interfaces.

sudo ip mptcp endpoint add 192.168.40.2 dev enp0s9 signal backup on server There is no backup flag

Could you please check this?

matttbe commented 3 years ago

Thank you for having checked that!

sudo ip mptcp endpoint add 192.168.20.2 dev enp0s9 subflow backup on client I could see 'backup flag' on enp0s9, however the traffic was still running on the interfaces.

That's already better! If now you see backup in the SYN+MP_JOIN in the packet traces, it means there was an issue in the Path Manager.

But if you see traffic over this interface while non backup path are still active, there is an issue on the packet scheduler side.

sudo ip mptcp endpoint add 192.168.40.2 dev enp0s9 signal backup on server There is no backup flag

Mmh, maybe a similar issue in the PM is there for the server side if you don't see the backup flag in the SYN+ACK+MP_JOIN.

Could you please check this?

Yes but to be honest with you, it might take some time for me to catch-up my backlog :)

Kenoko commented 3 years ago

@matttbe Thank you for support! I will wait for your checking

matttbe commented 3 years ago

We quickly discussed about this issue at the last weekly meeting we have and something that should be checked is, when the backup flag is correctly set in the SYN, maybe the traffic we can see on this interface is only "reinjected" traffic. If the non backup link is lossy, we can have a lot of reinjections, see #177.

Is the non backup link lossy in your case? Is the backup link "fully used": is the whole the capacity of the link used?

Kenoko commented 3 years ago

Thank you for your checking. I checked non backup status, I think the link is normal. I checked packet capture data on wire shark and there is no re-transmission.

matttbe commented 3 years ago

I will need to check this in more details but with a simple setup, I can see a lot of traffic on the backup link:

Setup:

ip netns add ns-a
ip netns add ns-b
ip link add ns-a-eth1 netns ns-a type veth peer name ns-b-eth1 netns ns-b
ip link add ns-a-eth2 netns ns-a type veth peer name ns-b-eth2 netns ns-b
ip -net ns-a link set lo up
ip -net ns-b link set lo up
ip -net ns-a addr add 10.0.1.1/24 dev ns-a-eth1
ip -net ns-b addr add 10.0.1.2/24 dev ns-b-eth1
ip -net ns-a addr add 10.0.2.1/24 dev ns-a-eth2
ip -net ns-b addr add 10.0.2.2/24 dev ns-b-eth2
ip -net ns-a link set ns-a-eth1 up
ip -net ns-a link set ns-a-eth2 up
ip -net ns-b link set ns-b-eth1 up
ip -net ns-b link set ns-b-eth2 up
ip -net ns-a mptcp limits set subflow 1 add_addr_accepted 1
ip -net ns-b mptcp limits set subflow 2 add_addr_accepted 1

tc -net ns-a qdisc add dev ns-a-eth1 root netem delay 250ms
tc -net ns-b qdisc add dev ns-b-eth1 root netem delay 250ms
tc -net ns-a qdisc add dev ns-a-eth2 root netem delay 250ms
tc -net ns-b qdisc add dev ns-b-eth2 root netem delay 250ms

ip -net ns-a mptcp endpoint add 10.0.2.1 dev ns-a-eth2 subflow backup
ip -net ns-b mptcp endpoint add 10.0.2.2 dev ns-b-eth2 signal backup

Utilisation:

# ip netns exec ns-a ifstat -i ns-a-eth1,ns-a-eth2 -btT
  Time        ns-a-eth1           ns-a-eth2             Total       
HH:MM:SS   Kbps in  Kbps out   Kbps in  Kbps out   Kbps in  Kbps out
09:16:48      0.00      0.61      0.00      0.00      0.00      0.61
09:16:49    116.15    337.01      0.00      1.34    116.15    338.35
09:16:50    683.99    967.09      1.73    107.44    685.72   1074.52
09:16:51   2714.18   1975.25     10.31    576.54   2724.48   2551.79
09:16:52  10978.68   4814.72     20.62    895.00  10999.29   5709.72
09:16:53  36498.11  19653.30     59.47   4224.46  36557.59  23877.75
09:16:54  40616.52  22022.35    121.43   5551.44  40737.94  27573.79
09:16:55  44300.90  35689.21    186.82   5980.72  44487.72  41669.92
09:16:56  44584.53  30997.82    151.15   5922.61  44735.68  36920.43
09:16:57  38932.46  31158.00    173.56  11743.47  39106.01  42901.47
09:16:58  35989.38  25915.49    259.24  10975.08  36248.62  36890.57
09:16:59  33078.95  33028.99    112.96  11497.24  33191.91  44526.24
09:17:00  33867.00  18244.52    181.52  21597.89  34048.52  39842.41
09:17:01  35889.53  20784.96    207.31  23876.98  36096.84  44661.94
09:17:02  36768.68  20588.43    239.28  24136.13  37007.96  44724.56
09:17:03  37675.66  21021.42    239.75  23808.62  37915.41  44830.04
09:17:04  37830.23  20818.03    237.37  23169.54  38067.60  43987.56
09:17:05  38037.17  20426.63    248.11  23715.60  38285.28  44142.23
09:17:06  37943.80  19045.66    249.33  25455.02  38193.13  44500.68
09:17:07  38244.62  14372.56    259.54  25212.54  38504.16  39585.10
09:17:08  38329.65  18481.79    260.33  24892.52  38589.99  43374.31
09:17:09  38264.39  18791.64    257.08  25590.48  38521.46  44382.12
09:17:10  37954.59  18695.06    259.22  25610.29  38213.80  44305.35
09:17:11  40835.93  18996.24    258.41  25593.15  41094.35  44589.39
09:17:12  38571.42  18830.26    246.54  25674.13  38817.96  44504.39
09:17:13  38653.77   9271.67    207.09   7858.68  38860.86  17130.35
09:17:14  24539.33    395.14      1.34      1.22  24540.67    396.36
09:17:15      0.54      0.00      0.61      0.61      1.15      0.61
09:17:16      0.00      0.00      0.33      0.00      0.33      0.00

Still, it looks like the backup link is only used to upload data of my bi-directional flow.

I checked packet capture data on wire shark and there is no re-transmission.

Be careful that here, we don't talk about TCP re-transmission but MPTCP re-injection: packets that were sent in one subflow but reinjected to another one.

pabeni commented 3 years ago

I will need to check this in more details but with a simple setup, I can see a lot of traffic on the backup link: [...] Utilisation:

# ip netns exec ns-a ifstat -i ns-a-eth1,ns-a-eth2 -btT
  Time        ns-a-eth1           ns-a-eth2             Total       
HH:MM:SS   Kbps in  Kbps out   Kbps in  Kbps out   Kbps in  Kbps out
09:16:48      0.00      0.61      0.00      0.00      0.00      0.61
09:16:49    116.15    337.01      0.00      1.34    116.15    338.35
09:16:50    683.99    967.09      1.73    107.44    685.72   1074.52
09:16:51   2714.18   1975.25     10.31    576.54   2724.48   2551.79
09:16:52  10978.68   4814.72     20.62    895.00  10999.29   5709.72
09:16:53  36498.11  19653.30     59.47   4224.46  36557.59  23877.75
[...]

Still, it looks like the backup link is only used to upload data of my bi-directional flow.

The problem is that when the 'backup' flag is set, the flag is announced to the peer (that is: enforced in the opposite direction), but not set in the forward/egress direction. That is actually compliant with the RFC - the backup flag is a per direction feature - but very confusing.

There is a simple fix for this - setting the backup flag in both direction - but it will break active backup (when no user-space PM is in action).

A possibly complete solution would be to additionally introduce PM logic inside the kernel to automatically close a subflow making no forward progresses and triggerring too much re-injection when other subflows are available, as per paragraph 3.3.6. in the RFC. Such behavior should be tunable and the user-space PM should be able to turn it off.

matttbe commented 3 years ago

The problem is that when the 'backup' flag is set, the flag is announced to the peer (that is: enforced in the opposite direction), but not set in the forward/egress direction. That is actually compliant with the RFC - the backup flag is a per direction feature - but very confusing.

If we tell the PM an endpoint has to be considered as backup, should it eventually ensure that the packet scheduler is also taking this into consideration even if the other peer didn't set the backup flag?

It is true that the backup flag is there to tell the other peer a path should be used as a backup one and it should use it only if there is no other non backup path available (still, not a MUST). But typically, the client knows a path is more costly and might mark it as backup for the other peer but also for itself, not to use it if other paths are available, no? :)

There is a simple fix for this - setting the backup flag in both direction - but it will break active backup (when no user-space PM is in action).

I would need to check again but I think in my case, the backup flag was set on both the SYN and SYN+ACK.

pabeni commented 3 years ago

There is a simple fix for this - setting the backup flag in both direction - but it will break active backup (when no user-space PM is in action).

I would need to check again but I think in my case, the backup flag was set on both the SYN and SYN+ACK.

Right, the server is currently mirroring back the backup flag from ingress syn MPJ packet into the SYN-ACK reply.

But the client code is ignoring such bit - which is a bug, to be fixed.

The discussion about active backup still applies - when fixing such bug we additionally need a smarter in kernel PM.