Open ChrisChoke opened 1 year ago
Hi Chris,
Thank you for this bug report! (and sorry for the delay)
(Note that MPTCPv0 and MPTCPv1 refer to the protocol, not the implementation)
if i reconnect wwan1, all is fine and the interface come back to the mptcp connection. but if i reconnect the wwan2 for the udp tap interface, this interface dont come back to the mptcp connection.
When wwan2 is reconnected, I guess you re-configure the routing rules (`ip rule from <IP WWAN2> table <TABLE>` and `ip route default via <GATEWAY WWAN2> dev wwan2 table <TABLE>`) and the endpoints (`ip mptcp endpoint add <IP WWAN2> dev wwan2 <signal or subflow [fullmesh]>`) that have been removed when the interface has been put down, right?
my endpoints on client site are configured with `subflow fullmesh` just like on the server side, too.
On the server side, I guess you use `signal` instead of `subflow`, right?
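The reconfiguration described above (routing rule, default route in the per-interface table, then the MPTCP endpoint) could be sketched roughly as follows. This is only an illustration: the function name `reconfigure_iface` and the `APPLY` switch are hypothetical, the addresses and table number are placeholders, and the `add` keywords are assumed to be what the abbreviated commands in the comment stand for.

```shell
# Sketch only: reconfigure_iface and APPLY are hypothetical names.
# By default the commands are printed; set APPLY=1 (as root) to run them.
reconfigure_iface() {
    ip_addr=$1 gateway=$2 dev=$3 table=$4 flags=${5:-subflow}
    if [ "${APPLY:-0}" = 1 ]; then run=; else run=echo; fi

    # Policy routing: traffic sourced from this address uses its own table
    $run ip rule add from "$ip_addr" table "$table"
    $run ip route add default via "$gateway" dev "$dev" table "$table"
    # $flags is left unquoted on purpose so "subflow fullmesh" becomes two words
    $run ip mptcp endpoint add "$ip_addr" dev "$dev" $flags
}
```

For example, `reconfigure_iface 10.25.150.10 <gateway> wwan2 102 "subflow fullmesh"` prints the three commands for review before anything is applied.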
Good Morning Mat,
When wwan2 is reconnected, I guess you re-configure the routing rules (`ip rule from <IP WWAN2> table <TABLE>` and `ip route default via <GATEWAY WWAN2> dev wwan2 table <TABLE>`) and the endpoints (`ip mptcp endpoint add <IP WWAN2> dev wwan2 <signal or subflow [fullmesh]>`) that have been removed when the interface has been put down, right?
Yes that's right, I flush all tables `ip rule flush table <table>`, `ip route flush table <table>`
And the ip mptcp endpoint, too. In endpoint will show you `<ip address> dev if9 subflow` which is the index of the iFace which I put down. So here I have to reconfigure because the index number will increase when I create a new one.
On the server side, I guess you use signal instead of subflow, right?
Well, on the server I tried both but couldn't observe any different behavior. It's a bit unclear to me what I need to configure. In the shadowsocks (tessares) or RedHat tutorial it is described as signal on the server side, but I did not understand why I have to configure it with signal.
Chris
And the ip mptcp endpoint, too. In endpoint will show you `<ip address> dev if9 subflow` which is the index of the iFace which I put down. So here I have to reconfigure because the index number will increase when I create a new one.
OK, thank you, maybe we don't cover this case well, where a new endpoint is added later on. It would be good to try to reproduce it in a simpler setup.
I just changed the title of the ticket, I hope it represents well the issue you have. If not, feel free to modify it.
On the server side, I guess you use signal instead of subflow, right?
Well, on the server I tried both but couldn't observe any different behavior. It's a bit unclear to me what I need to configure. In the shadowsocks (tessares) or RedHat tutorial it is described as signal on the server side, but I did not understand why I have to configure it with signal.
You are not the only one to be confused by that, we should improve something there but not sure what :)
If you have multiple interfaces, you need to add an MPTCP endpoint for each additional IP address you want to use in the MPTCP connection. You can use the flag `subflow`, typically on the client side (to create additional subflows), or `signal`, typically on the server side (to announce additional IP addresses).
If your server doesn't have additional IP addresses, no need to configure additional endpoints.
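As a rough illustration of that rule of thumb, the small helper below picks the endpoint flag from the role of the host and prints the matching command. The name `endpoint_cmd` and the role labels are made up for this sketch, and the addresses in the usage note are documentation placeholders.

```shell
# Sketch only: endpoint_cmd and the "client"/"server" labels are hypothetical.
# Prints the endpoint command instead of running it.
endpoint_cmd() {
    role=$1 addr=$2 dev=$3
    case $role in
        client) flag=subflow ;;  # create additional subflows from this address
        server) flag=signal ;;   # announce this additional address to the peer
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
    echo "ip mptcp endpoint add $addr dev $dev $flag"
}
```

`endpoint_cmd client 192.0.2.10 wwan2` prints a `subflow` endpoint, while `endpoint_cmd server 192.0.2.20 eth1` prints a `signal` one. A server with a single address would need neither.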
When wwan2 is reconnected, I guess you re-configure the routing rules (`ip rule from <IP WWAN2> table <TABLE>` and `ip route default via <GATEWAY WWAN2> dev wwan2 table <TABLE>`) and the endpoints (`ip mptcp endpoint add <IP WWAN2> dev wwan2 <signal or subflow [fullmesh]>`) that have been removed when the interface has been put down, right?
Yes that's right, I flush all tables `ip rule flush table <table>`, `ip route flush table <table>`
And the ip mptcp endpoint, too. In endpoint will show you `<ip address> dev if9 subflow` which is the index of the iFace which I put down. So here I have to reconfigure because the index number will increase when I create a new one.
Could you please list the full configuration steps you use, including devices and endpoints removal and recreation? Just to avoid natural language ambiguity. A shell script including all the relevant commands would be ideal.
Thanks!
good morning team,
sorry for my big delay. i have been on vacation since 17 jul. and i am back in office on 14 aug. (yeah thats a long vacation :smiley: ) so then i will try to share all of this, how i set up my devices.
Chris
hey hey,
sorry for the delay, daily business and a cold delayed me a little bit. i attached a zip with the steps i do on interface setup and recreation. its not pretty but i think you can understand what i do. hope it can help us 👍
Chris mptcpv1_git.zip
i attached a zip with the steps i do on interface setup and recreation. its not pretty but i think you can understand what i do. hope it can help us 👍
The scripts leave several questions open: how many vpn tunnels are you using in your test? Do they use a tun or a tap interface? Are the defined endpoints all 'fullmesh'? It looks like there are also 'backup' ones.
It would be clearer if you could provide, after recreating your setup, the output of the following commands: `ip mptcp endpoint`, `ip route`, `ip -4 addr`, `ss -MteimO`, `nstat -az Tcp MPTcp`
And additionally the paired "ip mptcp monitor" output and finally the output of the above commands just after the failures.
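One possible way to gather the requested output in a single pass is sketched below. The function name `collect_mptcp_diag` and the output layout are hypothetical; the command list is copied as written in the request above. Each command's output (or its error message, if the command fails or is missing) goes into its own file.

```shell
# Sketch only: collect_mptcp_diag is a hypothetical helper.
# Writes one file per command into the given directory.
collect_mptcp_diag() {
    outdir=${1:-mptcp-diag}
    mkdir -p "$outdir" || return 1
    while IFS= read -r cmd; do
        [ -n "$cmd" ] || continue
        # Build a file name from the command, e.g. "ip route" -> ip_route.txt
        name=$(printf '%s' "$cmd" | tr -c 'A-Za-z0-9' '_')
        # Keep going even if a command fails; its error text is recorded too
        sh -c "$cmd" >"$outdir/$name.txt" 2>&1 </dev/null || true
    done <<EOF
ip mptcp endpoint
ip route
ip -4 addr
ss -MteimO
nstat -az Tcp MPTcp
EOF
}
```

Running it once after recreating the setup and once just after the failure (into two different directories) gives two snapshots that can be compared. The `ip mptcp monitor` output is a continuous stream, so it would still need to be captured separately in another terminal.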
[ SF_CLOSED] token=66fe4d59 remid=0 locid=1 saddr4=10.30.0.2 daddr4= sport=52003 dport=0 backup=1 error=104 ifindex=9
Note that this subflow closed due to a connection reset (errno=104) and the 0 dport value is really unexpected here. Possibly the NL PM is trying to create new subflow towards the peer of the first subflow, just after such subflow has been closed (and thus disconnected, zeroing the dport).
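To spot events like this one in a longer `ip mptcp monitor` capture, a small filter along these lines may help. It is only a sketch: `field` and `check_sf_closed` are hypothetical names, and it assumes the space-separated `key=value` layout of the event line shown above. Error 104 is `ECONNRESET` on Linux.

```shell
# Sketch only: field and check_sf_closed are hypothetical helpers.

# Print the value of key $2 from a line of space-separated key=value pairs
field() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n "s/^$2=//p"
}

# Summarize an SF_CLOSED event; prints nothing for other event types
check_sf_closed() {
    line=$1
    case $line in *SF_CLOSED*) ;; *) return 0 ;; esac
    error=$(field "$line" error)
    dport=$(field "$line" dport)
    msg="SF_CLOSED error=${error:-none}"
    if [ "$error" = 104 ]; then msg="$msg (ECONNRESET)"; fi
    if [ "$dport" = 0 ]; then msg="$msg, dport=0 (unexpected)"; fi
    echo "$msg"
}
```

Feeding the capture through it line by line, for example `while IFS= read -r l; do check_sf_closed "$l"; done < monitor.log` (with `monitor.log` being whatever file the monitor output was saved to), lists only the closed subflows and marks the ones with a zero destination port.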
I suspect something like the attached patch below could help, @ChrisChoke: could you please give it a shot in your testbed? diffs.txt
hey paolo,
thank you very much for the reply. i am compiling the kernel at the moment and will test after it has finished.
graph TD;
A[client]-->| wwan1 tun tcp | B[server];
A[client]-->| wwan2 tap udp| C[compat server];
C[compat server]-->| eth0 | B[server];
i hope this diagram will help a bit. i use 2 tunnels. one tun tcp tunnel for mptcp in case that the provider supports mptcp and one tap udp tunnel to a compat server without mptcp kernel in case the provider block native mptcp. the created tap interface will be an endpoint for mptcp.
the backup interface is the created tun interface.
do i need to install the patched kernel on client and server site?! or just on client?!
Chris
thank you very much for the reply. i am compiling the kernel at the moment and will test after it has finished. i hope this diagram will help a bit. i use 2 tunnels. one tun tcp tunnel for mptcp in case that the provider supports mptcp and one tap udp tunnel to a compat server without mptcp kernel in case the provider block native mptcp. the created tap interface will be an endpoint for mptcp.
To be sure I'm on the same page: do you want to use mptcp as transport for the openvpn tunnel? And not as the application level protocol?
Out of sheer ignorance on my side, it is not clear to me what the differences are between the tun and tap tunnels WRT protocol header stacking. Could you please list the expected headers for egress packets on each interface?
Please, additionally share the other info as per my previous comment.
do i need to install the patched kernel on client and server site?! or just on client?!
Just on the side actually creating the additional subflows, that is the client.
To be sure I'm on the same page: do you want to use mptcp as transport for the openvpn tunnel? And not as the application level protocol?
no i dont think so. its application level. i use `mptcpize run openvpn <commands>` so that openvpn use mptcp for tunnel.
but for my case that vodafone do not support mptcp and block it. i create a tap interface without mptcpize or something.
and that tap interface will be an endpoint for the mptcpize openvpn tunnel. so this tap interface should be a subflow and listed in `ss -M4tn`.
Out of sheer ignorance on my side, it is not clear to me what the differences are between the tun and tap tunnels WRT protocol header stacking. Could you please list the expected headers for egress packets on each interface?
okay, now i dont understand what i should do or describe :-) i am sorry.
i will recreate my setup with the new patched kernel at the moment. i hope i can share more details of your request from previous comment later today.
Chris
hey paolo,
i attached your requested output. some explanation for the tests:
test1: here the tap interface is the main mptcp flow with its own initial subflow, and wwan1 is a subflow. i reconnected the wwan2 interface, which is the parent interface of the tap interface. after the reconnect it did not come back to the mptcp connection.
test2: here the tap interface is the main mptcp flow with its own initial subflow, and wwan1 is a subflow. i reconnected the wwan1 interface. wwan1 comes back to the mptcp connection. so here is no problem.
test3: here the wwan1 interface is the main mptcp flow with its own initial subflow. the tap interface doesn't join the mptcp connection. i reconnected wwan2, but it still doesn't join the mptcp connection.
test4: the same as test3 but the monitor shows other results. in test3 you can see sf_closed for the tap interface but not in test4. in test4 you can see sf_closed from the native wwan2 ip address but not in test3. its a bit strange.
Chris patched_tests.zip
hey guys, how are you? its been a long time since i heard anything. But i come back with some fresh news about my case.
so since february/march it looks like vodafone updated their setup in the field. Now they do not block mptcp anymore. My mptcp capable tests are successful now. So my backup solution for this cellular connection (the solution to create tap vpn interfaces for use as subflows) will fade more and more into the background.
but one behavior leave me some questions in my head.
My established connections for example. Both are natively mptcp capable.
root@mptcp-v1-client:/home/user1# ss -M4tn
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
tcp ESTAB 0 0 <cellular1-address>:54628 <external-address>:30194 #### initial subflow from mptcp mainflow connection, right?
tcp ESTAB 0 0 <cellular2-address>%wwan2:49913 <external-address>:30194
tcp ESTAB 0 52 10.0.1.1:22 10.0.1.80:55127
mptcp ESTAB 0 0 <cellular1-address>:54628 <external-address>:30194 #### mainflow connection
if i restart/lose the connection from wwan1/cellular1, which behavior should i expect? (the interface which is "Netid" mptcp)
currently the subflow(tcp part) will change his state to FIN_WAIT and disappears after a while. the mainflow (mptcp part) still ESTAB. if the interface is back and up and running i dont get a new subflow of this interface. i would expect i should get something like this via ss command:
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
tcp ESTAB 0 0 <cellular1-address>%wwan1:12345 <external-address>:30194
the second question is: What will happen if the mptcp interface goes down and comes back with a new ip-address?! what should i expect in this case?! because the interfaces are cellular devices it could be possible that i get a new ip-address via dhcp from my ISP.
looking forward to resolve my issues.
Chris
Hi Chris,
Sorry for the delay. I recently made quite a few fixes around the path-manager, and some [1] [2] are not applied yet.
If you have the opportunity, do you think you could try to reproduce your issues with a kernel compiled from our `export` branch, and ideally including the patches mentioned above?
if i restart/loose connection from wwan1/cellular1. which behavior should i expect? (the interface which is "Netid" mptcp)
The best is to remove the corresponding MPTCP endpoint. In this case, the linked subflows should be removed as well. When the MPTCP endpoint is re-added, the subflows will be added to the ongoing connections. You should expect the same behaviour if you re-create the MPTCP endpoint linked to any path, including the first one. If a new endpoint with a new IP is added, the kernel will try to use it if possible (limits, firewall, etc.).
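Removing an endpoint needs its ID, which can be looked up from the endpoint list. The helper below is a sketch under the assumption that each line of `ip mptcp endpoint show` starts with the address and contains an `id <N>` pair somewhere after it; `endpoint_id_for` is a hypothetical name.

```shell
# Sketch only: endpoint_id_for is a hypothetical helper.
# Reads `ip mptcp endpoint show` output on stdin and prints the ID of the
# first endpoint whose address equals $1 (prints nothing if there is none).
endpoint_id_for() {
    awk -v addr="$1" '
        $1 == addr {
            for (i = 2; i < NF; i++)
                if ($i == "id") { print $(i + 1); exit }
        }'
}
```

When the interface goes down, something like `id=$(ip mptcp endpoint show | endpoint_id_for "$old_addr")` followed by `ip mptcp endpoint delete id "$id"` would remove the stale endpoint (with `old_addr` being a variable of your own script), and the endpoint can then be re-added with the current address once the interface is back.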
Hey Matt,
Great to hear from you. Sadly I am on vacation at the moment. I could start testing on 02 Sept. And I will make time to do this, to bring a feedback back asap.
I also noticed that with kernel 6.8 netlink ss command show me additional infos. I can see the initial subflow with %wwan1 for example. That's great. :-)
Hope you have a nice time till Sept. And I come back as soon as possible with fresh Infos.
Chris
Hey Matt,
i compiled your export branch on 02. Sept. tag: export/20240902T054954. at this point all your commits already were merged into export branch. i used debian testing with this new kernel. because iproute2 is on version 6.10 in debian testing. i hope i dont miss some features at the moment.
here my results. i marked some comments with #.
# wwan2 restarting fin-wait-1
root@mptcpv1:/home/chs# ss -M4tn
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
tcp ESTAB 0 0 100.80.180.156:55460 < ext address >:30194
tcp FIN-WAIT-1 0 1 10.25.150.10%wwan2:49211 < ext address >:30194
tcp ESTAB 0 52 10.0.1.1:22 10.0.1.80:59520
tcp ESTAB 0 0 10.30.0.2%bond0:45303 < ext address >:30194
mptcp ESTAB 0 0 100.80.180.156:55460 < ext address >:30194
# wwan2 restarted estab NO problems. work as expected.
root@mptcpv1:/home/chs# ss -M4tn
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
tcp ESTAB 0 0 100.80.180.156:55460 < ext address >:30194
tcp ESTAB 0 0 10.25.150.10%wwan2:50817 < ext address >:30194
tcp ESTAB 0 52 10.0.1.1:22 10.0.1.80:59520
tcp ESTAB 0 0 10.30.0.2%bond0:45303 < ext address >:30194
mptcp ESTAB 0 0 100.80.180.156:55460 < ext address >:30194
# wwan1 restarting with address 100.80.180.156
root@mptcpv1:/home/chs# ss -M4tn
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
tcp FIN-WAIT-1 0 1 100.80.180.156:55460 < ext address >:30194
tcp ESTAB 0 0 10.25.150.10%wwan2:50817 < ext address >:30194
tcp ESTAB 0 52 10.0.1.1:22 10.0.1.80:59520
tcp ESTAB 0 0 10.30.0.2%bond0:45303 < ext address >:30194
mptcp ESTAB 0 0 100.80.180.156:55460 < ext address >:30194
# wwan 1 restarted. Got same address again from ISP.
root@mptcpv1:/home/chs# ifconfig wwan1
wwan1: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST> mtu 1500
inet 100.80.180.156 netmask 255.255.255.248 destination 100.80.180.156
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 1000 (UNSPEC)
RX packets 161 bytes 25951 (25.3 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 194 bytes 27418 (26.7 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
# wwan1 with address 100.80.180.156: no ESTAB tcp subflow. the old with FIN-WAIT-1 still listed. I expect a new subflow with ESTAB state for address 100.80.180.156.
root@mptcpv1:/home/chs# ss -M4tn
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
tcp FIN-WAIT-1 0 1 100.80.180.156:55460 < ext address >:30194
tcp ESTAB 0 0 10.25.150.10%wwan2:50817 < ext address >:30194
tcp ESTAB 0 52 10.0.1.1:22 10.0.1.80:59520
tcp ESTAB 0 0 10.30.0.2%bond0:45303 < ext address >:30194
mptcp ESTAB 0 0 100.80.180.156:55460 < ext address >:30194
root@mptcpv1:/home/chs# ss -M4tni
# normal state with both interfaces connected and subflows are listed.
mptcp ESTAB 0 0 100.80.180.156:40724 < ext address >:30194
subflows:2 subflows_max:8 add_addr_accepted_max:8 remote_key token:5b92f783 write_seq:10095362430517201304 snd_una:10095362430517201304 rcv_nxt:8574327516016999043 local_addr_used:2 local_addr_max:3 bytes_sent:6976 bytes_received:7699 bytes_acked:6976 subflows_total:3 last_data_sent:1432 last_data_recv:2568 last_ack_recv:1380
# after reconnect wwan1 with no tcp ESTAB state after reconnect.
mptcp ESTAB 0 0 100.80.180.156:40724 < ext address >:30194
subflows:2 subflows_max:8 add_addr_accepted_max:8 remote_key token:5b92f783 write_seq:10095362430517204826 snd_una:10095362430517204826 rcv_nxt:8574327516017002367 local_addr_used:2 local_addr_max:3 bytes_sent:10498 bytes_received:11023 bytes_acked:10498 subflows_total:2 last_data_sent:576 last_data_recv:8760 last_ack_recv:544
# subflows and subflows_total are not cover. i would expect subflows: 1.
# the subflow of id0 where i would expect that it comes back after reconnecting this interface
tcp ESTAB 0 0 100.80.180.156:40724 < ext address >:30194
cubic wscale:11,2 rto:276 rtt:73.258/28.587 ato:40 mss:1448 pmtu:1500 rcvmss:1296 advmss:1448 cwnd:10 bytes_sent:6976 bytes_acked:6977 bytes_received:7699 segs_out:67 segs_in:61 data_segs_out:35 data_segs_in:36 send 1581261bps lastsnd:1428 lastrcv:2564 lastack:1376 pacing_rate 3162504bps delivery_rate 216696bps delivered:36 busy:2044ms rcv_rtt:33 rcv_space:14480 rcv_ssthresh:217336 minrtt:30.991 snd_wnd:57344 rcv_wnd:213540 tcp-ulp-mptcp flags:Mmec token:0000(id:0)/5b92f783(id:0) seq:76fe2172740cfa0d sfseq:1d9e ssnoff:cf14afcc maplen:76
i hoped i would come back with some better news, but i am still hoping to get this fixed. 👍 i really appreciate your work and i am thankful for it.
Chris
Hi @ChrisChoke,
Thank you for the update. I lost the context here: when your wwan interfaces are restarted, are the linked MPTCP endpoints being removed, then re-added when available again? The MPTCP in-kernel PM rely on these events to remove and re-add subflows.
Yes.
I do 'ip mptcp endpoint delete id
'ss -Mani' shows the subflows_total counter, which is 3 when all is fine and 2 after the main interface has completed its restart, but it should be 3 again.
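To compare that counter before and after a reconnect without reading the whole `ss` output, a filter like the following could be used. It is a sketch: `subflows_total` is a hypothetical function name, and it assumes the `subflows_total:<N>` token appears in the info line the way it does in the output pasted earlier in this thread.

```shell
# Sketch only: subflows_total is a hypothetical helper.
# Reads `ss -Mni` output on stdin and prints the subflows_total value of
# every MPTCP socket found, one per line.
subflows_total() {
    tr ' \t' '\n\n' | sed -n 's/^subflows_total://p'
}
```

For example, `ss -Mni | subflows_total` run before the restart and again after the endpoint has been re-added should print the same number if the subflow came back.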
Hey Team,
thanks for this great project and the work to get it in the linux kernel. nice job. i ve been using mptcpV0 for a while and want to migrate my setup to mptcpV1. but i have some strange behavior when my interfaces are reconnecting.
My setup has a client with 2 or more cellular interfaces while my server just has one wired interface. On both sides i run Debian 12 with Kernel 6.1. But i tried Kernel 6.3 from sid and your mptcp-next/export Kernel as well with a patch from #391.
On Server site i have openVPN up and running via `mptcpize enabled openvpn@server.service`. ss command show me listen mptcp port. mptcpd.service is stopped and masked.
On client site i am using openvpn via `mptcpize run openvpn --some-options`. mptcpd is stopped and masked. i set up endpoints via `ip mptcp endpoint` manually. the limits are on `subflow 8 and add_addr_accepted 8`
when all providers i use support mptcp it looks great. but in my case i have a provider which seems to block mptcp since the beginning of 2021 if i remember. so in that case i am creating a udp openvpn tunnel to another server and use the created tap interface as subflow. sometimes i get it up and running. but if i reconnect the wwan interface which i am using for the udp connection and the tap interface is closed and created again, the subflow dont come back.
another scenario is when i reconnect wwan1 which is the mainflow part, the wwan1 dont come back to the mptcp connection as well. `ip mptcp monitor` show me nothing about SF_CLOSED or something. just this:
initial connection:
after reconnect wwan1:
the second scenario is that the tap interface is the mainflow mptcp connection like so:
if i reconnect wwan1, all is fine and the interface come back to the mptcp connection. but if i reconnect the wwan2 for the udp tap interface, this interface dont come back to the mptcp connection. in this case i see in `ip mptcp monitor` this:
initial connection:
reconnecting interface of mptcp mainflow:
the destination port printed out as 0. If the tap interface is recreated with openvpn the ifindex number is increasing. so i always have a new ifindex number. can this be a problem?
This setup works very well and stable with mptcpV0. Most of this run with mptcpV1 as well, but i run into this trouble when a cellular interface is reconnecting. my endpoints on client site are configured with `subflow fullmesh` just like on the server side, too. ip rules and routing tables for the interfaces are set up.
hope we can explain this behavior and could find a working solution. :-) if you need further information, feel free asking. i will help where i can. but i am not familiar with gdb or something, here need some explanation beforehand.
greets Chris