MPTCP traffic does not always resume on a temporary non functioning interface although create_on_err is set

knxhm commented 5 years ago

Kernel 4.14.77+, up to commit b3b861bd478c56eabd139fc6b873466f43c475a5 (Author: Christoph Paasch cpaasch@apple.com, Date: Thu Oct 18 10:06:28 2018 -0700)

Topology: an MPTCP PC with 3 interfaces (MPTCP3int-PC) communicates via a router with an MPTCP PC with 1 interface. NAT is enabled on the router interface.

Test procedure:

1. Start tcpdump on MPTCP3int-PC: tcpdump -i any -s100 -w tcpdump.pcap
2. Run ping from MPTCP3int-PC: ping 192.168.3.2
3. Start a transfer from MPTCP3int-PC: iperf3 -c 192.168.0.1 -R -t 1000 -p 5202
4. After a few seconds, start dropping packets for one interface on the router:
   ip rule add from 192.168.3.1 lookup 3
   ip route add default table 3 via 127.0.0.1
5. TCP traffic now flows on only 2 interfaces. ICMP traffic is not affected and continues.
6. After a few seconds, stop dropping packets on the router: ip route del default table 3 via 127.0.0.1
7. MPTCP traffic on the 192.168.3.1 interface does not come back.
8. After a few seconds, stop iperf3 and restart it: iperf3 -c 192.168.0.1 -R -t 1000 -p 5202
9. The 192.168.3.1 interface is immediately used again by MPTCP.
10. After a few seconds, stop tcpdump. The capture file is here:

tcpdump.gz

Note: If I drop packets on the router by another method, such as ip route add blackhole 192.168.3.1/32, wait, then ip route del blackhole 192.168.3.1/32, it works. MPTCP traffic resumes on this interface after end-to-end connectivity is restored.
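The two router-side fault-injection methods described above can be laid out side by side. The following is a minimal Python sketch, not part of the original report: the function names `fault_commands` and `run` and the method labels `policy_route` and `blackhole` are hypothetical, and only the `ip` commands themselves are taken from the steps above. It defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess


def fault_commands(src="192.168.3.1", table="3", method="policy_route"):
    """Return (break_cmds, restore_cmds) for one fault-injection method.

    "policy_route" is the rule + route-via-127.0.0.1 method from the test
    procedure; "blackhole" is the alternative from the note. Both labels
    are hypothetical names used only in this sketch.
    """
    if method == "policy_route":
        break_cmds = [
            ["ip", "rule", "add", "from", src, "lookup", table],
            ["ip", "route", "add", "default", "table", table, "via", "127.0.0.1"],
        ]
        # As in the report, only the route is removed; the rule stays in place.
        restore_cmds = [
            ["ip", "route", "del", "default", "table", table, "via", "127.0.0.1"],
        ]
    elif method == "blackhole":
        break_cmds = [["ip", "route", "add", "blackhole", f"{src}/32"]]
        restore_cmds = [["ip", "route", "del", "blackhole", f"{src}/32"]]
    else:
        raise ValueError(f"unknown method: {method}")
    return break_cmds, restore_cmds


def run(cmds, dry_run=True):
    """Print each command, or execute it on the router (needs root) if dry_run is False."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for method in ("policy_route", "blackhole"):
        break_cmds, restore_cmds = fault_commands(method=method)
        run(break_cmds)
        run(restore_cmds)
```

Keeping both methods in one helper makes it easier to repeat the test with identical timing and change only the way packets are dropped.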

cpaasch commented 5 years ago

Can you show us the state of the routing tables and ip-rule configs at each of the steps?

With your last comment it looks more like a configuration issue. I guess that because your ip rule is still there, traffic can't recover because you are still hitting this rule.

Unfortunately I can't open the tcpdump-file. It seems to be corrupted.

knxhm commented 5 years ago

Please run gzip -d on the tcpdump file before loading it into Wireshark. In it you will see that there are no attempts to re-establish a subflow on the 192.168.3.1 interface after the break. The interface is only used again by a new TCP connection.

knxhm commented 5 years ago

Yes, the IP rule is left in place without a route in the associated table, but I don't see how this causes an issue. The behaviour is the same when I delete both the route and the rule.

Another observation: when I stop dropping packets and leave iperf running over the 2 remaining links, then start a second TCP session in parallel (for example an scp file transfer from 192.168.0.1 to the iperf client machine), that file transfer does use the 192.168.3.1 interface in addition to the other 2. When the file transfer is over, the iperf from the first session is still running but still uses only 2 links.

I don't see a routing issue here. I rather think that if the sender's congestion buffer for that one subflow is full and it gets no ACKs, subflow re-establishment is somehow impacted. With the drop method that causes the problem I drop ACKs, whereas with the other method, where subflow re-establishment works, I drop data segments.

knxhm commented 5 years ago

Here is another packet trace from the iperf client side. During everything described above, I had a ping running in parallel between 192.168.3.1 and 192.168.0.1. The ICMP traffic was interrupted only once, for a few seconds while the packet-drop route was inserted, and it continued once the packet-drop route was removed. The TCP traffic from 192.168.3.1, however, stopped permanently. This should prove there is no routing issue.

gzipped trace file tcpicmp.pcap.gz

cpaasch commented 5 years ago

Looking at the pcap, the problem is that the way you are creating the routing problem produces a routing loop. Eventually the router sends an ICMP Time-to-live exceeded message. However, the MPTCP stack only recreates a subflow after an ETIMEDOUT.

We should not re-attempt new subflows after a routing-loop because then we will basically loop on attempting to create new subflows if the routing loop does not get resolved quickly.
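The rule described here can be written down as a small decision function. This is an illustrative Python sketch only, not the kernel code: the name `should_recreate_subflow` and its parameters are hypothetical, and the mapping of an ICMP Time-to-live exceeded message to EHOSTUNREACH on the socket is an assumption about Linux TCP behaviour rather than something stated in this thread.

```python
import errno


def should_recreate_subflow(sk_err, create_on_err=True):
    """Decide whether a closed subflow should be re-attempted.

    sk_err is the error the subflow's socket was closed with.
    Only a timeout qualifies; any other error (for example the one
    raised by a routing loop) does not trigger a new attempt.
    """
    if not create_on_err:
        return False
    return sk_err == errno.ETIMEDOUT


if __name__ == "__main__":
    # Blackhole case: packets vanish silently, so the subflow times out.
    print(should_recreate_subflow(errno.ETIMEDOUT))
    # Routing-loop case: ASSUMPTION that ICMP TTL exceeded surfaces as EHOSTUNREACH.
    print(should_recreate_subflow(errno.EHOSTUNREACH))
```

Under that assumption, the blackhole method ends in a timeout and the subflow comes back, while the routing-loop method ends in a different error and the subflow stays closed, which matches the two observations in the report.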

knxhm commented 5 years ago

I see. Thanks for your explanation. I am sporadically seeing subflows that do not restart in a test system carrying real traffic, and I thought I had replicated the problem this way, but according to your findings it seems I probably have not. I will look into this further. Thanks again.

multipath-tcp / mptcp

MPTCP traffic does not always resume on a temporary non functioning interface although create_on_err is set #290