Simulate TCP fallback in the middle of MPTCP communication for testing

VenkateswaranJ commented 3 years ago

Hi,

I have been testing MPTCP version 0.95 for various corner cases, one of which is TCP fallback. I have tested the fallback that happens at the initial stage (due to a middlebox or an unsupported peer), and I am able to get that information via getsockopt(). Now I would like to simulate the rarer scenario of a TCP fallback that happens in the middle of the client <-> server communication.

My test setup (image: mptcp_fallback):

Both the server and the client have MPTCP enabled; the router does not (it is only there to manipulate and forward packets).

On the router side, I tried to manipulate the packets using scapy and NetfilterQueue.

Test steps:

1. Start the client and server, create an MPTCP connection, and send some traffic between them for some time.
2. Run the scapy packet manipulation script to modify the MPTCP option in the TCP header (it acts like the worst middlebox :) ).
3. Expect MPTCP to fall back to TCP.

Python script used at the router:

#! /usr/bin/env python2.7
from scapy.all import *
from netfilterqueue import NetfilterQueue
import os

iptablesr = "iptables -A FORWARD -j NFQUEUE --queue-num 1"

os.system(iptablesr)
os.system("sysctl net.ipv4.ip_forward=1")

def modify(packet):
    ip_pkt = IP(packet.get_payload())
    try:
        ip_tcp = ip_pkt.getlayer(TCP)
        tcp_options = ip_tcp.options
        if tcp_options[-1][0] == 30: # check if tcp option number is 30 (MPTCP)
            lst_opt = list(tcp_options[-1]) 
            lst_opt[1] = "fallback please" # changing mptcp option value
            ip_pkt.getlayer(TCP).options = tcp_options[:-1]
            ip_pkt.getlayer(TCP).options.append(tuple(lst_opt))
            print "packet modified..." 
            print ip_pkt.getlayer(TCP).options
            del ip_pkt.chksum
            del ip_tcp.chksum
            ip_pkt.show2()
            packet.set_payload(str(ip_pkt))
        packet.accept()

    except Exception as e:
        packet.accept() #just skip the packet unmodified.

nfqueue = NetfilterQueue()
nfqueue.bind(1, modify)
try:
    print "[*] waiting for data"
    nfqueue.run()
except KeyboardInterrupt:
    nfqueue.unbind()
    print "Flushing iptables."
    os.system('iptables -F')
    os.system('iptables -X')

Modified packet:

###[ IP ]### 
  version   = 4
  ihl       = 5
  tos       = 0x0
  len       = 72
  id        = 30875
  flags     = DF
  frag      = 0
  ttl       = 63
  proto     = tcp
  chksum    = 0x2c49
  src       = 10.100.64.2
  dst       = 10.100.66.2
  \options   \
###[ TCP ]### 
     sport     = 43425
     dport     = 6060
     seq       = 599247764
     ack       = 3107215099
     dataofs   = 13
     reserved  = 0
     flags     = A
     window    = 502
     chksum    = 0xa93f
     urgptr    = 0
     options   = [('NOP', None), ('NOP', None), ('Timestamp', (763134406, 2876545223)), ('NOP', None), ('NOP', None), ('SAck', (3107215098, 3107215099)), (30, 'fallba')]
###[ Padding ]### 
        load      = 'ck please\x00\x00\x00'

In the above test, the client and server successfully created an MPTCP connection at the beginning, but once I run my Python script on the router, the server and the client can no longer talk to each other and start retransmitting heavily. My expectation was that either the server or the client would recognise this MPTCP option manipulation by the middlebox and fall back to TCP.

Please check the Wireshark captures from the 3 VMs to understand it better (sorry for the .zip format; I can't attach a .pcapng file on GitHub).

In the above Python script I'm changing the MPTCP option value field. I also tried dropping the whole MPTCP option, but I get the same result.

To verify the Python script itself, I forwarded the packets without changing anything, and that doesn't interrupt the client <-> server communication. So I'm doing something wrong in the packet modification step. If someone has knowledge of this, please guide me in the right direction.

wireshark_cap.zip
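
The dump above hints at one reason the first script breaks the connection: the replacement value is longer than the original option, yet dataofs (13) and the IP len (72) are unchanged, so the tail of the value spills out of the options field and shows up as Padding. A minimal sketch of corrupting the MPTCP option (kind 30) in place, without changing its length, is given below. It is plain Python 3 working on the raw TCP options bytes rather than on scapy objects, and the helper names iter_tcp_options and scramble_mptcp_option are hypothetical. How a particular MPTCP stack reacts to the corrupted option is not something the sketch can predict.

```python
MPTCP_KIND = 30  # TCP option kind used by MPTCP


def iter_tcp_options(opts):
    """Yield (kind, offset, length) for each option in a raw TCP options field."""
    i = 0
    while i < len(opts):
        kind = opts[i]
        if kind == 0:              # End of Option List: nothing useful follows
            break
        if kind == 1:              # NOP is a single byte without a length field
            yield kind, i, 1
            i += 1
            continue
        if i + 1 >= len(opts):     # truncated option: stop parsing
            break
        length = opts[i + 1]
        if length < 2 or i + length > len(opts):
            break                  # malformed length: stop parsing
        yield kind, i, length
        i += length


def scramble_mptcp_option(opts):
    """Return a copy of opts with every MPTCP option body bit-inverted.

    The kind and length bytes are left untouched, so the options field keeps
    its size and the TCP data offset stays valid.
    """
    out = bytearray(opts)
    for kind, off, length in iter_tcp_options(opts):
        if kind == MPTCP_KIND:
            for j in range(off + 2, off + length):
                out[j] ^= 0xFF
    return bytes(out)


# Made-up options field: NOP, NOP, an 8-byte MPTCP option, NOP, NOP.
sample_opts = bytes([1, 1, 30, 8, 0x20, 0x01, 0xAA, 0xBB, 0xCC, 0xDD, 1, 1])
scrambled = scramble_mptcp_option(sample_opts)
```

Because the result has the same length as the input, only the TCP checksum needs to be recomputed afterwards, which the scripts in this thread already trigger by deleting chksum in scapy.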

matttbe commented 3 years ago

Hi,

I didn't check everything, but one thought: it may be easier to make sure the MPTCP checksum is on and then modify the data. The risk with modifying the option is that the packet gets rejected.

If you don't see anything, best is to look at the SNMP counters (nstat) + /proc/net/mptcp_net/snmp.

VenkateswaranJ commented 3 years ago

I verified that the checksum is enabled on both client and server (net.mptcp.mptcp_checksum = 1). Strangely, when I check the SNMP counters, only the client has the MPTCP checksum flag set; the server doesn't set it even though I enabled it in sysctl.

matttbe commented 3 years ago

It's possible the counter is only incremented for the sender, not sure. It seems we can see it in the traces.

Maybe best to edit your script to start by modifying only packets with data (not "pure" ACK). If you force the re-computation of the checksum, is it OK? Also if you modify other TCP fields? e.g. remove the push flag or decrease the window by one?

VenkateswaranJ commented 3 years ago

> If you force the re-computation of the checksum, is it OK?

Sorry, I don't understand. I'm doing del ip.check_sum and del tcp.check_sum in scapy.

I tried several things: apart from the TCP window, touching any other field leads to packet loss and retransmission on both sides. Changing the TCP window doesn't affect anything.

Also, as mentioned in the RFC, I tried changing the actual message (Raw.load), but that also led to packet loss.

> Application-level middleboxes such as content-aware firewalls may alter the payload within a subflow, such as rewriting URIs in HTTP traffic. MPTCP will detect these using the checksum and close the affected subflow(s) if other subflows can be used. If all subflows are affected, multipath will fall back to TCP, allowing such middleboxes to change the payload. MPTCP-aware middleboxes should be able to adjust the payload and MPTCP metadata in order not to break the connection.

Here is my final code.


#! /usr/bin/env python2.7
from scapy.all import *
from netfilterqueue import NetfilterQueue
import os

iptablesr = "iptables -A FORWARD -j NFQUEUE --queue-num 1"

os.system(iptablesr)
os.system("sysctl net.ipv4.ip_forward=1")

def check_mptcp_option(option_list):
    for option in option_list:
        if option[0] == 30:
            return True
    return False

def modify(packet):
    ip_pkt = IP(packet.get_payload())
    try:
        ip_tcp = ip_pkt.getlayer(TCP)
        if ip_tcp.flags != 'A' and check_mptcp_option(ip_tcp.options):
            ip_pkt.getlayer(Raw).load = ip_pkt.getlayer(Raw).load + "."
            del ip_pkt.chksum
            del ip_tcp.chksum
            packet.set_payload(str(ip_pkt))
        packet.accept()

    except Exception as e:
        packet.accept() #just skip the packet unmodified.

nfqueue = NetfilterQueue()
nfqueue.bind(1, modify)
try:
    print "[*] waiting for data"
    nfqueue.run()
except KeyboardInterrupt:
    nfqueue.unbind()
    print "Flushing iptables."
    os.system('iptables -F')
    os.system('iptables -X')

matttbe commented 3 years ago

> Sorry, I don't understand. I'm doing del ip.check_sum and del tcp.check_sum in scapy.

Sorry, I mean: if you only delete those, is the packet properly transferred and accepted?

> I tried several things: apart from the TCP window, touching any other field leads to packet loss and retransmission on both sides. Changing the TCP window doesn't affect anything.

But do you see that the packet has been modified as expected in the traces?

> Also, as mentioned in the RFC, I tried changing the actual message (Raw.load), but that also led to packet loss.

Yes, of course you will have a fallback only if you have one subflow.

Here I see you modify the size of the data; maybe best not to, and to modify only the content. Otherwise you will have to change the length of the packet.

Also, it may be better not to check for flags != 'A' but to check whether the size of the payload is > 0 (I don't remember how to do that; note that you can also add a filter in the iptables command). Sorry, I haven't used scapy for a while :-/
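
The two suggestions above (test for a non-empty payload instead of flags != 'A', and change the content without changing the size) can be sketched on the raw bytes that NetfilterQueue hands to the callback. This is a Python 3 sketch that assumes an unfragmented IPv4 packet; the names tcp_payload_bounds, flip_first_payload_byte and build_sample are hypothetical, and the sample packet is hand-built with zeroed checksums purely for illustration.

```python
TCP_PROTO = 6  # IPv4 protocol number for TCP


def tcp_payload_bounds(pkt):
    """Return (start, end) byte offsets of the TCP payload in a raw IPv4 packet."""
    if pkt[0] >> 4 != 4 or pkt[9] != TCP_PROTO:
        raise ValueError("not an IPv4/TCP packet")
    ihl = (pkt[0] & 0x0F) * 4                    # IPv4 header length in bytes
    total_len = int.from_bytes(pkt[2:4], "big")  # IPv4 total length field
    data_off = (pkt[ihl + 12] >> 4) * 4          # TCP header length in bytes
    return ihl + data_off, min(total_len, len(pkt))


def flip_first_payload_byte(pkt):
    """Return a same-length copy with one payload byte changed.

    Returns None when the segment carries no data (a "pure" ACK), so the
    caller can accept such packets unmodified.
    """
    start, end = tcp_payload_bounds(pkt)
    if end <= start:
        return None
    out = bytearray(pkt)
    out[start] ^= 0x01   # alter the content, never the length
    return bytes(out)


def build_sample(payload):
    """Build a 20-byte IPv4 header + 20-byte TCP header + payload (checksums zeroed)."""
    total = 40 + len(payload)
    ip = (bytes([0x45, 0]) + total.to_bytes(2, "big")
          + bytes([0, 0, 0x40, 0, 64, TCP_PROTO, 0, 0])
          + bytes([10, 100, 64, 2, 10, 100, 66, 2]))
    tcp = ((43425).to_bytes(2, "big") + (6060).to_bytes(2, "big") + bytes(8)
           + bytes([0x50, 0x18]) + (502).to_bytes(2, "big") + bytes(4))
    return ip + tcp + payload


data_segment = build_sample(b"ping")
pure_ack = build_sample(b"")
```

The sketch deliberately leaves the IP and TCP checksums alone. In the scripts from this thread, re-parsing the modified bytes with scapy and deleting chksum would take care of those, so that only the MPTCP DSS checksum is left to detect the change.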

VenkateswaranJ commented 3 years ago

> But do you see that the packet has been modified as expected in the traces?

Yes.

> Here I see you modify the size of the data; maybe best not to, and to modify only the content. Otherwise you will have to change the length of the packet.

Exactly, you are right! The packet drop happens only if I change the packet length (I couldn't figure out how to change the length in scapy). But I modified the MPTCP option field without changing the size, and it falls back to infinite mapping mode.

In infinite mapping mode, MPTCP closes all the subflows except the master and hands the connection over to TCP. But I couldn't understand why we need to keep that single subflow when it isn't even doing any work (I see only ACKs being transferred on that last subflow). Why don't we close that too and completely hand over the stuff to TCP? I tried to understand the infinite mapping described in the RFC, but I couldn't get it.

So in my test code, I was checking getsockopt() in a loop to see MPTCP_ENABLED go false when the connection falls back to infinite mapping, but it looks like MPTCP_ENABLED is not set to false when the connection falls back to infinite mapping.

Another strange thing: MPTCP_ENABLED is false on the server socket (even though the MPTCP connection exists between client <-> server), while on the client side MPTCP_ENABLED works as expected. Why does the server-side socket set MPTCP_ENABLED to false?

Test_client.txt Test_server.txt

If you have some time, please check Test_server.txt (a simple ping server) to see whether I'm polling getsockopt() in the wrong way.

matttbe commented 3 years ago

> but I modified the MPTCP option field without changing the size and it falls back to infinite mapping mode.

I don't remember the details, but I'm not sure we fall back to infinite mapping in all cases, e.g. if there are multiple subflows. Maybe we do that because it is really strange if only that has changed: it certainly means the other peer cannot properly handle MPTCP (a bug in the implementation), so it is better to fall back directly.

> why don't we close that too and completely hand over the stuff to TCP?

MPTCP tries to survive in a world where it may not be understood by most middleboxes. No need to reset everything if we can fall back to TCP.

> So in my test code, I was checking getsockopt() in a loop to see MPTCP_ENABLED go false when the connection falls back to infinite mapping, but it looks like MPTCP_ENABLED is not set to false when the connection falls back to infinite mapping.

Sorry, I'm not sure what you mean.

> Why does the server-side socket set MPTCP_ENABLED to false?

Which socket do you look at? The "listened" socket or the "accepted" one?

VenkateswaranJ commented 3 years ago

> Which socket do you look at? The "listened" socket or the "accepted" one?

The listened socket.

If I check the accepted socket, it shows the correct MPTCP_ENABLED value.

matttbe commented 3 years ago

> > Which socket do you look at? The "listened" socket or the "accepted" one?
>
> The listened socket.
>
> If I check the accepted socket, it shows the correct MPTCP_ENABLED value.

So everything works as expected, right? Can we close the ticket?

Or is there still something I missed in this sentence, where it looks like you have two opposite things having the same behaviour:

> So in my test code, I was checking getsockopt() in a loop to see MPTCP_ENABLED go false when the connection falls back to infinite mapping, but it looks like MPTCP_ENABLED is not set to false when the connection falls back to infinite mapping.

VenkateswaranJ commented 3 years ago

Sorry for the confusing sentences. To be clear:

On the server side I have two sockets: 1. the listen socket and 2. the accept socket (which comes from the client), and I set the MPTCP_ENABLED option on the listen socket. Once the MPTCP connection is established successfully, I check MPTCP_ENABLED via getsockopt() on the listen socket and it returns false (why? It should be true, shouldn't it?). But if I call getsockopt() on the accept socket, it shows MPTCP_ENABLED as true (which is correct).

The client sets the MPTCP_ENABLED flag correctly, but the server does not.

matttbe commented 3 years ago

Indeed, it doesn't seem to make sense to return false, but on the other hand, this check is for established connections.

If your listen socket has MPTCP_ENABLED, it is certainly because your app set it. In this case the status doesn't depend on the network connection. I don't think we need to extend MPTCP_ENABLED to cover non-established connections.

Do you see a use-case? Or should we close this ticket?
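
Since the check is meant for established connections, a server that wants to detect a fallback would poll the socket returned by accept(), not the listening one. A minimal Python 3 sketch of such a poll follows. The numeric value 42 for MPTCP_ENABLED is an assumption based on the out-of-tree kernel's uapi tcp.h header and should be verified against the kernel actually in use; the helper name mptcp_enabled is hypothetical.

```python
import socket

# Assumption: option number of MPTCP_ENABLED in the out-of-tree MPTCP kernel.
# Verify it against the uapi tcp.h header of the kernel you are running.
MPTCP_ENABLED = 42


def mptcp_enabled(sock):
    """Return True/False as reported by MPTCP_ENABLED, or None if the kernel rejects the option."""
    try:
        return bool(sock.getsockopt(socket.IPPROTO_TCP, MPTCP_ENABLED))
    except OSError:
        return None


# Typical use on the server side (not executed here):
#
#   listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
#   listener.bind(("0.0.0.0", 6060))
#   listener.listen(1)
#   conn, peer = listener.accept()
#   state = mptcp_enabled(conn)   # per-connection state: poll this socket
```

Returning None rather than raising keeps the polling loop simple on kernels that do not know the option at all.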

VenkateswaranJ commented 3 years ago

Ah! now it makes sense.

VenkateswaranJ commented 3 years ago

@matttbe I found a way to drop the MPTCP option using a scapy script. After removing the MPTCP option I need to recalculate the IP and TCP length and checksum, and I also need to set the data offset field in the TCP header. So now I am able to remove the MPTCP option from the TCP header without the packet being dropped. But I ran into another problem, which might be related to MPTCP.

Test steps:

mptcp

1. Created an MPTCP connection successfully between client and server (with a single subflow).
2. After some time, started the scapy script that drops the MPTCP option.
3. Kept sending data from the server to the client.
4. In Wireshark (checked on the server side) I can see that the server started to send/receive using TCP, but there is no fallback packet. The other problem is that the client-side application does not receive the messages sent by the server, although the kernel appears to acknowledge every packet to the server correctly.
5. In dmesg I see that the subflow still exists and has not been cleared out.

I thought that when I drop the MPTCP option the connection would fall back to TCP, but that's not happening; instead, it gets stuck in some intermediate state. Do you have any idea about this?
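
The header fix-ups listed at the start of this comment (lengths, checksums, data offset) can be sketched in plain Python 3 on the raw packet bytes. This is not the scapy script used in the test, which is not shown in the thread; it assumes an unfragmented IPv4 packet, pads the shortened options field with NOPs as one possible choice, and uses hypothetical helper names (checksum16, strip_option, drop_mptcp).

```python
import struct

MPTCP_KIND = 30  # TCP option kind used by MPTCP


def checksum16(data):
    """Internet checksum (RFC 1071) over data, returned as a 16-bit integer."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def strip_option(opts, kind_to_drop=MPTCP_KIND):
    """Remove every option of the given kind and pad with NOPs to a multiple of 4."""
    out = bytearray()
    i = 0
    while i < len(opts):
        kind = opts[i]
        if kind == 0:                       # End of Option List
            break
        if kind == 1:                       # NOP has no length byte
            length = 1
        else:
            if i + 1 >= len(opts):
                break
            length = opts[i + 1]
            if length < 2 or i + length > len(opts):
                break                       # malformed option: stop parsing
        if kind != kind_to_drop:
            out += opts[i:i + length]
        i += length
    out += b"\x01" * (-len(out) % 4)
    return bytes(out)


def drop_mptcp(pkt):
    """Return the IPv4/TCP packet without its MPTCP option, with headers fixed up."""
    ihl = (pkt[0] & 0x0F) * 4
    total_len = struct.unpack("!H", pkt[2:4])[0]
    tcp = pkt[ihl:total_len]
    old_hlen = (tcp[12] >> 4) * 4
    opts = strip_option(tcp[20:old_hlen])
    new_hlen = 20 + len(opts)

    tcp_hdr = bytearray(tcp[:20])
    tcp_hdr[12] = ((new_hlen // 4) << 4) | (tcp_hdr[12] & 0x0F)  # new data offset
    tcp_hdr[16:18] = b"\x00\x00"                                 # zero TCP checksum
    segment = bytes(tcp_hdr) + opts + tcp[old_hlen:]

    ip = bytearray(pkt[:ihl])
    ip[2:4] = struct.pack("!H", ihl + len(segment))              # new IP total length
    ip[10:12] = b"\x00\x00"
    ip[10:12] = struct.pack("!H", checksum16(ip))                # new IP header checksum

    # TCP checksum covers a pseudo-header: src, dst, zero, protocol, TCP length.
    pseudo = bytes(ip[12:20]) + struct.pack("!BBH", 0, 6, len(segment))
    tcp_csum = struct.pack("!H", checksum16(pseudo + segment))
    return bytes(ip) + segment[:16] + tcp_csum + segment[18:]
```

The TCP sequence numbers need no adjustment, because only the header shrinks and the payload is carried over unchanged.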

VenkateswaranJ commented 3 years ago

It looks like the fallback to regular TCP (not infinite mapping) happens only before the MPTCP connection is created --> https://github.com/multipath-tcp/mptcp/blob/mptcp_v0.95/net/mptcp/mptcp_ctrl.c#L1090

In the rest of the cases, it mostly tries to go for infinite mapping and switches the path manager to default. As far as I tested, most of the time when I change the MPTCP option in a well-established connection, Wireshark shows a "malformed MPTCP packet exception" error and dmesg shows some parsing errors, but the real pain is that it starts to flood the connection with MPTCP packets.

I couldn't find the MPTCP state machine code to check further.

VenkateswaranJ commented 3 years ago

If I do the exact same test with the upstream kernel version, the server sends an RST to the client and closes the connection.

multipath-tcp / mptcp

Simulate TCP fallback in the middle of MPTCP communication for testing #420