openvswitch / ovs-issues

Issue tracker repo for Open vSwitch

Invalid ct_ipv6_src / ct_ipv6_dst PacketIn match metadata for ICMPv6 #327

Open antoninbas opened 2 months ago

antoninbas commented 2 months ago

We ran into a surprising issue when using conntrack in OVS with ICMPv6 packets. When using the ct action on a packet and punting the forked packet to the controller, the ct_ipv6_src and ct_ipv6_dst match fields included in the metadata are incorrect, but only for ICMPv6 packets.

To reproduce on a standalone OVS instance, I used the following flows:

$ ovs-ofctl dump-flows br0 -O OpenFlow15 --no-stats
 priority=10,ip actions=ct(table=1)
 priority=10,ipv6 actions=ct(table=2)
 priority=0 actions=CONTROLLER:128
 table=1, priority=10,ct_state=+new+trk,ip actions=CONTROLLER:128
 table=2, priority=10,ct_state=+new+trk,ipv6 actions=CONTROLLER:128

To observe the PacketIn messages, I used ovs-testcontroller. The easiest way to reproduce is as follows:

cat <<EOF > flows.txt
table=0,priority=10,ip,actions=ct(table=1)
table=0,priority=10,ipv6,actions=ct(table=2)
table=1,priority=10,ip,ct_state=+new+trk,actions=CONTROLLER:128
table=2,priority=10,ipv6,ct_state=+new+trk,actions=CONTROLLER:128
EOF

ovs-vsctl add-br br0
ovs-testcontroller unix:/var/run/openvswitch/br0.mgmt -v -n --with-flows flows.txt

After that, you will need to inject the right packets. I used a veth pair and Scapy to send packets from a separate network namespace, but there are plenty of other ways to achieve the same thing.
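For completeness, a minimal setup sketch (the interface names veth0/veth1 and the namespace name ns0 are my assumptions, chosen so that veth0 matches the iface used in the Scapy commands below; requires root):

```shell
# Create a namespace and a veth pair; attach one end to br0 and move
# the other end into the namespace for packet injection with Scapy.
ip netns add ns0
ip link add veth0 type veth peer name veth1
ip link set veth0 netns ns0
ip link set veth1 up
ovs-vsctl add-port br0 veth1
ip netns exec ns0 ip link set veth0 up
ip netns exec ns0 ip link set lo up
# Then run Scapy inside the namespace:
#   ip netns exec ns0 python3
```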

The snippet below shows the Scapy commands used to craft and send the packet, and the PacketIn message logged by the ovs-testcontroller.

# 1. ICMPv4 packet
>>> p1 = Ether()/IP(src="10.0.0.1",dst="10.0.0.2")/ICMP(type=8)
>>> sendp(p1, iface="veth0")

2024-05-07T03:51:37Z|00027|vconn|DBG|unix:/var/run/openvswitch/br0.mgmt: received: OFPT_PACKET_IN (OF1.5) (xid=0x0): table_id=1 cookie=0x0 total_len=42 ct_state=new|trk,ct_nw_src=10.0.0.1,ct_nw_dst=10.0.0.2,ct_nw_proto=1,ct_tp_src=8,ct_tp_dst=0,ip,in_port=1 (via action) data_len=42 (unbuffered)

# 2. TCPv6 packet
>>> p2 = Ether()/IPv6(src="2001:db8:dead::1",dst="2001:db8:dead::2")/TCP()
>>> sendp(p2, iface="veth0")

2024-05-07T03:53:13Z|00032|vconn|DBG|unix:/var/run/openvswitch/br0.mgmt: received: OFPT_PACKET_IN (OF1.5) (xid=0x0): table_id=2 cookie=0x0 total_len=74 ct_state=new|trk,ct_ipv6_src=2001:db8:dead::1,ct_ipv6_dst=2001:db8:dead::2,ct_nw_proto=6,ct_tp_src=20,ct_tp_dst=80,ipv6,in_port=1 (via action) data_len=74 (unbuffered)

# 3. ICMPv6 packet
>>> p3 = Ether()/IPv6(src="2001:db8:dead::1",dst="2001:db8:dead::2")/ICMPv6EchoRequest()
>>> sendp(p3, iface="veth0")

2024-05-07T03:53:57Z|00037|vconn|DBG|unix:/var/run/openvswitch/br0.mgmt: received: OFPT_PACKET_IN (OF1.5) (xid=0x0): table_id=2 cookie=0x0 total_len=62 ct_state=new|trk,ct_ipv6_src=::,ct_ipv6_dst=::2,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,ipv6,in_port=1 (via action) data_len=62 (unbuffered)

Note that the ICMPv6 packet is a "ping", and not an NDP packet.

You can see above that for the ICMPv6 packet, we have ct_ipv6_src=::,ct_ipv6_dst=::2. This is not what we'd expect. We'd expect the same match value as for the TCPv6 packet: ct_ipv6_src=2001:db8:dead::1,ct_ipv6_dst=2001:db8:dead::2.

In this example, we don't commit the connection for the sake of simplicity. But note that if we were to commit the connection and use conntrack -L to dump the connection tracking table, we would get the expected entry, with the correct origin tuple (this is something I verified). So this does seem to be specific to OVS.

igsilya commented 2 months ago

Hmm, interesting. Thanks for the detailed report! I was able to reproduce the issue with the steps described.

Looks like an issue in the kernel module since I can't reproduce with userspace datapath. Looking into it.

amorenoz commented 2 months ago

Hi @antoninbas , can you please check what happens if you send two packets instead of one? Is the result the same for both?

I'd like to see if the problem is during the upcall or if it's the kernel action execution that goes wrong.

amorenoz commented 2 months ago

Never mind, I reproduced it myself and saw that, indeed, this only happens on the upcall path. Out of the 3 upcalls, the key seems to disappear between the second and the third one, i.e. in the flow_exec action.

igsilya commented 2 months ago

I think we overwrite key->ipv6.ct_orig with key->ipv6.nd. They are part of the same union, and I see that we preserve the last 4 bytes of the IPv6 destination address. That matches with the size difference between ct_orig (32 bytes) and nd (28 bytes). This may happen when we pass down the key attributes from the userspace in case both ct original tuple and ND attributes are set. And we actually can pass both in case of IPv6 ICMP packets as a side effect of https://github.com/openvswitch/ovs/commit/132fa24b656e1bc45b6ce8ee9ab0206fa6930f65 . So, it might be related to https://github.com/openvswitch/ovs-issues/issues/326 as well.
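The size arithmetic can be sanity-checked with a short byte-level sketch (purely illustrative Python, not OVS code): zeroing the 28-byte nd member of the union wipes all of ct_orig.src and the first 12 bytes of ct_orig.dst, leaving only the last 4 bytes of the 32-byte original tuple intact, which reproduces exactly the :: and ::2 values seen in the report.

```python
import ipaddress

CT_ORIG_SIZE = 32  # two 16-byte IPv6 addresses (src + dst)
ND_SIZE = 28       # size of the nd member sharing the union

# Original conntrack tuple from the report.
src = ipaddress.IPv6Address("2001:db8:dead::1").packed
dst = ipaddress.IPv6Address("2001:db8:dead::2").packed
ct_orig = bytearray(src + dst)          # 32-byte union storage

# Simulate memset(&key->ipv6.nd, 0, sizeof(key->ipv6.nd)).
ct_orig[:ND_SIZE] = bytes(ND_SIZE)

leaked_src = ipaddress.IPv6Address(bytes(ct_orig[:16]))
leaked_dst = ipaddress.IPv6Address(bytes(ct_orig[16:]))
print(leaked_src, leaked_dst)           # -> :: ::2
```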

We'll likely need to add an overwrite protection to the kernel, but we'll also need to fix that side effect, I think.

Testing the theory now.

igsilya commented 2 months ago

OK, so it is an overwrite, but not related to https://github.com/openvswitch/ovs/commit/132fa24b656e1bc45b6ce8ee9ab0206fa6930f65 . The issue is that while passing the packet back to the kernel with the execute command, the kernel parses the packet, and parsing overwrites the metadata here:

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 33b21a0c0548..ce50bcd3c991 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -561,7 +561,7 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
         */
        key->tp.src = htons(icmp->icmp6_type);
        key->tp.dst = htons(icmp->icmp6_code);
-       memset(&key->ipv6.nd, 0, sizeof(key->ipv6.nd));
+       //memset(&key->ipv6.nd, 0, sizeof(key->ipv6.nd));

        if (icmp->icmp6_code == 0 &&
            (icmp->icmp6_type == NDISC_NEIGHBOUR_SOLICITATION ||

It's not an ND packet, so the nd structure must not be cleared (it's in a union with ct_orig). It should be safe to just remove this memset, because the flow key is always zeroed out during allocation, but I need to spend a bit more time to be sure. Will work on a proper patch.

@antoninbas , @amorenoz if you can test the kernel change above on your setups, would be great.

igsilya commented 2 months ago

I guess the following fix would be safer: we do need to clear the fields, but only for packets that are actually ND packets:

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 33b21a0c0548..8a848ce72e29 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -561,7 +561,6 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
         */
        key->tp.src = htons(icmp->icmp6_type);
        key->tp.dst = htons(icmp->icmp6_code);
-       memset(&key->ipv6.nd, 0, sizeof(key->ipv6.nd));

        if (icmp->icmp6_code == 0 &&
            (icmp->icmp6_type == NDISC_NEIGHBOUR_SOLICITATION ||
@@ -570,6 +569,8 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
                struct nd_msg *nd;
                int offset;

+               memset(&key->ipv6.nd, 0, sizeof(key->ipv6.nd));
+
                /* In order to process neighbor discovery options, we need the
                 * entire packet.
                 */
antoninbas commented 2 months ago

@igsilya @amorenoz Thanks for the super quick replies and for looking into it. I appreciate it.

@igsilya I was able to validate your patch. I applied it on top of kernel 6.8.9 and tested it in an Ubuntu VM. After installing the new kernel and following the same steps I posted in my bug report, I was no longer able to reproduce the issue. Here are the logs I captured from the testcontroller:

2024-05-08T02:51:14Z|00088|poll_loop|DBG|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br0.mgmt) at ../lib/stream-fd.c:157
2024-05-08T02:51:14Z|00089|vconn|DBG|unix:/var/run/openvswitch/br0.mgmt: received: OFPT_PACKET_IN (OF1.5) (xid=0x0): table_id=2 cookie=0x0 total_len=74 ct_state=new|trk,ct_ipv6_src=2001:db8:dead::1,ct_ipv6_dst=2001:db8:dead::2,ct_nw_proto=6,ct_tp_src=20,ct_tp_dst=80,ipv6,in_port=1 (via action) data_len=74 (unbuffered)
tcp6,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=ff:ff:ff:ff:ff:ff,ipv6_src=2001:db8:dead::1,ipv6_dst=2001:db8:dead::2,ipv6_label=0x00000,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=20,tp_dst=80,tcp_flags=syn tcp_csum:76ae
2024-05-08T02:51:18Z|00090|poll_loop|DBG|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br0.mgmt) at ../lib/stream-fd.c:157
2024-05-08T02:51:18Z|00091|vconn|DBG|unix:/var/run/openvswitch/br0.mgmt: received: OFPT_PACKET_IN (OF1.5) (xid=0x0): table_id=2 cookie=0x0 total_len=62 ct_state=new|trk,ct_ipv6_src=2001:db8:dead::1,ct_ipv6_dst=2001:db8:dead::2,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,ipv6,in_port=1 (via action) data_len=62 (unbuffered)
icmp6,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=ff:ff:ff:ff:ff:ff,ipv6_src=2001:db8:dead::1,ipv6_dst=2001:db8:dead::2,ipv6_label=0x00000,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=128,icmp_code=0 icmp6_csum:66ec

The first log is for the TCPv6 packet, and the second is for the ICMPv6 Echo Request packet. As you can see, the two fields now show the correct values for the ICMPv6 packet: ct_ipv6_src=2001:db8:dead::1,ct_ipv6_dst=2001:db8:dead::2 (same as for the TCPv6 packet).

Would you mind updating this issue after submitting your patch upstream, for tracking purposes?

antoninbas commented 2 months ago

@igsilya @amorenoz I have a follow up question as well: what's the exact scope of the bug? Is it possible for some invalid datapath flows to be installed because of this?

igsilya commented 2 months ago

Would you mind updating this issue after submitting your patch upstream, for tracking purposes?

Sure. Posted for review: https://lore.kernel.org/netdev/20240509094228.1035477-1-i.maximets@ovn.org/T/#u

I have a follow up question as well: what's the exact scope of the bug? Is it possible for some invalid datapath flows to be installed because of this?

Technically, yes, but such flows should match packets only on rare occasions. The issue doesn't affect packets entering the OVS datapath from kernel interfaces, only packets injected from userspace via the execute command. execute is paired with put, which installs the datapath flow. The put command is not affected, so the flow it installs will be correct. However, if the action list for the execute contains recirculation, the packet will be recirculated in the datapath and may potentially match the wrong flow, if that flow happens to match on the broken content of the ct tuple. If no matching datapath flow is found after recirculation, a MISS upcall will be sent to userspace including the broken ct tuple, and ovs-vswitchd may then install another datapath flow matching the broken ct tuple. But no packet that doesn't go to userspace will have a broken ct tuple, so these flows should normally not be matched. The exception may be traffic that always goes to userspace and back to the kernel for one reason or another.

antoninbas commented 2 months ago

@igsilya Got it, thanks for the explanation. I do see this relevant sentence in your upstream patch as well:

The issue only affects the OVS_PACKET_CMD_EXECUTE path and doesn't affect packets entering OVS datapath from network interfaces, because in this case CT metadata is populated from skb after the packet is already parsed.