Open antoninbas opened 2 months ago
Hmm, interesting. Thanks for the detailed report! I was able to reproduce the issue with the steps described.
Looks like an issue in the kernel module since I can't reproduce with userspace datapath. Looking into it.
Hi @antoninbas , can you please check what happens if you send two packets instead of one? Is the result the same for both?
I'd like to see if the problem is during the upcall or if it's the kernel action execution that goes wrong.
Never mind, reproduced it myself. and saw that, indeed, this only happens on the upcall path. Out of the 3 upcalls, they key seems to disappear between the second and the third one, i.e: in the flow_exec action.
I think we overwrite key->ipv6.ct_orig
with key->ipv6.nd
. They are part of the same union, and I see that we preserve the last 4 bytes of the IPv6 destination address. That matches with the size difference between ct_orig
(32 bytes) and nd
(28 bytes). This may happen when we pass down the key attributes from the userspace in case both ct original tuple and ND attributes are set. And we actually can pass both in case of IPv6 ICMP packets as a side effect of https://github.com/openvswitch/ovs/commit/132fa24b656e1bc45b6ce8ee9ab0206fa6930f65 . So, it might be related to https://github.com/openvswitch/ovs-issues/issues/326 as well.
We'll likely need to add an overwrite protection to the kernel, but we'll also need to fix that side effect, I think.
Testing the theory now.
OK, so it is an overwrite, but not related to https://github.com/openvswitch/ovs/commit/132fa24b656e1bc45b6ce8ee9ab0206fa6930f65 . The issue is that while passing the packet back to kernel for execute
command, kernel parses the packet and parsing overwrites the metadata here:
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 33b21a0c0548..ce50bcd3c991 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -561,7 +561,7 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
*/
key->tp.src = htons(icmp->icmp6_type);
key->tp.dst = htons(icmp->icmp6_code);
- memset(&key->ipv6.nd, 0, sizeof(key->ipv6.nd));
+ //memset(&key->ipv6.nd, 0, sizeof(key->ipv6.nd));
if (icmp->icmp6_code == 0 &&
(icmp->icmp6_type == NDISC_NEIGHBOUR_SOLICITATION ||
It's not an ND packet, so the nd
structure must not be cleared (it's in a union with ct_orig
). It should be safe to just remove this memset, because the flow key is always zeroed out during allocation, but I need to spend a bit more time to be sure. Will work on a proper patch.
@antoninbas , @amorenoz if you can test the kernel change above on your setups, would be great.
I guess, the following fix would be safer, we do need to clear the fields, but only for packets that are actually ND packets:
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 33b21a0c0548..8a848ce72e29 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -561,7 +561,6 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
*/
key->tp.src = htons(icmp->icmp6_type);
key->tp.dst = htons(icmp->icmp6_code);
- memset(&key->ipv6.nd, 0, sizeof(key->ipv6.nd));
if (icmp->icmp6_code == 0 &&
(icmp->icmp6_type == NDISC_NEIGHBOUR_SOLICITATION ||
@@ -570,6 +569,8 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
struct nd_msg *nd;
int offset;
+ memset(&key->ipv6.nd, 0, sizeof(key->ipv6.nd));
+
/* In order to process neighbor discovery options, we need the
* entire packet.
*/
@igsilya @amorenoz Thanks for the super quick replies and for looking into it. I appreciate it.
@igsilya I was able to validate your patch. I applied it on top of kernel 6.8.9, and I tested it in an Ubuntu VM. After installing the new kernel, and following the same steps I posted in my bug report, I was not able to reproduce the issue. Here are the logs I captured from the testcontroller:
2024-05-08T02:51:14Z|00088|poll_loop|DBG|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br0.mgmt) at ../lib/stream-fd.c:157
2024-05-08T02:51:14Z|00089|vconn|DBG|unix:/var/run/openvswitch/br0.mgmt: received: OFPT_PACKET_IN (OF1.5) (xid=0x0): table_id=2 cookie=0x0 total_len=74 ct_state=new|trk,ct_ipv6_src=2001:db8:dead::1,ct_ipv6_dst=2001:db8:dead::2,ct_nw_proto=6,ct_tp_src=20,ct_tp_dst=80,ipv6,in_port=1 (via action) data_len=74 (unbuffered)
tcp6,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=ff:ff:ff:ff:ff:ff,ipv6_src=2001:db8:dead::1,ipv6_dst=2001:db8:dead::2,ipv6_label=0x00000,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=20,tp_dst=80,tcp_flags=syn tcp_csum:76ae
2024-05-08T02:51:18Z|00090|poll_loop|DBG|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br0.mgmt) at ../lib/stream-fd.c:157
2024-05-08T02:51:18Z|00091|vconn|DBG|unix:/var/run/openvswitch/br0.mgmt: received: OFPT_PACKET_IN (OF1.5) (xid=0x0): table_id=2 cookie=0x0 total_len=62 ct_state=new|trk,ct_ipv6_src=2001:db8:dead::1,ct_ipv6_dst=2001:db8:dead::2,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,ipv6,in_port=1 (via action) data_len=62 (unbuffered)
icmp6,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=ff:ff:ff:ff:ff:ff,ipv6_src=2001:db8:dead::1,ipv6_dst=2001:db8:dead::2,ipv6_label=0x00000,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=128,icmp_code=0 icmp6_csum:66ec
The first log is for the TCPv6 packet, and the second log is for the ICMPv6 Echo Request packet. As you can see the 2 fields are showing the correct values for the ICMPv6 packet: ct_ipv6_src=2001:db8:dead::1,ct_ipv6_dst=2001:db8:dead::2
(same as for the TCPv6 packet).
Would you mind updating this issue after submitting your patch upstream, for tracking purposes?
@igsilya @amorenoz I have a follow up question as well: what's the exact scope of the bug? Is it possible for some invalid datapath flows to be installed because of this?
Would you mind updating this issue after submitting your patch upstream, for tracking purposes?
Sure. Posted for review: https://lore.kernel.org/netdev/20240509094228.1035477-1-i.maximets@ovn.org/T/#u
I have a follow up question as well: what's the exact scope of the bug? Is it possible for some invalid datapath flows to be installed because of this?
Technically, yes. But they should match packets only on rare occasions. The issue doesn't affect packets entering OVS datapath from kernel interfaces, only the packets injected from the userspace via execute
command. execute
is paired with put
that installs the datapath flow. The put
command is not affected, so the flow it installs will be correct.
However, if the action list for the execute
contains recirculation, the packet will be recirculated in the datapath and may potentially match the wrong flow if that flow happens to match on the broken content of ct tuple. If the matching datapath flow will not be found after recirculation, the MISS upcall will be sent to userspace including the broken ct tuple, then ovs-vswitchd may install another datapath flow matching the broken ct tuple. But no packet that doesn't go to userspace will have broken ct tuple, so these flows should normally not be matched. The exception may be traffic that always goes to userspace and back to kernel for one reason or another.
@igsilya Got it, thanks for the explanation. I do see this relevant sentence in your upstream patch as well:
The issue only affects the OVS_PACKET_CMD_EXECUTE path and doesn't affect packets entering OVS datapath from network interfaces, because in this case CT metadata is populated from skb after the packet is already parsed.
We ran into a surprising issue when using conntrack in OVS with ICMPv6 packets. When using the
ct
action on a packet and punting the forked packet to the controller, thect_ipv6_src
andct_ipv6_dst
match fields included in the metadata are incorrect, but only for ICMPv6 packets.To reproduce on a standalone OVS instance, I used the following flows:
To show the PacketIns, I used the
ovs-testcontroller
. Easiest way to reproduce is as follows:After that, you will need to inject the right packets. I used a veth pair and
scapy
to send packets from a separate network namespace, but I imagine there are plenty of options to achieve the same thing.The snippet below shows the Scapy commands used to craft and send the packet, and the PacketIn message logged by the
ovs-testcontroller
.Note that the ICMPv6 packet is a "ping", and not an NDP packet.
You can see above that for the ICMPv6 packet, we have
ct_ipv6_src=::,ct_ipv6_dst=::2
. This is not what we'd expect. We'd expect the same match value as for the TCPv6 packet:ct_ipv6_src=2001:db8:dead::1,ct_ipv6_dst=2001:db8:dead::2
.In this example, we don't commit the connection for the sake of simplicity. But note that if we were to commit the connection and use
conntrack -L
to dump the connection tracking table, we would get the excepted entry, with the correct origin tuple (this is something I verified). So this does seem to be specific to OVS.