ovn-org / ovn

Open Virtual Network
Apache License 2.0

ICMP6 time exceeded in-transit response not sent from OVN router when source is external #78

Open tomponline opened 3 years ago

tomponline commented 3 years ago

We've noticed that ICMP6 time exceeded in-transit response packets are not sent from the intermediate OVN logical router hop when an MTR trace is started from an external source address targeted at an IP inside the OVN logical network (all NAT disabled).

If the source address is changed to an IP inside the same subnet as the OVN router's external port address then the OVN router responds with the expected ICMP6 time exceeded in-transit response.

Example setup:

Container connected to a bridge called "lxdbr1" with an IP of fd42:cfe0:8030:65c4:216:3eff:fe07:96a3/64.

An OVN network behind an OVN logical router, with a provider port connected to another bridge called "lxdbr0". The OVN logical router port's IP address on the "lxdbr0" subnet is fd42:6610:36a8:1234:216:3eff:fe72:513b (a different /64 than lxdbr1).

The OVN router also has a static route added to it to route fd0b:bc39:4820:d4f8::/64 to a container connected to a logical switch port. This container has the IP setup of fd0b:bc39:4820:d4f8::1/64.

On the Linux host running the two bridges, there is also a static route for fd0b:bc39:4820:d4f8::1/64 configured via the OVN router's external IP of fd42:6610:36a8:1234:216:3eff:fe72:513b.

If I run an MTR trace from the container connected to lxdbr1, with a source address of fd42:cfe0:8030:65c4:216:3eff:fe07:96a3/64 to the OVN container's address of fd0b:bc39:4820:d4f8::1/64, then we see this result:

                            My traceroute  [v0.93]
c1 (fd42:cfe0:8030:65c4:216:3eff:fe07:96a3)           2020-12-10T12:32:21+0000
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                      Packets               Pings
 Host                               Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. fd42:cfe0:8030:65c4::1           0.0%    11    0.2   0.2   0.1   0.2   0.0
 2. (waiting for reply)
 3. fd0b:bc39:4820:d4f8::1           0.0%    10    0.2   0.3   0.2   1.1   0.3

The OVN router at the intermediate hop is not responding with the ICMP6 time exceeded in-transit response. A tcpdump trace shows two ICMP6 echo requests per cycle, one for the OVN router hop and one for the target IP, but only one reply packet, coming from the target IP, and no response from the OVN router.

sudo tcpdump -i lxdbr0 -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lxdbr0, link-type EN10MB (Ethernet), capture size 262144 bytes
12:34:16.659989 IP6 fd42:cfe0:8030:65c4:216:3eff:fe07:96a3 > fd0b:bc39:4820:d4f8::1: ICMP6, echo request, seq 33020, length 24
12:34:16.994012 IP6 fd42:cfe0:8030:65c4:216:3eff:fe07:96a3 > fd0b:bc39:4820:d4f8::1: ICMP6, echo request, seq 33021, length 24
12:34:16.994779 IP6 fd0b:bc39:4820:d4f8::1 > fd42:cfe0:8030:65c4:216:3eff:fe07:96a3: ICMP6, echo reply, seq 33021, length 24
12:34:17.662156 IP6 fd42:cfe0:8030:65c4:216:3eff:fe07:96a3 > fd0b:bc39:4820:d4f8::1: ICMP6, echo request, seq 33023, length 24
12:34:17.996195 IP6 fd42:cfe0:8030:65c4:216:3eff:fe07:96a3 > fd0b:bc39:4820:d4f8::1: ICMP6, echo request, seq 33024, length 24
12:34:17.996258 IP6 fd0b:bc39:4820:d4f8::1 > fd42:cfe0:8030:65c4:216:3eff:fe07:96a3: ICMP6, echo reply, seq 33024, length 24
12:34:18.664205 IP6 fd42:cfe0:8030:65c4:216:3eff:fe07:96a3 > fd0b:bc39:4820:d4f8::1: ICMP6, echo request, seq 33026, length 24
12:34:18.998078 IP6 fd42:cfe0:8030:65c4:216:3eff:fe07:96a3 > fd0b:bc39:4820:d4f8::1: ICMP6, echo request, seq 33027, length 24
12:34:18.998118 IP6 fd0b:bc39:4820:d4f8::1 > fd42:cfe0:8030:65c4:216:3eff:fe07:96a3: ICMP6, echo reply, seq 33027, length 24
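
The pattern in the capture above can be tallied with a short script. This is a minimal sketch and not part of the original report: the function name `tally_icmp6` and the regular expression are illustrative assumptions, and it only recognises the three ICMP6 message types that appear in the `tcpdump -nn` output quoted in this thread.

```python
import re
from collections import Counter

# Matches lines such as:
#   12:34:16.659989 IP6 <src> > <dst>: ICMP6, echo request, seq 33020, length 24
# Only the three message kinds seen in this thread are recognised (assumption).
ICMP6_LINE = re.compile(
    r"IP6 (?P<src>\S+) > (?P<dst>\S+): ICMP6, "
    r"(?P<kind>echo request|echo reply|time exceeded in-transit)"
)


def tally_icmp6(capture: str) -> Counter:
    """Count ICMP6 message kinds in tcpdump -nn text output."""
    counts = Counter()
    for line in capture.splitlines():
        match = ICMP6_LINE.search(line)
        if match:
            counts[match.group("kind")] += 1
    return counts


# First probe cycle from the capture above, copied verbatim.
SAMPLE = """\
12:34:16.659989 IP6 fd42:cfe0:8030:65c4:216:3eff:fe07:96a3 > fd0b:bc39:4820:d4f8::1: ICMP6, echo request, seq 33020, length 24
12:34:16.994012 IP6 fd42:cfe0:8030:65c4:216:3eff:fe07:96a3 > fd0b:bc39:4820:d4f8::1: ICMP6, echo request, seq 33021, length 24
12:34:16.994779 IP6 fd0b:bc39:4820:d4f8::1 > fd42:cfe0:8030:65c4:216:3eff:fe07:96a3: ICMP6, echo reply, seq 33021, length 24
"""

if __name__ == "__main__":
    for kind, count in tally_icmp6(SAMPLE).items():
        print(f"{kind}: {count}")
```

For this sample the tally is two echo requests, one echo reply, and no time exceeded messages, which matches the missing hop 2 in the MTR output.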

However, if I then set up an IPv6 NAT rule on lxdbr1 to translate the source to the host's IP on lxdbr0 (fd42:6610:36a8:1234::1), we start seeing the expected responses:

                                     My traceroute  [v0.93]
c1 (fd42:cfe0:8030:65c4:216:3eff:fe07:96a3)                               2020-12-10T12:35:34+0000
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                                          Packets               Pings
 Host                                                   Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. fd42:cfe0:8030:65c4::1                               0.0%    17    0.1   0.2   0.1   0.2   0.0
 2. fd42:6610:36a8:1234:216:3eff:fe72:513b               0.0%    17    1.1   1.0   0.4   1.5   0.3
 3. fd0b:bc39:4820:d4f8::1                               0.0%    16    0.2   0.2   0.1   1.1   0.2

And a tcpdump shows the expected replies:

sudo tcpdump -i lxdbr0 -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lxdbr0, link-type EN10MB (Ethernet), capture size 262144 bytes
12:35:52.782198 IP6 fd42:6610:36a8:1234::1 > fd0b:bc39:4820:d4f8::1: ICMP6, echo request, seq 33104, length 24
12:35:52.783417 IP6 fd42:6610:36a8:1234:216:3eff:fe72:513b > fd42:6610:36a8:1234::1: ICMP6, time exceeded in-transit for fd0b:bc39:4820:d4f8::1, length 72
12:35:53.116093 IP6 fd42:6610:36a8:1234::1 > fd0b:bc39:4820:d4f8::1: ICMP6, echo request, seq 33105, length 24
12:35:53.116787 IP6 fd0b:bc39:4820:d4f8::1 > fd42:6610:36a8:1234::1: ICMP6, echo reply, seq 33105, length 24
12:35:53.784050 IP6 fd42:6610:36a8:1234::1 > fd0b:bc39:4820:d4f8::1: ICMP6, echo request, seq 33107, length 24
12:35:53.785148 IP6 fd42:6610:36a8:1234:216:3eff:fe72:513b > fd42:6610:36a8:1234::1: ICMP6, time exceeded in-transit for fd0b:bc39:4820:d4f8::1, length 72
12:35:54.117792 IP6 fd42:6610:36a8:1234::1 > fd0b:bc39:4820:d4f8::1: ICMP6, echo request, seq 33108, length 24
12:35:54.117833 IP6 fd0b:bc39:4820:d4f8::1 > fd42:6610:36a8:1234::1: ICMP6, echo reply, seq 33108, length 24
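
To make the difference between the two cases concrete, the sketch below checks whether each source address falls inside the /64 of the OVN router's external port. It only restates the observation from the two traces and is not a claim about how OVN decides whether to reply. The helper name `is_on_link` is hypothetical, and the /64 prefix length for lxdbr0 is an assumption based on the setup description.

```python
import ipaddress

# The OVN router's external port address, taken from the setup description.
ROUTER_EXTERNAL_IP = ipaddress.IPv6Address("fd42:6610:36a8:1234:216:3eff:fe72:513b")

# Assumption: the lxdbr0 subnet is a /64 containing the router's external IP.
ROUTER_EXTERNAL_NET = ipaddress.IPv6Network("fd42:6610:36a8:1234::/64")


def is_on_link(source: str) -> bool:
    """Return True if `source` lies inside the router's external /64."""
    return ipaddress.IPv6Address(source) in ROUTER_EXTERNAL_NET


# Source addresses used in the two traces.
SOURCES = {
    "container on lxdbr1 (no NAT)": "fd42:cfe0:8030:65c4:216:3eff:fe07:96a3",
    "host on lxdbr0 (after NAT)": "fd42:6610:36a8:1234::1",
}

if __name__ == "__main__":
    for label, address in SOURCES.items():
        print(f"{label}: on-link={is_on_link(address)}")
```

The container's own address is outside the router's external subnet (no time exceeded reply seen), while the NATed host address is inside it (reply seen).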

So we were wondering: is this a bug, or is this expected behaviour?

stgraber commented 3 years ago

We suspect this issue is causing other recent problems around PMTU.

When the OVN network runs with a lowered PMTU, the lack of a proper ICMP6 response from the gateway means that clients will not receive the needed fragmentation-required response, so their packets are simply lost whenever they exceed the OVN network MTU.

This has been an issue when providing networking for anycast-based services. For those, a common trick is to lower the MTU to 1280, which avoids a number of issues with clients using tunnels and with the services not receiving the fragmentation packets because ECMP routes them to a different server (see https://blog.cloudflare.com/path-mtu-discovery-in-practice/).
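
For context, 1280 bytes is the minimum link MTU that IPv6 requires every link to support, which is why it is a safe floor. The following is a rough sketch of the arithmetic, assuming a plain 40-byte IPv6 header with no extension headers and a 20-byte TCP header with no options; the function names are illustrative.

```python
IPV6_MIN_MTU = 1280    # minimum link MTU required by IPv6
IPV6_HEADER_LEN = 40   # fixed IPv6 header, no extension headers (assumption)
TCP_HEADER_LEN = 20    # TCP header without options (assumption)


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one IPv6 packet of size `mtu`."""
    if mtu < IPV6_MIN_MTU:
        raise ValueError(f"IPv6 links must support an MTU of at least {IPV6_MIN_MTU}")
    return mtu - IPV6_HEADER_LEN - TCP_HEADER_LEN


def exceeds_path_mtu(packet_len: int, path_mtu: int) -> bool:
    """True when a packet is too large and the sender needs a PMTU signal."""
    return packet_len > path_mtu


if __name__ == "__main__":
    for mtu in (1280, 1500):
        print(f"MTU {mtu}: TCP MSS {tcp_mss(mtu)}")
    print("1500-byte packet over a 1280 path needs a signal:",
          exceeds_path_mtu(1500, 1280))
```

A 1500-byte packet sent towards a network with a 1280-byte MTU is the case where the sender depends on the gateway's ICMP6 response to learn the smaller path MTU.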

fnordahl commented 3 years ago

For the PMTU part of this issue, does using OVN 21.09 make a difference, and is the gateway_mtu option set on the Logical Router Port?

Support for generating ICMP fragmentation needed for ingress traffic was recently added and is included in 21.09.

Generation of ICMP fragmentation needed packets for traffic originating inside OVN has been there for a while, and both require the gateway_mtu option to be set.
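
As a rough sketch, the option can be set on a Logical Router Port with ovn-nbctl's generic database `set` command. The helper below only builds the command line and does not run it. The port name `lrp-ext` and the MTU value 1500 are hypothetical placeholders, and the function name is illustrative.

```python
import shlex
from typing import List


def gateway_mtu_command(port: str, mtu: int) -> List[str]:
    """Build an ovn-nbctl command that sets options:gateway_mtu on a port.

    `port` is the name of a Logical_Router_Port row in the northbound DB.
    """
    if mtu <= 0:
        raise ValueError("MTU must be a positive integer")
    return [
        "ovn-nbctl",
        "set",
        "Logical_Router_Port",
        port,
        f"options:gateway_mtu={mtu}",
    ]


if __name__ == "__main__":
    # Hypothetical port name and MTU, for illustration only.
    command = gateway_mtu_command("lrp-ext", 1500)
    print(shlex.join(command))
    # To apply it for real, pass `command` to subprocess.run(command, check=True)
    # on a host that can reach the northbound database.
```

With these placeholder values the printed command is `ovn-nbctl set Logical_Router_Port lrp-ext options:gateway_mtu=1500`.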

tomponline commented 3 years ago

Thanks @fnordahl! I was not aware of the gateway_mtu setting, as it's not documented in the ovn-nb (https://manpages.ubuntu.com/manpages/focal/en/man5/ovn-nb.5.html) docs like the other options. For some reason it's only mentioned in the ovn-northd docs (https://manpages.ubuntu.com/manpages/focal/man8/ovn-northd.8.html), even though from what I can see it needs to be applied to the northbound DB like the other options. Confusing :(

Is there a recommended way to discover options such as this? I first went looking for an "mtu"-like setting by scouring the ovn-nb manpage. In your view, should options such as this also be mentioned in ovn-nb?

Thanks again.

fnordahl commented 3 years ago

I would indeed consider this as CMS API and as such I think it should be mentioned in the ovn-nb documentation.

fnordahl commented 3 years ago

I sent a patch amending the ovn-nb documentation to include this option for review: https://patchwork.ozlabs.org/project/ovn/patch/20211109101158.887655-1-frode.nordahl@canonical.com/

tomponline commented 3 years ago

Amazing, thanks so much. I am testing on a later version of OVN to see if that helps with enabling PMTU.

tomponline commented 3 years ago

21.09.

@fnordahl Using Ubuntu Impish with the gateway_mtu option worked well for PMTU discovery (https://github.com/lxc/lxd/pull/9503).

I still need to test whether that resolves the original problem reported here of no time exceeded in-transit message.

tomponline commented 3 years ago

@fnordahl I've tested the original reported problem with Ubuntu Impish and I'm afraid the issue persists. There is no ICMP6 time exceeded in-transit response from the OVN router when packets are sent from an external source address that is not in the subnet of the router's external port.

fnordahl commented 3 years ago

Ack, thanks for checking @tomponline. At least we got the PMTU part of the problem out of the way! I'll have a look and see what could be done about the other part.

tomponline commented 3 years ago

Yes, that will help with @stgraber's setup. Do you think that can be backported to Focal?

stgraber commented 3 years ago

@tomponline that's actually a question for @fnordahl as he and his team maintain the focal packages.

In my case, I maintain OVS/OVN backports in my production PPA for my own clusters so I'll make sure I get on 21.09 there soon. The snap itself currently doesn't ship 21.09 as it wouldn't build for some reason, but I'll sort that out soon. It's also not really relevant to the snap as we only use the OVN client tools from there whereas we really need the server side updated in this case.

tomponline commented 3 years ago

Yes, I originally tested using the snap plus a Focal host and it didn't work (as expected).

fnordahl commented 3 years ago

We have had other parties trip over the missing PMTU from external network as noted in https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1949120, so an SRU is surely something for the OVN team in Canonical to consider. It is non-trivial though as some code re-organization took place leading up to 21.09, but we'll have a look.

fnordahl commented 3 years ago

Forgot to mention: on Focal, an alternative is to consume the OVN 21.09 packages from the Xena UCA (i.e. set source='cloud:focal-xena' for charms or issue sudo add-apt-repository cloud-archive:xena). Would that be an option here?

tomponline commented 3 years ago

Xena UCA

Thanks, that may be an option, although I may also be able to use @stgraber's PPA.