openvswitch / ovs-issues

Issue tracker repo for Open vSwitch

icmpv6 jitter increase after upgrade #326

Open tiagonux opened 6 months ago

tiagonux commented 6 months ago

Hi all,

Bringing over the discussion from this mailing-list thread, which reports an issue with ICMPv6 packets: https://www.mail-archive.com/ovs-discuss@openvswitch.org/msg09948.html

While testing the upgrade path from OVN 22.03.1/OVS 2.17.2 to OVN 23.03.1/OVS 3.1.3 on Ubuntu 22.04 (kernels 5.15 and 6.5), we are seeing strange behavior for ICMPv6 traffic. Before the upgrade, a simple north-south or east-west ping between IPv6 hosts had low jitter, as shown below:

64 bytes from 2001:db8:2:a::301: icmp_seq=2712 ttl=62 time=0.676 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2713 ttl=62 time=0.829 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2714 ttl=62 time=0.568 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2715 ttl=62 time=0.700 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2716 ttl=62 time=0.768 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2717 ttl=62 time=0.599 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2718 ttl=62 time=0.656 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2719 ttl=62 time=0.689 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2720 ttl=62 time=0.724 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2721 ttl=62 time=0.419 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2722 ttl=62 time=0.732 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2723 ttl=62 time=0.717 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2724 ttl=62 time=0.755 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2725 ttl=62 time=0.765 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2726 ttl=62 time=0.535 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2727 ttl=62 time=0.865 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2728 ttl=62 time=0.692 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2729 ttl=62 time=0.597 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2730 ttl=62 time=0.661 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2731 ttl=62 time=0.558 ms

But after the upgrade, the same ping shows much higher jitter:

64 bytes from 2001:db8:2:a::301: icmp_seq=37 ttl=253 time=2.14 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=38 ttl=253 time=50.2 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=39 ttl=253 time=57.0 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=40 ttl=253 time=61.5 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=41 ttl=253 time=2.16 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=42 ttl=253 time=1.68 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=43 ttl=253 time=1.63 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=44 ttl=253 time=3.32 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=45 ttl=253 time=1.87 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=46 ttl=253 time=39.6 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=47 ttl=253 time=2.87 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=48 ttl=253 time=60.0 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=49 ttl=253 time=1.79 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=50 ttl=253 time=2.06 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=51 ttl=253 time=2.45 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=52 ttl=253 time=2.10 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=53 ttl=253 time=4.39 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=54 ttl=253 time=2.91 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=55 ttl=253 time=1.79 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=56 ttl=253 time=1.80 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=57 ttl=253 time=2.26 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=58 ttl=253 time=55.1 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=59 ttl=253 time=57.2 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=60 ttl=253 time=3.34 ms
--- 2001:db8:2:a::301 ping statistics ---
60 packets transmitted, 60 received, 0% packet loss, time 59120ms
rtt min/avg/max/mdev = 0.531/16.329/61.464/23.395 ms
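
To compare saved ping runs from before and after an upgrade, a minimal Python sketch like the following can pull the `time=` values out of the output and recompute min/avg/max/mdev. It assumes mdev is the population standard deviation of the RTTs, which is how iputils ping is understood to compute it; the function names are illustrative only.

```python
import math
import re

# Matches the RTT field of a ping reply line, e.g. "time=2.16 ms".
RTT_RE = re.compile(r"time=([0-9.]+) ms")


def parse_rtts(ping_output):
    """Return the RTTs (in ms) found in raw ping output, in order."""
    return [float(m) for m in RTT_RE.findall(ping_output)]


def rtt_stats(rtts):
    """Return (min, avg, max, mdev) for a non-empty list of RTTs.

    Assumption: mdev is the population standard deviation.
    """
    n = len(rtts)
    avg = sum(rtts) / n
    mdev = math.sqrt(sum((r - avg) ** 2 for r in rtts) / n)
    return min(rtts), avg, max(rtts), mdev
```

Note that the summary line in the output above covers all 60 packets of the run, not only the replies pasted here, so recomputing from the pasted lines alone will give different numbers.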

ICMPv4 is not affected: its jitter is the same before and after the upgrade. I also ran TCP/UDP (v4/v6) throughput tests before and after the upgrade and the numbers are similar, so the problem seems specific to ICMPv6 traffic.

Checking the datapath, I can see the flow for in_port(1706), where the VM is connected, being removed and installed again:

ovs-dpctl dump-flows | grep 2001:db8:2:a::301
recirc_id(0x1ad3d),tunnel(tun_id=0x1c,src=10.26.73.135,dst=10.26.72.4,tos=0x20,geneve({}{}),flags(-df+csum+key)),in_port(137),ct_state(-new+est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=fa:16:3e:9b:b3:c6,dst=fa:16:3e:d7:c9:46),eth_type(0x86dd),ipv6(src=2000::/ffc0::,dst=2001:db8:2:a::301,proto=58,hlimit=62,frag=no), packets:7, bytes:826, used:0.674s, actions:1706
recirc_id(0),tunnel(tun_id=0x1c,src=10.26.73.135,dst=10.26.72.4,tos=0x20,geneve({class=0x102,type=0x80,len=4,0x60008/0x7fffffff}),flags(-df+csum+key)),in_port(137),eth(src=fa:16:3e:9b:b3:c6,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x86dd),ipv6(src=2000::/ffc0::,dst=2001:db8:2:a::301,proto=58,hlimit=62,frag=no), packets:7, bytes:826, used:0.709s, actions:ct(zone=7185),recirc(0x1ad3d)

ovs-dpctl dump-flows | grep 2001:db8:2:a::301
recirc_id(0x1ad3d),tunnel(tun_id=0x1c,src=10.26.73.135,dst=10.26.72.4,tos=0x20,geneve({}{}),flags(-df+csum+key)),in_port(137),ct_state(-new+est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=fa:16:3e:9b:b3:c6,dst=fa:16:3e:d7:c9:46),eth_type(0x86dd),ipv6(src=2000::/ffc0::,dst=2001:db8:2:a::301,proto=58,hlimit=62,frag=no), packets:11, bytes:1298, used:0.190s, actions:1706
recirc_id(0),in_port(1706),eth(src=fa:16:3e:d7:c9:46,dst=fa:16:3e:9b:b3:c6),eth_type(0x86dd),ipv6(src=2001:db8:2:a::301,dst=2001:db8:1:2::10,proto=58,hlimit=255,frag=no),icmpv6(type=128/0xfc), packets:0, bytes:0, used:never, actions:ct(zone=7185),recirc(0x1b13c)
recirc_id(0),tunnel(tun_id=0x1c,src=10.26.73.135,dst=10.26.72.4,tos=0x20,geneve({class=0x102,type=0x80,len=4,0x60008/0x7fffffff}),flags(-df+csum+key)),in_port(137),eth(src=fa:16:3e:9b:b3:c6,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x86dd),ipv6(src=2000::/ffc0::,dst=2001:db8:2:a::301,proto=58,hlimit=62,frag=no), packets:11, bytes:1298, used:0.237s, actions:ct(zone=7185),recirc(0x1ad3d)

(Note: no OVS HW Offloading)

So it seems a flow is missing: the packet goes to userspace and the flow is installed on the datapath again. That may explain the higher jitter.

After debugging to understand when this behavior was introduced, we found that the offending commit is this one [0]. We backported only this commit to OVS 2.17.2 and the issue reproduced.
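
One way to observe this kind of flapping over time is to poll the datapath and log which flows appear and disappear between dumps. The Python sketch below is only an illustration under a few assumptions: `ovs-dpctl dump-flows` prints one flow per line, the text before `, packets:` identifies the flow (statistics and actions are ignored, since the recirc id in the actions can change on reinstall), and the helper names are hypothetical. Flaps shorter than the polling interval will be missed.

```python
import subprocess
import time


def flow_key(line):
    """Return the match part of a dump-flows line (text before the stats)."""
    return line.split(", packets:", 1)[0].strip()


def flow_keys(dump, needle):
    """Return the set of flow keys in a dump whose line contains `needle`."""
    return {flow_key(line) for line in dump.splitlines() if needle in line}


def diff_flows(prev, curr):
    """Return (added, removed) flow keys between two polls, sorted."""
    return sorted(curr - prev), sorted(prev - curr)


def watch(needle, interval=0.2):
    """Poll the datapath forever, printing flows that appear or disappear."""
    prev = set()
    while True:
        dump = subprocess.run(
            ["ovs-dpctl", "dump-flows"],
            capture_output=True, text=True, check=True,
        ).stdout
        curr = flow_keys(dump, needle)
        added, removed = diff_flows(prev, curr)
        now = time.strftime("%H:%M:%S")
        for key in added:
            print(f"{now} + {key}")
        for key in removed:
            print(f"{now} - {key}")
        prev = curr
        time.sleep(interval)
```

Calling `watch("2001:db8:2:a::301")` as root while the ping is running would then print a `+` line each time a matching flow is installed and a `-` line each time one is removed.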

The flow below is an example of one that is repeatedly installed and removed from the datapath, always with 0 packets matched:

recirc_id(0),in_port(1706),eth(src=fa:16:3e:d7:c9:46,dst=fa:16:3e:9b:b3:c6),eth_type(0x86dd),ipv6(src=2001:db8:2:a::301,dst=2001:db8:1:2::10,proto=58,hlimit=255,frag=no),icmpv6(type=128/0xfc), packets:0, bytes:0, used:never, actions:ct(zone=7185),recirc(0xae46f4)
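
As a note on reading that flow: datapath flows print fields as value/mask, so icmpv6(type=128/0xfc) matches any ICMPv6 type that agrees with 128 in the bits set in the mask. A small Python sketch of that arithmetic (the helper name is illustrative):

```python
def matching_values(value, mask, width=8):
    """List every `width`-bit value that a value/mask match would accept."""
    return [v for v in range(1 << width) if (v & mask) == (value & mask)]


print(matching_values(128, 0xFC))  # prints [128, 129, 130, 131]
```

ICMPv6 type 128 is Echo Request and 129 is Echo Reply, so this megaflow covers both ping message types along with types 130 and 131.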

Since the commit changed the behavior of the classifier, it may have introduced an issue for ICMPv6 packets.

[0] https://github.com/openvswitch/ovs/commit/132fa24b656e1bc45b6ce8ee9ab0206fa6930f65

Regards,

Tiago Pires

tiagonux commented 3 months ago

Hey @igsilya

I created this reproducer [0] for the issue. Could you take a look?

[0] https://pastebin.com/Qw0wGyJ3

Regards,

Tiago Pires

tiagonux commented 3 months ago

Hi,

When run with the "run" option, this reproducer creates 90 LRs, LSs, and NAT rules for namespaces. While it is running, v4 and v6 pings run in parallel and their output is saved to individual files under /tmp/. For example, with OVS 3.x I can see the v6 ping working, but the related flow on the datapath is flapping. For the v4 ping, the flow is installed on the datapath and stays there until the ping finishes, which shows that the issue only affects ICMPv6 packets.

Tiago Pires

igsilya commented 3 months ago

Thanks @tiagonux ! I'll try to check it later this week. One question: Are you running it on Ubuntu? If so, which version?

tiagonux commented 3 months ago

Hey @igsilya,

I'm running on Ubuntu 22.04 (jammy).

Thanks

Tiago Pires