openvswitch / ovs-issues

Issue tracker repo for Open vSwitch

icmpv6 jitter increase after upgrade #326

Open tiagonux opened 2 months ago


Hi all,

Bringing over the discussion from this ovs-discuss thread, which reports an issue with ICMPv6 packets: https://www.mail-archive.com/ovs-discuss@openvswitch.org/msg09948.html

While testing the upgrade path from OVN 22.03.1/OVS 2.17.2 to OVN 23.03.1/OVS 3.1.3 on Ubuntu 22.04 (kernels 5.15 and 6.5), we are seeing strange behavior for ICMPv6 traffic. Before the upgrade, a simple north-south or east-west ping between IPv6 hosts showed low jitter, as below:

64 bytes from 2001:db8:2:a::301: icmp_seq=2712 ttl=62 time=0.676 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2713 ttl=62 time=0.829 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2714 ttl=62 time=0.568 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2715 ttl=62 time=0.700 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2716 ttl=62 time=0.768 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2717 ttl=62 time=0.599 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2718 ttl=62 time=0.656 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2719 ttl=62 time=0.689 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2720 ttl=62 time=0.724 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2721 ttl=62 time=0.419 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2722 ttl=62 time=0.732 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2723 ttl=62 time=0.717 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2724 ttl=62 time=0.755 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2725 ttl=62 time=0.765 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2726 ttl=62 time=0.535 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2727 ttl=62 time=0.865 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2728 ttl=62 time=0.692 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2729 ttl=62 time=0.597 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2730 ttl=62 time=0.661 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=2731 ttl=62 time=0.558 ms

After the upgrade, the same ping shows much higher jitter:

64 bytes from 2001:db8:2:a::301: icmp_seq=37 ttl=253 time=2.14 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=38 ttl=253 time=50.2 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=39 ttl=253 time=57.0 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=40 ttl=253 time=61.5 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=41 ttl=253 time=2.16 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=42 ttl=253 time=1.68 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=43 ttl=253 time=1.63 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=44 ttl=253 time=3.32 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=45 ttl=253 time=1.87 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=46 ttl=253 time=39.6 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=47 ttl=253 time=2.87 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=48 ttl=253 time=60.0 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=49 ttl=253 time=1.79 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=50 ttl=253 time=2.06 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=51 ttl=253 time=2.45 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=52 ttl=253 time=2.10 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=53 ttl=253 time=4.39 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=54 ttl=253 time=2.91 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=55 ttl=253 time=1.79 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=56 ttl=253 time=1.80 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=57 ttl=253 time=2.26 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=58 ttl=253 time=55.1 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=59 ttl=253 time=57.2 ms
64 bytes from 2001:db8:2:a::301: icmp_seq=60 ttl=253 time=3.34 ms
--- 2001:db8:2:a::301 ping statistics ---
60 packets transmitted, 60 received, 0% packet loss, time 59120ms
rtt min/avg/max/mdev = 0.531/16.329/61.464/23.395 ms
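To put a number on the difference between the two captures, here is a minimal Python sketch that extracts the RTTs from ping output and reports the mean absolute difference between consecutive replies, plus a population standard deviation (which, as far as I know, is how iputils ping derives `mdev`). The function name `rtt_stats` and the short samples are illustrative; the samples are the first few replies copied from the outputs above.

```python
import math
import re

# Matches the RTT field of a ping reply line, e.g. "time=0.676 ms".
RTT_RE = re.compile(r"time=([\d.]+) ms")


def rtt_stats(ping_output: str) -> dict:
    """Return simple jitter statistics for the replies found in ping output."""
    rtts = [float(m) for m in RTT_RE.findall(ping_output)]
    if len(rtts) < 2:
        raise ValueError("need at least two replies to estimate jitter")
    mean = sum(rtts) / len(rtts)
    # Population standard deviation: sqrt(E[x^2] - E[x]^2).
    variance = sum(r * r for r in rtts) / len(rtts) - mean * mean
    mdev = math.sqrt(max(variance, 0.0))
    # Jitter as the mean absolute difference between consecutive RTTs.
    jitter = sum(abs(b - a) for a, b in zip(rtts, rtts[1:])) / (len(rtts) - 1)
    return {"count": len(rtts), "mean": mean, "mdev": mdev, "jitter": jitter}


# Short samples taken from the captures above (before and after the upgrade).
before = rtt_stats(
    "icmp_seq=2712 ttl=62 time=0.676 ms\n"
    "icmp_seq=2713 ttl=62 time=0.829 ms\n"
    "icmp_seq=2714 ttl=62 time=0.568 ms\n"
)
after = rtt_stats(
    "icmp_seq=37 ttl=253 time=2.14 ms\n"
    "icmp_seq=38 ttl=253 time=50.2 ms\n"
    "icmp_seq=39 ttl=253 time=57.0 ms\n"
    "icmp_seq=40 ttl=253 time=61.5 ms\n"
    "icmp_seq=41 ttl=253 time=2.16 ms\n"
)
print("before:", before)
print("after: ", after)
```

Feeding the full ping output of each run to `rtt_stats` gives one comparable figure per run, which is easier to track across builds than eyeballing the reply lines.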

ICMPv4 is not affected: its jitter is the same before and after the upgrade. I also ran TCP/UDP (v4/v6) throughput tests before and after the upgrade and the numbers are similar, so the problem seems to be specific to ICMPv6 traffic.

Checking the datapath, I can see the flow for in_port(1706), where the VM is connected, being removed and installed again:

ovs-dpctl dump-flows | grep 2001:db8:2:a::301
recirc_id(0x1ad3d),tunnel(tun_id=0x1c,src=10.26.73.135,dst=10.26.72.4,tos=0x20,geneve({}{}),flags(-df+csum+key)),in_port(137),ct_state(-new+est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=fa:16:3e:9b:b3:c6,dst=fa:16:3e:d7:c9:46),eth_type(0x86dd),ipv6(src=2000::/ffc0::,dst=2001:db8:2:a::301,proto=58,hlimit=62,frag=no), packets:7, bytes:826, used:0.674s, actions:1706
recirc_id(0),tunnel(tun_id=0x1c,src=10.26.73.135,dst=10.26.72.4,tos=0x20,geneve({class=0x102,type=0x80,len=4,0x60008/0x7fffffff}),flags(-df+csum+key)),in_port(137),eth(src=fa:16:3e:9b:b3:c6,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x86dd),ipv6(src=2000::/ffc0::,dst=2001:db8:2:a::301,proto=58,hlimit=62,frag=no), packets:7, bytes:826, used:0.709s, actions:ct(zone=7185),recirc(0x1ad3d)

ovs-dpctl dump-flows | grep 2001:db8:2:a::301
recirc_id(0x1ad3d),tunnel(tun_id=0x1c,src=10.26.73.135,dst=10.26.72.4,tos=0x20,geneve({}{}),flags(-df+csum+key)),in_port(137),ct_state(-new+est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=fa:16:3e:9b:b3:c6,dst=fa:16:3e:d7:c9:46),eth_type(0x86dd),ipv6(src=2000::/ffc0::,dst=2001:db8:2:a::301,proto=58,hlimit=62,frag=no), packets:11, bytes:1298, used:0.190s, actions:1706
recirc_id(0),in_port(1706),eth(src=fa:16:3e:d7:c9:46,dst=fa:16:3e:9b:b3:c6),eth_type(0x86dd),ipv6(src=2001:db8:2:a::301,dst=2001:db8:1:2::10,proto=58,hlimit=255,frag=no),icmpv6(type=128/0xfc), packets:0, bytes:0, used:never, actions:ct(zone=7185),recirc(0x1b13c)
recirc_id(0),tunnel(tun_id=0x1c,src=10.26.73.135,dst=10.26.72.4,tos=0x20,geneve({class=0x102,type=0x80,len=4,0x60008/0x7fffffff}),flags(-df+csum+key)),in_port(137),eth(src=fa:16:3e:9b:b3:c6,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x86dd),ipv6(src=2000::/ffc0::,dst=2001:db8:2:a::301,proto=58,hlimit=62,frag=no), packets:11, bytes:1298, used:0.237s, actions:ct(zone=7185),recirc(0x1ad3d)

(Note: no OVS HW Offloading)

So it seems the flow is missing from the datapath, the packet goes to userspace, and the flow is installed on the datapath again. That could explain the higher jitter.

After debugging to find out when this behavior was introduced, we identified this commit [0] as the offending one. We backported only this commit to OVS 2.17.2 and the issue reproduced.
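One way to spot this churn without comparing dumps by eye is to diff two snapshots of `ovs-dpctl dump-flows` on the match part only. The following is a rough Python sketch under that assumption: the helper names (`flow_match`, `diff_flow_dumps`) are made up for illustration, and the sample flows are heavily abbreviated versions of the ones in the dumps above.

```python
def flow_match(line: str) -> str:
    """Return the match part of a dump-flows line.

    Everything from ", packets:" onward (statistics and actions) is dropped,
    so the same flow compares equal even when its counters or its recirc id
    in the actions have changed between dumps.
    """
    return line.strip().split(", packets:", 1)[0]


def diff_flow_dumps(old_dump: str, new_dump: str):
    """Return (added, removed) flow matches between two dump-flows outputs."""
    old = {flow_match(l) for l in old_dump.splitlines() if l.strip()}
    new = {flow_match(l) for l in new_dump.splitlines() if l.strip()}
    return sorted(new - old), sorted(old - new)


# Abbreviated sample flows (most match fields omitted for readability).
first = (
    "recirc_id(0x1ad3d),in_port(137),eth_type(0x86dd), "
    "packets:7, bytes:826, used:0.674s, actions:1706\n"
)
second = (
    "recirc_id(0x1ad3d),in_port(137),eth_type(0x86dd), "
    "packets:11, bytes:1298, used:0.190s, actions:1706\n"
    "recirc_id(0),in_port(1706),eth_type(0x86dd),icmpv6(type=128/0xfc), "
    "packets:0, bytes:0, used:never, actions:ct(zone=7185),recirc(0x1b13c)\n"
)

added, removed = diff_flow_dumps(first, second)
print("added:  ", added)
print("removed:", removed)
```

Running this on dumps taken about a second apart should list the in_port(1706) flow alternately under added and removed if it really is being evicted and reinstalled.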

The flow below is an example of one that is repeatedly installed in and removed from the datapath, always with 0 packets matched:

recirc_id(0),in_port(1706),eth(src=fa:16:3e:d7:c9:46,dst=fa:16:3e:9b:b3:c6),eth_type(0x86dd),ipv6(src=2001:db8:2:a::301,dst=2001:db8:1:2::10,proto=58,hlimit=255,frag=no),icmpv6(type=128/0xfc), packets:0, bytes:0, used:never, actions:ct(zone=7185),recirc(0xae46f4)
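For anyone reading the `icmpv6(type=128/0xfc)` field: it is a value/mask match, so it covers every ICMPv6 type whose top six bits equal those of 128. A small Python sketch (the helper `matching_values` is only illustrative, not an OVS API) enumerates them; the type names come from the ICMPv6 and MLD specifications.

```python
from typing import List

# Names for the ICMPv6 types relevant here (RFC 4443 and RFC 2710).
ICMPV6_NAMES = {
    128: "Echo Request",
    129: "Echo Reply",
    130: "Multicast Listener Query",
    131: "Multicast Listener Report",
}


def matching_values(value: int, mask: int, width_bits: int = 8) -> List[int]:
    """List every field value that a value/mask datapath match accepts."""
    return [v for v in range(1 << width_bits) if (v & mask) == (value & mask)]


for icmp_type in matching_values(128, 0xFC):
    print(icmp_type, ICMPV6_NAMES.get(icmp_type, "unknown"))
```

So this megaflow is wide enough to cover the echo replies (type 129) that the VM on in_port(1706) would presumably be sending back, yet it still shows 0 packets.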

Since the commit changed the behavior of the classifier, it may have introduced an issue for ICMPv6 packets.

[0] https://github.com/openvswitch/ovs/commit/132fa24b656e1bc45b6ce8ee9ab0206fa6930f65

Regards,

Tiago Pires