ovn-org / ovn-kubernetes

A robust Kubernetes networking platform
https://ovn-kubernetes.io/
Apache License 2.0

HW offload can not work #839

Status: Open. autumn0207 opened this issue 5 years ago.

autumn0207 commented 5 years ago

OVS log:

2019-09-23T03:14:48.824Z|00035|dpif_netlink(handler2)|ERR|failed to offload flow: Operation not supported: 621b9db0ed0b6aa
2019-09-23T03:14:56.222Z|00069|dpif_netlink(revalidator11)|ERR|Dropped 2 log messages in last 8 seconds (most recently, 8 seconds ago) due to excessive rate
2019-09-23T03:14:56.222Z|00070|dpif_netlink(revalidator11)|ERR|failed to offload flow: Operation not supported: 621b9db0ed0b6aa

dump-flows:

2019-09-23T03:15:37Z|00001|dpif_netlink|INFO|The kernel module does not support meters.
recirc_id(0x18),in_port(3),ct_state(-new-est-rel-rpl-inv-trk),ct_label(0/0x1),eth(dst=0a:00:00:a8:00:05),eth_type(0x0800),ipv4(dst=192.168.0.4,frag=no), packets:768, bytes:75264, used:0.046s, actions:3
recirc_id(0x1c),in_port(3),ct_state(-new+est-rel-rpl-inv+trk),ct_label(0/0x1),eth(src=0a:00:00:a8:00:05,dst=00:00:00:e2:93:e3),eth_type(0x0800),ipv4(src=192.168.0.4,dst=192.168.0.1,proto=1,ttl=64,frag=no),icmp(type=8,code=0), packets:3, bytes:294, used:0.047s, actions:userspace(pid=4294149776,slow_path(action))
recirc_id(0),in_port(3),ct_state(-new-est-rel-rpl-inv-trk),ct_label(0/0x1),eth(src=0a:00:00:a8:00:05,dst=00:00:00:e2:93:e3),eth_type(0x0806),arp(sip=192.168.0.4,tip=192.168.0.1,op=1/0xff,sha=0a:00:00:a8:00:05,tha=00:00:00:00:00:00), packets:33, bytes:1980, used:4.045s, actions:userspace(pid=4294149776,slow_path(action))
recirc_id(0x19),in_port(3),ct_state(-new+est-rel-rpl-inv+trk),ct_label(0/0x1),eth(),eth_type(0x0800),ipv4(frag=no), packets:767, bytes:75166, used:0.048s, actions:ct(zone=1,nat),recirc(0x1a)
recirc_id(0x1a),in_port(3),eth(dst=00:00:00:e2:93:e3),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=8/0xf8), packets:3, bytes:294, used:0.048s, actions:ct(zone=1),recirc(0x1b)
recirc_id(0x1b),in_port(3),ct_state(-new+est-rel-rpl-inv+trk),ct_label(0/0x1),eth(),eth_type(0x0800),ipv4(src=192.168.0.4/255.255.255.252,frag=no), packets:767, bytes:75166, used:0.048s, actions:ct(zone=1,nat),recirc(0x1c)
recirc_id(0),in_port(3),eth(src=0a:00:00:a8:00:05),eth_type(0x0800),ipv4(src=192.168.0.4,dst=128.0.0.0/128.0.0.0,proto=1,frag=no),icmp(type=8/0xf8), packets:3, bytes:294, used:0.048s, actions:ct(zone=1),recirc(0x19)

HW: Mellanox CX5
OS: CentOS 7.6
OVS version: 2.12
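A quick way to see which of the datapath flows above involve connection tracking is to scan the dump line by line. The sketch below is a heuristic triage aid, not OVS's own offload logic: it assumes the `match, packets:N, bytes:N, used:..., actions:...` text format shown above (as printed by, for example, `ovs-appctl dpctl/dump-flows`), and the names `DpFlow`, `parse_flow` and `offload_blockers` are hypothetical helpers introduced here for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DpFlow:
    match: str    # everything before ", packets:"
    actions: str  # everything after ", actions:"
    packets: int


def parse_flow(line: str) -> Optional[DpFlow]:
    """Parse one datapath flow line; return None for non-flow lines."""
    head, sep, actions = line.strip().partition(", actions:")
    if not sep:
        return None  # e.g. the "kernel module does not support meters" INFO line
    stats = re.search(r", packets:(\d+)", head)
    if stats is None:
        return None
    return DpFlow(match=head[: stats.start()], actions=actions, packets=int(stats.group(1)))


def offload_blockers(flow: DpFlow) -> List[str]:
    """Heuristic: reasons this flow needs CT offload or stays on the slow path."""
    reasons = []
    # A top-level ct(...) action, e.g. "ct(zone=1,nat),recirc(0x1a)".
    if re.search(r"(?:^|,)ct\(", flow.actions):
        reasons.append("ct() action")
    # A match on conntrack state, e.g. "ct_state(-new+est-...)" or "ct_label(0/0x1)".
    if re.search(r"(?:^|,)ct_(?:state|label|mark|zone)\(", flow.match):
        reasons.append("conntrack match")
    # Packets handed up to ovs-vswitchd are handled in software.
    if "userspace(" in flow.actions:
        reasons.append("slow path (userspace)")
    return reasons


if __name__ == "__main__":
    import sys

    for line in sys.stdin:
        flow = parse_flow(line)
        if flow is not None:
            print(flow.packets, offload_blockers(flow) or "no obvious blocker", flow.actions)
```

Run against the dump above, every flow reports at least one conntrack-related reason, which lines up with the explanation in the replies that CT offload is required before these flows can go to the NIC.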

girishmg commented 5 years ago

@moshe010 can you please take a look.

As I understand it, the CX5 driver doesn't support offloading connection tracking and NAT yet, so such flows will not be offloaded to the NIC.

moshe010 commented 5 years ago

@autumn0207, you need connection tracking TC offload in OVS [1]; those patches are still under review.

[1] https://www.mail-archive.com/ovs-dev@openvswitch.org/msg34124.html

autumn0207 commented 5 years ago

@moshe010 @girishmg Thank you very much. However, there is another error in the OVS logs:

2019-09-23T07:31:45.257Z|00006|dpif_netlink(handler66)|ERR|failed to offload flow: Invalid argument: 621b9db0ed0b6aa
2019-09-23T07:31:45.635Z|00001|dpif_netlink(handler67)|ERR|failed to offload flow: Invalid argument: 621b9db0ed0b6aa
2019-09-23T07:31:47.199Z|00005|dpif_netlink(handler71)|ERR|failed to offload flow: Invalid argument: 621b9db0ed0b6aa
2019-09-23T07:32:03.707Z|00002|dpif_netlink(handler67)|ERR|failed to offload flow: Invalid argument: enp216s0_0
2019-09-23T07:32:04.250Z|00006|dpif_netlink(handler71)|ERR|failed to offload flow: Invalid argument: enp216s0_0
2019-09-23T07:32:05.252Z|00007|dpif_netlink(handler71)|ERR|failed to offload flow: Invalid argument: enp216s0_0

enp216s0_0 and 621b9db0ed0b6aa are VF representors.
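When comparing the two kinds of failure reported so far ("Operation not supported" earlier, "Invalid argument" here), it can help to tally them per representor. A minimal sketch, assuming only the `failed to offload flow: <reason>: <device>` message shape seen in the logs in this issue; `summarize_offload_errors` is a hypothetical helper, and other OVS versions may word the message differently.

```python
import re
from collections import Counter

# Message shape copied from the ovs-vswitchd lines in this issue; other OVS
# versions may word it differently.
OFFLOAD_ERR = re.compile(r"failed to offload flow: (?P<reason>[^:]+): (?P<dev>\S+)")


def summarize_offload_errors(log_text: str) -> Counter:
    """Count (reason, device) pairs for 'failed to offload flow' messages."""
    counts: Counter = Counter()
    # finditer also copes with several log entries pasted onto a single line.
    for m in OFFLOAD_ERR.finditer(log_text):
        counts[(m["reason"], m["dev"])] += 1
    return counts


if __name__ == "__main__":
    import sys

    for (reason, dev), n in summarize_offload_errors(sys.stdin.read()).most_common():
        print(f"{n:5d}  {dev:20s}  {reason}")
```

"Invalid argument" and "Operation not supported" are the standard Linux error strings for EINVAL and EOPNOTSUPP. The counts are only a lower bound, because ovs-vswitchd rate-limits these messages (note the "Dropped 2 log messages" line in the first log).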

moshe010 commented 5 years ago

@autumn0207, it's hard to say from the log. If connectivity is working but flows are not offloaded, I would prefer that all the pieces in OVS and the kernel get merged first, and then we can continue. Is the problem only that flows are not offloaded? Does connectivity between pods work?

autumn0207 commented 5 years ago

@moshe010 Yes, connectivity works normally. I just don't understand why none of the flows can be offloaded.

BntumBle commented 4 years ago

Hello, I have encountered the same problem. How did you solve it?

moshe010 commented 4 years ago

@BntumBle, OVS needs connection tracking offload for this to work. In OVS it has already been merged [1] and is part of OVS 2.13. The kernel side is still a work in progress, so once all the patches land in the kernel, offload will work.

[1] - https://github.com/openvswitch/ovs/commit/576126a931cdf96d43443916d922462c7a16e350

BntumBle commented 4 years ago

@moshe010 So you mean that OVS offload does not work yet, and some kernel changes are still needed?

BntumBle commented 4 years ago

@moshe010 I used OVS 2.9.5 to offload a simple forwarding flow, but it failed, and the ovs-vswitchd log shows the error "failed to offload flow: Invalid argument". Does this also need connection tracking offload?

moshe010 commented 4 years ago

@BntumBle, you need OVS 2.13, and on the kernel side I don't think all the patches are merged yet. Just to run the flows through TC in software (no offload, but traffic goes via TC), you need kernel 5.5.9 (latest upstream).

BntumBle commented 4 years ago

@moshe010 But I am following this guide to configure Open vSwitch hardware offload: https://help.netronome.com/support/solutions/articles/36000081172-agilio-open-vswitch-tc-user-guide#document-07_Using_openvswitch and it says kernel 4.15 is OK.

moshe010 commented 4 years ago

It's OK for very basic offloads like VLAN push/pop or VXLAN encap/decap. ovn-kubernetes relies on the connection tracking feature, whose offload was introduced much later.

BntumBle commented 4 years ago

@moshe010 Thank you for your reply. If I only perform basic offloads (just one port in and one port out), do I necessarily need OVS 2.13, or is OVS 2.9.5 OK?

moshe010 commented 4 years ago

Yes, but you need a basic CNI such as [1]; see [2].

[1] - https://github.com/kubevirt/ovs-cni
[2] - https://github.com/kubevirt/ovs-cni/blob/master/docs/ovs-offload.md

azure-lio commented 3 years ago

@autumn0207


Have you solved this problem? I built an environment following the OpenStack reference, but I also hit the same error as you. My Linux is CentOS 7.8, the kernel is 3.10.0-957.el7.x86_64, OVS is 2.11, and the NIC is a Mellanox CX5.

Offload reference: https://docs.openstack.org/neutron/queens/admin/config-ovs-offload.html

moshe010 commented 3 years ago

You need connection tracking support. It is in kernel 5.7 or above and OVS 2.13; see https://github.com/ovn-org/ovn-kubernetes/blob/master/docs/ovs_offload.md
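A host can be checked against the minimums given here (kernel 5.7, OVS 2.13) with a simple version comparison. This is a minimal sketch under the assumption that plain upstream version numbers are meaningful; distribution kernels such as CentOS/RHEL backport features, so a result of `False` on such a kernel is only a coarse first signal. `version_tuple` and `ct_offload_prereqs_met` are hypothetical helpers, and the OVS version string is passed in by hand.

```python
from __future__ import annotations

import platform
import re

# Minimum versions quoted in this thread for connection-tracking offload.
MIN_KERNEL = (5, 7)
MIN_OVS = (2, 13)


def version_tuple(version: str, parts: int = 2) -> tuple[int, ...]:
    """Leading numeric components, e.g. '3.10.0-957.el7.x86_64' -> (3, 10)."""
    m = re.match(r"\d+(?:\.\d+)*", version.strip())
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    nums = tuple(int(x) for x in m.group(0).split("."))
    # Pad short versions such as "5" to (5, 0) before truncating.
    return (nums + (0,) * parts)[:parts]


def ct_offload_prereqs_met(kernel: str, ovs: str) -> bool:
    return version_tuple(kernel) >= MIN_KERNEL and version_tuple(ovs) >= MIN_OVS


if __name__ == "__main__":
    # platform.release() is the running kernel's release string on Linux.
    # "2.11.0" is only an example OVS version; substitute your own.
    print(ct_offload_prereqs_met(platform.release(), "2.11.0"))
```

For the setup described above (kernel 3.10.0-957.el7.x86_64, OVS 2.11) this returns `False` on both counts.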

azure-lio commented 3 years ago

@moshe010 Thanks a lot for your reply. I am not sure whether kernel >= 4.13 is necessary, because someone told me they have used CentOS 7.6 with Linux kernel 3.10.0-957.el7.x86_64 and OVS 2.11 to offload VXLAN encap and decap to a Mellanox CX5 successfully. There are also some Mellanox support application notes that give a successful example with CentOS 7.5 (https://www.mellanox.com/related-docs/prod_software/Mellanox_Support_for_TripleO_Rocky_Application_Notes_v1.1.pdf). So I am a little confused by these references.

The errors in ovs-vswitchd.log:

2020-12-28T09:34:17.138Z|01842|dpif_netlink(handler131)|DBG|system@ovs-system: put[create] ufid:2da53e35-d5ad-4245-a4b7-af99af4c7f52 recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(5),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=fa:16:3e:e3:fa:16,dst=fa:16:3e:66:c1:4f),eth_type(0x0806),arp(sip=192.168.1.4/0.0.0.0,tip=192.168.1.10/0.0.0.0,op=2/0,sha=fa:16:3e:e3:fa:16/00:00:00:00:00:00,tha=fa:16:3e:66:c1:4f/00:00:00:00:00:00), actions:set(tunnel(tun_id=0x17,src=192.168.20.242,dst=192.168.20.241,ttl=64,tp_dst=4789,flags(df|key))),4
2020-12-28T09:34:17.138Z|01843|dpif_netlink(handler131)|ERR|failed to offload flow: Invalid argument: eth0

The ct values in the flow are all zero.
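The "all zero" observation can be made precise. In the DBG `put[create]` line, masked fields are printed as `value/mask`, and a zero mask means the datapath flow does not match on that field at all (it is wildcarded). A minimal sketch, assuming only the `name(value/mask)` text shape seen in that log with decimal or `0x` hex integers; nested fields such as the `arp(...)` subfields are not handled, and `wildcarded_fields` is a hypothetical helper.

```python
import re
from typing import List

# Scalar field printed as name(value/mask) in the DBG put log, e.g. "ct_state(0/0)".
# Assumption: value and mask are plain decimal or 0x-prefixed hex integers.
MASKED_FIELD = re.compile(r"([a-z_]+)\((0x[0-9a-fA-F]+|\d+)/(0x[0-9a-fA-F]+|\d+)\)")


def _to_int(text: str) -> int:
    return int(text, 16) if text.lower().startswith("0x") else int(text)


def wildcarded_fields(match: str) -> List[str]:
    """Names of masked scalar fields whose mask is zero, i.e. not matched by the flow."""
    return [name for name, _value, mask in MASKED_FIELD.findall(match) if _to_int(mask) == 0]
```

Applied to the match part of that DBG line, it reports `ct_state`, `ct_zone`, `ct_mark` and `ct_label` (along with `dp_hash`, `skb_priority` and `skb_mark`) as wildcarded, so this particular flow does not depend on conntrack at all; it is an ARP flow.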

moshe010 commented 3 years ago

Kernel 4.13 supported only VXLAN without security groups (security groups use connection tracking). ovn-kubernetes needs Geneve and connection tracking to work (there is no way to disable it, as there is in OpenStack). The flow you are seeing is ARP, which is not offloaded anyway, so it is fine that it failed to offload. With connection tracking we can offload only TCP and UDP traffic; ICMP will not be offloaded either.
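The rule of thumb in this reply (with CT offload, only TCP and UDP are offloaded; ARP and ICMP are not) can be written down as a small classifier over datapath flow match strings. This is a sketch of the rule as stated in this comment, not of the current capabilities of any particular NIC, driver, or kernel. It assumes the `eth_type(...)` / `ipv4(...proto=N...)` text format from the dumps earlier in this issue; treating ICMPv6 (protocol 58) like ICMP is an added assumption, and `expected_offload` is a hypothetical helper.

```python
import re

# Rule of thumb from this comment (state of CT offload at the time):
# with connection-tracking offload only TCP and UDP are offloaded; ARP and ICMP are not.
OFFLOADABLE_WITH_CT = {6: "tcp", 17: "udp"}
ICMP_PROTOS = {1, 58}  # assumption: ICMPv6 (58) is treated like ICMP

ETH_TYPE_ARP = 0x0806


def expected_offload(match: str) -> str:
    """Classify a datapath flow match string per the rule of thumb above."""
    eth = re.search(r"eth_type\((0x[0-9a-fA-F]+)", match)
    if eth is not None and int(eth.group(1), 16) == ETH_TYPE_ARP:
        return "not offloaded (ARP)"
    proto = re.search(r"ipv[46]\([^)]*proto=(\d+)", match)
    if proto is None:
        return "unknown (no IP protocol in match)"
    num = int(proto.group(1))
    if num in OFFLOADABLE_WITH_CT:
        return f"offloadable with CT offload ({OFFLOADABLE_WITH_CT[num]})"
    if num in ICMP_PROTOS:
        return "not offloaded (ICMP)"
    return f"not offloaded (IP protocol {num})"
```

By this rule, the ARP flows and the ICMP echo (ping) flows in the dumps earlier in this issue would stay in software even with full CT offload, so a ping-only test cannot demonstrate offload; TCP or UDP traffic is needed.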