nokia / danm

TelCo grade network management in a Kubernetes cluster
BSD 3-Clause "New" or "Revised" License

spoof check is turning on automatically while using vf's from mellanox nic #246

Closed sriramec closed 3 years ago

sriramec commented 3 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

bug feature

What happened:

I have deployed two pods, "netcat pod" and "ipsec pod". The "netcat pod" attaches to the sriov-f1c (IPv6) network; the "ipsec pod" attaches to both the sriov-f1c (IPv6) network and the sriovipsec (IPv4) network. The idea is to route the IPv6 packets originating from the netcat pod to the ipsec pod for IPsec encryption. When I ping an IPv6 network from the netcat pod, the packets are all routed to the ipsec pod, and from there they go into the tunnel. For some reason, if I restart the pods and they are attached to the same VFs they were using earlier, my ping test fails. To work around this problem, I have to turn off spoof check on those VFs. I am looking for a DANM configuration option that can turn off the spoof check permanently.

What you expected to happen:

Some configuration from DANM which can turn off the spoof check

How to reproduce it:

sriov-f1c network

apiVersion: danm.k8s.io/v1
kind: ClusterNetwork
metadata:
  name: sriov-f1c
spec:
  NetworkID: sriov-f1c
  NetworkType: sriov
  Options:
    device_pool: "intel.com/pci_sriov_net_physnet0"
    net6: 2001:4000:aa:0a::/64
    allocation_pool_v6:
      start: 2001:4000:00aa:000a::0001
      end: 2001:4000:00aa:000a::000a
    container_prefix: mhf1cif
    rt_tables: 10
    vlan: 10

sriovipsec network

apiVersion: danm.k8s.io/v1
kind: ClusterNetwork
metadata:
  name: sriovipsec
spec:
  NetworkID: sriovipsec
  NetworkType: sriov
  Options:
    device_pool: "intel.com/pci_sriov_net_physnet0"
    cidr: 10.207.2.0/24
    allocation_pool:
      start: 10.207.2.1
      end: 10.207.2.11
    rt_tables: 219
    container_prefix: ipsecif
    vlan: 452

netcat pod

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: netcat
  name: netcat
spec:
  selector:
    matchLabels:
      networkdeployapp: danmnetapp
  strategy:
    type: Recreate
  replicas: 1
  template:
    metadata:
      labels:
        networkdeployapp: danmnetapp
        fhnetwork: danmnetdufhapp
        mhnetwork: danmnetdumhapp
        bhoamnetwork: danmnetdubhoamapp
      annotations:
        danm.k8s.io/interfaces: |
          [
            { "clusterNetwork":"default", "ip": "dynamic" },
            { "clusterNetwork":"sriov-f1c", "ip6": "2001:4000:aa:a::1", "proutes6": {"2001:0:0:1::201/64": "2001:4000:aa:a::a"} }
          ]
    spec:
      nodeSelector:
        "kubernetes.io/hostname": "controller-0"
      containers:
      - image: registry.local:9001/netcat:latest
        command: [ "/bin/bash", "-c", "--" ]
        args: [ "while true; do sleep 30; done;" ]
        resources:
          requests:
            cpu: "1"
            memory: 500M
            intel.com/pci_sriov_net_physnet0: '1'
          limits:
            cpu: "1"
            memory: 500M
            intel.com/pci_sriov_net_physnet0: '1'
        imagePullPolicy: Always
        name: centos
        securityContext:
          privileged: true
      restartPolicy: Always

ipsecpod

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ipsec-deploy
  labels:
    app: ipsec-deploy
spec:
  strategy:
    type: Recreate
  replicas: 1
  selector:
    matchLabels:
      app: ipsecpod
  template:
    metadata:
      labels:
        app: ipsecpod
      annotations:
        danm.k8s.io/interfaces: |
          [
            { "clusterNetwork":"default", "ip": "dynamic" },
            { "clusterNetwork":"sriov-f1c", "ip6": "2001:4000:aa:a::a"},
            { "clusterNetwork":"sriovipsec", "ip": "10.207.2.1", "proutes": {"10.222.222.0/24": "10.207.2.254"}}
          ]
    spec:
      nodeSelector:
        "kubernetes.io/hostname": "controller-0"
      serviceAccountName: ipsecsvcaccount
      containers:
      - name: ipsec
        image: registry.local:9001/ipsec:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 50051
        - containerPort: 50050
        resources:
          requests:
            intel.com/pci_sriov_net_physnet0: '2'
          limits:
            intel.com/pci_sriov_net_physnet0: '2'
        env:
        - name: IPSEC_LISTEN_PORT
          value: "50051"
        - name: CERT_RENEWAL_ADDRS
          value: "localhost:50050"
        - name: CMPV2_DIAL_ADDRS
          value: "nwsvcservicecmpv2:6565"
        - name: IPSEC_DIAL_ADDRS
          value: "localhost:50051"
        - name: CONFIG_DB_ADDRS
          value: "nwsvcserviceconfigdb:50055"
        - name: DATA_PLANE_INTF
          value: "eth0"
        - name: CHILD_LOCAL_TS
          value: "2001:4000:aa:a::/64,10.207.4.0/24"
        - name: CHILD_REMOTE_TS
          value: "2001:0:0:1::/64,10.231.101.0/24"
        command: [ "/bin/sh", "-c", "--" ]
        args: [ "while true; do sleep 30; done;" ]
        securityContext:
          privileged: true
          capabilities:
            add: ["NET_ADMIN", "SYS_TIME"]

"ip a s" output in netcat

sh-4.4# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if605: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
    link/ether 56:d2:42:c9:e1:f4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.16.192.112/32 scope global eth0
       valid_lft forever preferred_lft forever
25: mhf1cif: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 6a:33:43:76:6d:44 brd ff:ff:ff:ff:ff:ff
    inet6 2001:4000:aa:a::1/64 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::6833:43ff:fe76:6d44/64 scope link
       valid_lft forever preferred_lft forever

"ip a s" in ipsec pod

~/ipsec # ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if617: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
    link/ether 3a:37:99:78:1e:52 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.16.192.82/32 scope global eth0
       valid_lft forever preferred_lft forever
34: mhf1cif: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 2e:42:9f:3e:79:f1 brd ff:ff:ff:ff:ff:ff
    inet6 2001:4000:aa:a::a/64 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::2c42:9fff:fe3e:79f1/64 scope link
       valid_lft forever preferred_lft forever
36: ipsecif: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 3a:e2:8b:96:b0:36 brd ff:ff:ff:ff:ff:ff
    inet 10.207.2.1/24 brd 10.207.2.255 scope global ipsecif
       valid_lft forever preferred_lft forever

ping from netcat pod to ipsec pod

sh-4.4# ping6 2001:4000:aa:a::a -I mhf1cif
PING 2001:4000:aa:a::a(2001:4000:aa:a::a) from 2001:4000:aa:a::1 mhf1cif: 56 data bytes
From 2001:4000:aa:a::1: icmp_seq=1 Destination unreachable: Address unreachable
From 2001:4000:aa:a::1: icmp_seq=2 Destination unreachable: Address unreachable
From 2001:4000:aa:a::1: icmp_seq=3 Destination unreachable: Address unreachable
From 2001:4000:aa:a::1: icmp_seq=4 Destination unreachable: Address unreachable
From 2001:4000:aa:a::1: icmp_seq=5 Destination unreachable: Address unreachable
From 2001:4000:aa:a::1: icmp_seq=6 Destination unreachable: Address unreachable
From 2001:4000:aa:a::1: icmp_seq=7 Destination unreachable: Address unreachable

ip link output in host

4: enp101s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:cf:32:c8 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, vlan 10, spoof checking off, link-state auto, trust off, query_rss off
    vf 1 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 2 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 3 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 4 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 5 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 6 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 7 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 8 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 9 MAC 00:00:00:00:00:00, vlan 10, spoof checking on, link-state auto, trust off, query_rss off
    vf 10 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 11 MAC 00:00:00:00:00:00, vlan 452, spoof checking off, link-state auto, trust off, query_rss off
    vf 12 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 13 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 14 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 15 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 16 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 17 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 18 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 19 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 20 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 21 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 22 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 23 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 24 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 25 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 26 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 27 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 28 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 29 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 30 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 31 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
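With 32 VFs listed, the one with spoof checking enabled (vf 9 here) is easy to miss. A minimal sketch for picking such VFs out of the `ip link` output, assuming the usual layout where each VF is printed on its own line starting with `vf <index>`; the function name `vfs_with_spoofchk_on` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the index of every VF whose spoof checking
# is reported as "on". Expects `ip link show <pf>` output on stdin,
# one VF per line (awk ignores the leading indentation).
vfs_with_spoofchk_on() {
    awk '$1 == "vf" && /spoof checking on/ { print $2 }'
}

# Usage on the host (PF name taken from the output above):
#   ip link show enp101s0f0 | vfs_with_spoofchk_on
```

Run right after a pod restart, this should show which VF indices had spoof checking switched back on.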

Turn off the spoof check

controller-0:~# cat set_spoof_off.sh
COUNT=0
VF_NUM=32
set_spoof_off() {
    while [ ${COUNT} -lt ${VF_NUM} ]
    do
        ip link set enp101s0f0 vf ${COUNT} spoofchk off
        COUNT=$(( ${COUNT} + 1 ))
    done
}
set_spoof_off

Run the spoof check script ./set_spoof_off.sh

4: enp101s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:cf:32:c8 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, vlan 10, spoof checking off, link-state auto, trust off, query_rss off
    vf 1 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 2 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 3 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 4 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 5 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 6 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 7 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 8 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 9 MAC 00:00:00:00:00:00, vlan 10, spoof checking off, link-state auto, trust off, query_rss off

Now ping

sh-4.4# ping6 2001:4000:aa:a::a -I mhf1cif
PING 2001:4000:aa:a::a(2001:4000:aa:a::a) from 2001:4000:aa:a::1 mhf1cif: 56 data bytes
64 bytes from 2001:4000:aa:a::a: icmp_seq=1 ttl=64 time=0.199 ms
64 bytes from 2001:4000:aa:a::a: icmp_seq=2 ttl=64 time=0.059 ms
64 bytes from 2001:4000:aa:a::a: icmp_seq=3 ttl=64 time=0.054 ms

ping works.

So the question is how to keep this spoof check off always.
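Until a permanent setting is found, the set_spoof_off.sh workaround above has to be rerun whenever spoof checking comes back. A slightly more general sketch of that script follows, assuming the PF exposes the standard `sriov_numvfs` sysfs attribute so the VF count does not need to be hard-coded. The `SYSFS_ROOT` and `DRY_RUN` variables and the `vf_count` helper are hypothetical names added for illustration and testing, and nothing here makes the setting persistent by itself.

```shell
#!/bin/sh
# Sketch: turn spoof checking off on every VF of a given PF.
# SYSFS_ROOT and DRY_RUN are hypothetical knobs, not part of any tool.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Number of VFs currently enabled on the PF, read from sysfs.
vf_count() {
    cat "${SYSFS_ROOT}/class/net/$1/device/sriov_numvfs"
}

# set_spoof_off <pf>: with DRY_RUN=1 only print the commands,
# otherwise run them (requires root on the host).
set_spoof_off() {
    pf="$1"
    total="$(vf_count "$pf")" || return 1
    vf=0
    while [ "$vf" -lt "$total" ]; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ip link set $pf vf $vf spoofchk off"
        else
            ip link set "$pf" vf "$vf" spoofchk off || return 1
        fi
        vf=$((vf + 1))
    done
}

# Usage on the host:
#   set_spoof_off enp101s0f0
```

Running it with `DRY_RUN=1` first shows exactly which `ip link set` commands would be issued before anything on the NIC is changed.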

Anything else we need to know?:

Environment:

Levovar commented 3 years ago

DANM doesn't modify the attributes of the VFs; that is done by the SR-IOV CNI.

DANM doesn't set the spoof check parameter in the SR-IOV config: https://github.com/nokia/danm/blob/master/pkg/cnidel/cniconfs.go#L41 If I tell the SR-IOV CNI not to do anything with the spoof check parameter, I expect it to leave the parameter alone. Based on the code, it should: https://github.com/k8snetworkplumbingwg/sriov-cni/blob/master/pkg/sriov/sriov.go#L334

So I'm not sure where and why exactly the CNI is changing this attribute, but the error is on that side. Which version of the SR-IOV CNI are you using? Maybe try updating it to the latest version.

Levovar commented 3 years ago

@sriramec any updates?

sriramec commented 3 years ago

Hi,

This seems to be a Mellanox NIC-specific issue. With all software versions the same, I tried on Intel NICs and the issue is not observed there. So I have raised a ticket with Mellanox. Thanks for all the support.
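When comparing behaviour across NICs like this, it can help to confirm which kernel driver actually backs the PF. A minimal sketch, assuming the standard sysfs layout where `/sys/class/net/<pf>/device/driver` is a symlink to the bound driver and that `readlink -f` is available (GNU coreutils or BusyBox); the `pf_driver` function and the `SYSFS_ROOT` override are hypothetical names used for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the name of the kernel driver bound to a PF
# by resolving the sysfs "driver" symlink of its device.
pf_driver() {
    basename "$(readlink -f "${SYSFS_ROOT:-/sys}/class/net/$1/device/driver")"
}

# Usage on the host:
#   pf_driver enp101s0f0
```

On Mellanox ConnectX cards the driver name is typically `mlx5_core`, while Intel NICs report names such as `i40e` or `ixgbe`.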

Regards, Sriram

Levovar commented 3 years ago

BTW, this comment: https://github.com/k8snetworkplumbingwg/sriov-cni/pull/114/files#r391395504 sounds like the issue you describe, so make sure the CNI version you use contains this PR.