k8snetworkplumbingwg / sriov-cni

DPDK & SR-IOV CNI plugin
Apache License 2.0
298 stars 147 forks source link

sriov-cni del doesn't work when there is altName on the interace #250

Closed SchSeba closed 4 months ago

SchSeba commented 1 year ago

I found this one on RHEL 9.2 but I think it effects other distros

net1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether c2:f5:9e:e2:05:a2 brd ff:ff:ff:ff:ff:ff
altname enp216s0f1v9
altname ens3f1v9

and when we remove a pod and the sriov-cni tries to move back the nic and rename it we get this error

0s Warning FailedKillPod pod/client error killing pod: failed to "KillPodSandbox" for "56fd1e57-c9f3-4aae-8ab7-3f6564e0c675" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_client_seba_56fd1e57-c9f3-4aae-8ab7-3f6564e0c675_0(8b0f6b4a6d8d639e72d0d27b41719389f002e26e6e723c019c716ff43abdb5e1): error removing pod seba_client from CNI network \"multus-cni-network\": plugin type=\"multus\" name=\"multus-cni-network\" failed (delete): DelegateDel: error invoking DelegateDel - \"sriov\": error in getting result from DelNetwork: failed to move interface ens3f1v6 to init netns: file exists"

and the nic is moved to the host with the wrong name

69: dev69: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether c2:f5:9e:e2:05:a2 brd ff:ff:ff:ff:ff:ff
altname enp216s0f1v9
altname ens3f1v9

because of the altname a new pod can start but that is not the right way.
SchSeba commented 1 year ago

first step is to add the altName to netlink I already open a PR for that

https://github.com/vishvananda/netlink/pull/862

subeditara commented 6 months ago

I found same issue while pod being killed, and stuck on dead loop.

First StopPodSandbox error: msg="StopPodSandbox for \"48161e41da530ba0d7933911fe2c7f827265521591e03e2084743a3fa032526b\" failed" error="failed to destroy network for sandbox \"48161e41da530ba0d7933911fe2c7f827265521591e03e2084743a3fa032526b\": plugin type=\"multus\" name=\"multus-cni-network\" failed (delete): delegateDel: error invoking ConflistDel - \"sriov-o1c-host0\": conflistDel: error in getting result from DelNetworkList: failed to move interface enp1s17f5 to init netns: file exists"

rest of StopPodSandbox error (dead loop) msg="StopPodSandbox for \"48161e41da530ba0d7933911fe2c7f827265521591e03e2084743a3fa032526b\" failed" error="failed to destroy network for sandbox \"48161e41da530ba0d7933911fe2c7f827265521591e03e2084743a3fa032526b\": plugin type=\"multus\" name=\"multus-cni-network\" failed (delete): delegateDel: error invoking ConflistDel - \"sriov-o1c-host0\": conflistDel: error in getting result from DelNetworkList: Failed to get link neto1c: Link not found"

Since first StopPodSandbox failed, next attempts of StopPodSandbox, again calls sriov-cni plugin to release sriov interface, and now it don't find the interface (neto1c) (as already been renamed to enp1s17f5 from first StopPodSandbox) and throws error: Failed to get link neto1c: Link not found. Successive StopPodSandbox keep throwing this same error.

Stuck pod shows: 12: enp1s17f5: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000 link/ether 0a:d7:02:03:00:06 brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped missed mcast 1652 22 0 0 0 22 TX: bytes packets errors dropped carrier collsns 892 10 0 0 0 0 altname enp1s17f5

neto1c is not on the pod and enp1s17f5 is not on host namespace.

Duplicate "altname" and "interface name" lead to the "file exists" error while trying to move the pod interface to host namespace.

Replication of problem, with minimal steps: 1) Option A: create tuntap interface ip tuntap add mode tap enp138s1f3 Option B: There is VF interface with name enp138s1f3 2) create network namespace ip netns add myns 3) mock SetupVF ip link set enp138s1f3 name temp12 #LinkSetName, rename to temp IF name ip link property add dev temp12 altname enp138s1f3 #add altname, setupVF is not doing this. who is setting ? ip link set temp12 netns myns # move interface to netns ip netns exec myns ip link set temp12 name neto1c #Set Pod IF name ip netns exec myns ip link 12: neto1c: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether d2:9d:7b:9e:9d:58 brd ff:ff:ff:ff:ff:ff altname enp138s1f3 4) mock ReleaseVF ip netns exec myns ip link set neto1c name enp138s1f3 #LinkSetName, rename device ip netns exec myns ip link set enp138s1f3 netns 1 #LinkSetNsFd, move device to init netns RTNETLINK answers: File exists <----------- This is exact issue. 5) After error, ip netns exec myns ip link 12: enp138s1f3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether d2:9d:7b:9e:9d:58 brd ff:ff:ff:ff:ff:ff altname enp138s1f3 <------------error is because of duplicate interface name and altname.

ip netns exec myns ip link property del dev enp138s1f3 altname enp138s1f3
Option A: no error Option B: RTNETLINK answers: Invalid argument

Option B: ip netns del myns ip l # shows: dev57 interface name with altname enp138s1f3 ip link property del dev dev57 altname enp138s1f3 ip l # shows: dev57 interface without altname ip link set dev57 name enp138s1f3

This proves we are getting this problem because "interface name" get renamed such that it become duplicate to "altname". Should ReleaseVF remove altname before trying to rename the interface back to original name ?

How do we get altname at first place ? ethtool -i ens787f0 driver: ixgbe version: 5.10.0-6-amd64 firmware-version: 0x80001375, 1.2877.0 expansion-rom-version: bus-info: 0000:54:00.0 supports-statistics: yes supports-test: yes supports-eeprom-access: yes supports-register-dump: yes supports-priv-flags: yes

echo 1 > /sys/class/net/ens787f0/device/sriov_numvfs ip l # shows vf 0 under ens787f0 and also shows new enp84s16 interface (without altname) ip link set enp84s16 name temp12 ip l # shows temp12 with altname enp84s16

The above "ip link set enp84s16 name temp12" is just supposed to rename interface but it is not supposed to add altname. Point is if interface gets altname somehow, sriov-cni should be able to handle the cleanup anyway.

subeditara commented 6 months ago

More comment on how do we get altname at first place ?

The feature AlternativeNamesPolicy allows to create alternate names for existing interfaces. It is space-separated list of policies by which the interface's alternative names should be set. Each of the policies may fail, and all successful policies are used. The available policies are "database", "onboard", "slot", "path", and "mac". If the kernel does not support the alternative names, then this setting will be ignored.

"database": The name is set based on entries in the udev's Hardware Database with the key "ID_NET_NAME_FROM_DATABASE". "onboard": The name is set based on information given by the firmware for on-board devices, as exported by the udev property "ID_NET_NAME_ONBOARD". "slot": The name is set based on information given by the firmware for hot-plug devices, as exported by the udev property "ID_NET_NAME_SLOT". "path": The name is set based on the device's physical location, as exported by the udev property "ID_NET_NAME_PATH". "mac": The name is set based on the device's persistent MAC address, as exported by the udev property "ID_NET_NAME_MAC".

uname -a 5.10.0-6-amd64

systemd --version systemd 247

udevadm info /sys/class/net/enp59s17f7 ID_NET_NAME_PATH=enp59s17f7 ID_MODEL_FROM_DATABASE=Ethernet Virtual Function 700 Series ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link

/usr/lib/systemd/network/99-default.link [Link] NamePolicy=keep kernel database onboard slot path AlternativeNamesPolicy=database onboard slot path MACAddressPolicy=persistent

** At this point, "ip link set enp59s17f7 name temp12" sets "altname" too. This is because of match of "path" policy of AlternativeNamesPolicy i.e. existence of udev property ID_NET_NAME_PATH.

In altname presence, sriov-cni should handle the cleanup properly.

SchSeba commented 6 months ago

Hi @subeditara thanks for the information can you please add the platform you are using? and also the version of the systemd you have installed on your system

subeditara commented 6 months ago

kernel: 5.10.0-6-amd64 Debian (Bullseye) systemd: 247.3-7

SchSeba commented 5 months ago

Hi @subeditara thanks for the comment!

so looks like the problem was fixed in systemd version 254 I will try to continue and push the PR here that should just remove the alt-name if exist