nokia / danm

TelCo grade network management in a Kubernetes cluster
BSD 3-Clause "New" or "Revised" License

Pod creation fails when requesting vfio-pci bound resource via SRIOV CNI, as DANM unable to setup dummy kernel interface for the device #231

Closed superfix906 closed 4 years ago

superfix906 commented 4 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

bug

What happened:

The CNI network could not be set up for SRIOV when the allocated resource is a vfio-pci bound device. It fails while creating the dummy interface, with the error: "cannot create dummy interface for DPDK because:cannot assign requested address"

What you expected to happen:

The CNI network should have been set up, and the pod requesting the DPDK (vfio-pci) interface should have started with a dummy kernel interface in the Pod's network namespace.

How to reproduce it:

Install DANM in lightweight mode using the installer job. Once all services are running, create a DanmNet and a pod that requests a vfio-pci bound interface via the SRIOV CNI.

Anything else we need to know?:

I am using flannel for IPv4-based cluster networking. DANM is installed in lightweight mode as per the installer job document, and all DANM services are up and running. I am able to create a pod with SRIOV as the CNI when the resource is bound to a kernel driver (netdevice), and IPAM is able to allocate an IP for it. The same is not true when the resource is bound to the vfio-pci driver: the CNI setup fails to create the dummy kernel interface, with the following error message:

Events:
  Type / Reason / Age / From / Message
  Normal / Scheduled / (age not shown) / default-scheduler / Successfully assigned example/app to test
  Warning / FailedCreatePodSandBox / 2s / kubelet, test / Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ee0d160e99d6bb410d8b75d2fef6f0f546811537598ab1282c8b2cd29e8cf925" network for pod "app": networkPlugin cni failed to set up pod "app_example" network: CNI network could not be set up: CNI operation for network:sriov-vfio failed with:Post-processing failed for interface:eth1 because:failed to create dummy kernel interface for eth1 because:cannot create dummy interface for DPDK because:cannot assign requested address
  Normal / SandboxChanged / 2s / kubelet, test / Pod sandbox changed, it will be killed and re-created.

POD yaml

    apiVersion: v1
    kind: Pod
    metadata:
      name: app
      namespace: example
      labels:
        env: test
      annotations:
        danm.k8s.io/interfaces: |
          [
            {"network":"management", "ip":"dynamic"},
            {"network":"sriov-vfio", "ip":"dynamic"}
          ]
    spec:
      containers:
      - name: sriov-pod
        image: centos:latest
        args:
        - sleep
        - "10000"
        resources:
          requests:
            intel.com/sriov_vfio_vf: '1'
          limits:
            intel.com/sriov_vfio_vf: '1'

DanmNet Yaml

    apiVersion: danm.k8s.io/v1
    kind: DanmNet
    metadata:
      name: management
      namespace: example
    spec:
      NetworkID: 10-flannel
      NetworkType: flannel
    ---
    apiVersion: danm.k8s.io/v1
    kind: DanmNet
    metadata:
      name: sriov-vfio
      namespace: example
    spec:
      NetworkID: sriov-vfio
      NetworkType: sriov
      Options:
        device_pool: "intel.com/sriov_vfio_vf"
        cidr: 10.1.20.0/24

SRIOV resources

    {
      "cpu": "48",
      "ephemeral-storage": "280411618864",
      "hugepages-1Gi": "17Gi",
      "intel.com/sriov_dpdk_vf": "0",
      "intel.com/sriov_fec_vf": "1",
      "intel.com/sriov_netdevicevf": "15",
      "intel.com/sriov_vfiovf": "1",
      "memory": "79530372Ki",
      "pods": "110"
    }

Environment:

superfix906 commented 4 years ago

To add some more information: I tried the same with DPDK's 'igb_uio' driver and was able to make it work. The dummy interface creation was successful, unlike in the 'vfio-pci' case. So this issue is specific to devices bound to the 'vfio-pci' driver. Any help on this will be appreciated, as vfio-pci is the way we want to move ahead. Thanks in advance!

Levovar commented 4 years ago

Interesting issue, because I explicitly tested this scenario and it was working for me :) The error you see is actually coming from here: https://github.com/nokia/danm/blob/181255ef463e930aa221aecd9f50f014a4e760b2/pkg/danmep/ep.go#L285

At this point the IP address has not yet been added to the link; we have only set its MAC address! So the error (1) is coming from the kernel, and (2) must be MAC clash related, not IP related.

I can only think of two things why this can happen:

but TBH my money is on the old kernel

Levovar commented 4 years ago

@superfix906 I managed to retest this recently with 82599 NICs (which model are you using, BTW?). On CentOS 7.8 with a 4.18 kernel it works fine in all scenarios, with or without a VLAN tag in the network. One thing I noticed: when a VLAN is also used in the network, we add the VF MAC address to both the dummy interface and the VLAN interface on top of it. My kernel could tolerate that, but maybe older ones could not? I made this change to address it: https://github.com/nokia/danm/pull/234 , but as you did not use a VLAN tag in your network, this is probably not the root cause.

In any case, we did encounter the error you describe in our environment, but it only happened when DANM was asked to work with improperly set up VFs (binding to VFIO was not properly done before the Pod was created). Considering that the feature can be used reliably in our environment, I strongly think the root cause is environment specific, and possibly related either to your kernel or to improper device management in the host layer.

Levovar commented 4 years ago

Further debugged the problem. The error possibly appears when the MAC address of the VF is all zeros: the kernel refuses to set it on the dummy interface. This can happen with some Intel drivers. The referenced PR now adds a check for a zero MAC, and only tries to set the MAC on the dummy interface if it is a valid one, which should solve the problem.

It is currently unclear whether the Intel drivers zero out both the admin and effective MACs, or whether in some cases the SR-IOV CNI fails to properly reset the VF after use, because I did observe that VFIO bound VFs sometimes have MAC addresses and sometimes don't. So it is still kind of a mystery, but whatever happens on the host level, DANM will now behave more resiliently :)

superfix906 commented 4 years ago

@Levovar Thanks a lot for the detailed research and inputs. Appreciate that !

Unfortunately, we have moved away from this for the moment. I shall update once we are back at it. Thanks again.

Levovar commented 4 years ago

@superfix906 no problem :) Meanwhile we have tested the change in our own environment, and it solves the reported problem, so I will close the ticket.

thanks again for reporting the case!

krsna1729 commented 4 years ago

We found that setting the MAC address a priori, as part of node/device setup, is better than leaving it as a zero MAC. This prevents a random MAC from being created when DPDK enumerates the VFs on some models.

https://github.com/clearlinux/cloud-native-setup/blob/e74b3ca892ea04ec293d37e35f9815505141792e/clr-k8s-examples/9-multi-network/systemd/sriov.sh#L47
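As a rough illustration of pre-assigning VF MACs during node setup, the sketch below generates one deterministic, locally administered MAC per VF and prints the corresponding iproute2 command. It is not taken from the linked script. The addressing scheme, the PF name `ens1f0`, the VF count, and the helper names `vfMAC` and `setVFMACCommand` are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"net"
)

// vfMAC is a hypothetical scheme: it builds a locally administered unicast MAC
// (first octet 0x02) that encodes a PF index and a VF index in the last two octets.
func vfMAC(pf, vf uint8) net.HardwareAddr {
	return net.HardwareAddr{0x02, 0x00, 0x00, 0x00, pf, vf}
}

// setVFMACCommand renders the iproute2 command that sets the administrative
// MAC of VF number vf on the physical function pfName.
func setVFMACCommand(pfName string, vf uint8, mac net.HardwareAddr) string {
	return fmt.Sprintf("ip link set dev %s vf %d mac %s", pfName, vf, mac)
}

func main() {
	const pfName = "ens1f0" // assumed PF interface name
	const numVFs = 4        // assumed number of VFs on this PF
	for vf := uint8(0); vf < numVFs; vf++ {
		fmt.Println(setVFMACCommand(pfName, vf, vfMAC(0, vf)))
	}
}
```

The printed commands would be run on the host before the VFs are handed to pods, so that no VF is left with an all-zero MAC.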