nokia / danm

TelCo grade network management in a Kubernetes cluster
BSD 3-Clause "New" or "Revised" License

Unable to deploy the Pod with SR-IOV VFs #251

Closed: sriramec closed this issue 3 years ago

sriramec commented 3 years ago

Is this a BUG REPORT or FEATURE REQUEST?:


bug

What happened:

Unable to deploy the Pod with SR-IOV VFs: it cannot be attached to the sriov-o1c network (VLAN 452). CNI ADD fails with errors like the following:

    21/04/02 12:42:15.279546 ERROR: ADD: CNI network could not be set up with error:CNI operation for network:sriov-o1c failed with:failed to pop devices due to:devicePool is empty
    2021/04/02 12:42:15.978986 CNI ADD invoked with: ns:default for Pod:lowell-3001cuup1-589456c8cb-trm8z CID: 0947c1b82999b651af18ab4751b3495b010cc5a091ebe57bc80bec97cbd0ef5c

What you expected to happen:

The Pod "lowell-3001cuup1-589456c8cb-trm8z" should have come up with the necessary VFs. There are 16 VFs in the SR-IOV device pool:

    controller-0:/usr/libexec/cni# kubectl describe node controller-0 | grep -A5 -i "allocatable"
    Allocatable:
      cpu: 22
      ephemeral-storage: 9391196145
      hugepages-1Gi: 20Gi
      hugepages-2Mi: 0
      intel.com/pci_sriov_net_physnet0: 16

    controller-0:/usr/libexec/cni# ip link show enp101s0f0
    4: enp101s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
        link/ether ac:1f:6b:cf:32:c8 brd ff:ff:ff:ff:ff:ff
        vf 0 MAC 00:00:00:00:00:00, vlan 21, spoof checking off, link-state auto, trust off, query_rss off
        vf 1 MAC 00:00:00:00:00:00, vlan 20, spoof checking off, link-state auto, trust off, query_rss off
        vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
        vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
        vf 4 MAC 00:00:00:00:00:00, vlan 453, spoof checking on, link-state auto, trust off, query_rss off
        vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
        vf 6 MAC 00:00:00:00:00:00, vlan 452, spoof checking off, link-state auto, trust off, query_rss off
        vf 7 MAC 00:00:00:00:00:00, vlan 453, spoof checking off, link-state auto, trust off, query_rss off
        vf 8 MAC 00:00:00:00:00:00, vlan 452, spoof checking off, link-state auto, trust off, query_rss off
        vf 9 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
        vf 10 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
        vf 11 MAC 00:00:00:00:00:00, vlan 22, spoof checking on, link-state auto, trust off, query_rss off
        vf 12 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
        vf 13 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
        vf 14 MAC 00:00:00:00:00:00, vlan 21, spoof checking on, link-state auto, trust off, query_rss off
        vf 15 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off

How to reproduce it:

Anything else we need to know?: Logs

    2021/04/05 18:28:40.013615 CNI ADD invoked with: ns:default for Pod:lowell-3001cuup1-6899d899d7-tfl6c CID: 95c6c6820f51414f20f185c38e7d3626ac137faf5bf9a1b56047150166c4e9fe
    2021/04/05 18:28:42.376953 CNI DEL invoked with: ns:default for Pod:lowell-3001cuup1-6899d899d7-tfl6c CID: 95c6c6820f51414f20f185c38e7d3626ac137faf5bf9a1b56047150166c4e9fe
    2021/04/05 18:28:44.294338 ERROR: ADD: CNI network could not be set up with error:CNI operation for network:sriov-o1c failed with:failed to pop devices due to:devicePool is empty
    2021/04/05 18:28:45.022850 CNI ADD invoked with: ns:default for Pod:lowell-3001cuup1-6899d899d7-tfl6c CID: d23a157605a2aba99e82750687e910868d79c707fa55c6dda38c36bb5139adf7

Environment:

  • OS (e.g. from /etc/os-release):

        controller-0:/usr/libexec/cni# cat /etc/os-release
        NAME="CentOS Linux"
        VERSION="7 (Core)"
        ID="centos"
        ID_LIKE="rhel fedora"
        VERSION_ID="7"
        PRETTY_NAME="CentOS Linux 7 (Core)"
        ANSI_COLOR="0;31"
        CPE_NAME="cpe:/o:centos:centos:7"
        HOME_URL="https://www.centos.org/"
        BUG_REPORT_URL="https://bugs.centos.org/"

        CENTOS_MANTISBT_PROJECT="CentOS-7"
        CENTOS_MANTISBT_PROJECT_VERSION="7"
        REDHAT_SUPPORT_PRODUCT="centos"
        REDHAT_SUPPORT_PRODUCT_VERSION="7"

Levovar commented 3 years ago

Have you also requested an SR-IOV VF from the device pool in the resources field of your Pod spec?

The SR-IOV network type only configures the VFs that have already been assigned to the Pod; you still need to request the physical devices themselves. The CNI sees that no devices were assigned to your Pod (I guess because you haven't asked for any), so it returns this error: there is nothing to configure the VLAN, IPs, etc. on.
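The behaviour described above can be pictured as a pool: the device plugin hands the Pod as many VFs as it requested, and each SR-IOV network attachment takes one VF from that pool to configure. The sketch below is a simplified model of that idea, not DANM's actual Go implementation; `assign_vfs`, `DevicePoolEmpty`, the network names and the PCI addresses are all hypothetical.

```python
from typing import Dict, List


class DevicePoolEmpty(Exception):
    """Raised when an SR-IOV attachment finds no allocated VF left to configure."""


def assign_vfs(allocated_vfs: List[str], sriov_networks: List[str]) -> Dict[str, str]:
    """Pair each SR-IOV network attachment with one VF allocated to the Pod.

    allocated_vfs:  VF PCI addresses the device plugin gave the Pod,
                    one per requested unit of the SR-IOV resource.
    sriov_networks: SR-IOV networks the Pod is connected to, in order.
    """
    pool = list(allocated_vfs)  # work on a copy; the caller's list is left intact
    assignment: Dict[str, str] = {}
    for network in sriov_networks:
        if not pool:
            # The situation in the logs: an attachment with no VF to configure.
            raise DevicePoolEmpty(
                f"failed to pop devices for network {network}: devicePool is empty"
            )
        # In the real CNI, VLAN, IP, etc. would now be configured on this VF.
        assignment[network] = pool.pop(0)
    return assignment


if __name__ == "__main__":
    # Hypothetical PCI addresses and network names, for illustration only.
    vfs = ["0000:65:02.0", "0000:65:02.1", "0000:65:02.2"]
    print(assign_vfs(vfs, ["sriov-net-1", "sriov-net-2", "sriov-net-3"]))
    try:
        assign_vfs([], ["sriov-o1c"])  # the Pod requested no VFs at all
    except DevicePoolEmpty as err:
        print(err)
```

In this model, having 16 VFs on the node does not help: only the VFs allocated to this particular Pod are in its pool, so a Pod that requests none fails on its first SR-IOV attachment.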

Levovar commented 3 years ago

https://github.com/nokia/danm/blob/master/example/device_plugin_demo/sriov_pod.yaml#L23

sriramec commented 3 years ago

Thanks for pointing it out. Yes, it was a mistake in the YAML file: the Pod was attached to 4 networks, but its resources field requested only 3 VFs.
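A mismatch like this one (more SR-IOV attachments than requested VFs) can be checked before deploying. This is a minimal sketch under several assumptions: the Pod spec has already been loaded into a Python dict (for example from the YAML), the DANM attachments are listed as JSON under the `danm.k8s.io/interfaces` annotation with a `"network"` key, and the caller supplies which of those networks are SR-IOV. The helper names `requested_vfs` and `vf_shortfall` and the extra network names are hypothetical; the resource name is the one advertised on the node in this issue.

```python
import json
from typing import Any, Dict, Set

# Resource name advertised on the node in this issue.
SRIOV_RESOURCE = "intel.com/pci_sriov_net_physnet0"
# Assumed DANM annotation key listing the Pod's network attachments.
INTERFACES_ANNOTATION = "danm.k8s.io/interfaces"


def requested_vfs(pod: Dict[str, Any], resource: str = SRIOV_RESOURCE) -> int:
    """Sum the VFs requested by all containers in the Pod spec."""
    total = 0
    for container in pod["spec"]["containers"]:
        res = container.get("resources", {})
        # Kubernetes defaults requests to limits when only limits are set.
        count = res.get("requests", {}).get(resource, res.get("limits", {}).get(resource, 0))
        total += int(count)
    return total


def vf_shortfall(pod: Dict[str, Any], sriov_networks: Set[str],
                 resource: str = SRIOV_RESOURCE) -> int:
    """Return how many SR-IOV attachments have no requested VF behind them."""
    interfaces = json.loads(pod["metadata"]["annotations"][INTERFACES_ANNOTATION])
    # Only the "network" key is inspected; the SR-IOV set comes from the caller.
    attachments = sum(1 for iface in interfaces if iface.get("network") in sriov_networks)
    return max(0, attachments - requested_vfs(pod, resource))


if __name__ == "__main__":
    # Hypothetical Pod: four SR-IOV attachments but only three VFs requested.
    nets = ["sriov-o1c", "sriov-net-1", "sriov-net-2", "sriov-net-3"]
    pod = {
        "metadata": {"annotations": {
            INTERFACES_ANNOTATION: json.dumps([{"network": n} for n in nets]),
        }},
        "spec": {"containers": [{
            "name": "app",
            "resources": {"requests": {SRIOV_RESOURCE: "3"},
                          "limits": {SRIOV_RESOURCE: "3"}},
        }]},
    }
    print(vf_shortfall(pod, set(nets)))  # attachments left without a VF
```

Summing across all containers is a simplification; how DANM actually maps allocated devices to attachments may differ, so treat a non-zero result as a hint to recheck the resources field rather than a definitive verdict.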
