nokia / danm

TelCo grade network management in a Kubernetes cluster
BSD 3-Clause "New" or "Revised" License

CNI delegation failed due to error:Error delegating ADD to CNI plugin:sriov because:OS exec call faild:netplugin failed with no error message #264

Open nknkgithub opened 2 years ago

nknkgithub commented 2 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

bug

What happened:

We have a sriov-o1c-host0 cluster network, and 4 pods are using that CNI:

- 1st pod: { "clusterNetwork": "sriov-o1c-host0", "ip6": "fdfb:6442:1:673::1", "proutes6": {"::0/0": "fdfb:6442:1:673::7fff"} }
- 2nd pod: { "clusterNetwork": "sriov-o1c-host0", "ip6": "fdfb:6442:1:673::2", "proutes6": {"::0/0": "fdfb:6442:1:673::7fff"} }
- 3rd pod: { "clusterNetwork": "sriov-o1c-host0", "ip": "10.0.106.65", "proutes": {"11.0.0.0/20": "10.0.106.72"} }
- 4th pod: { "clusterNetwork": "sriov-o1c-host0", "ip": "10.0.106.72", "ip6": "fdfb:6442:1:673::7fff" }
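To rule out an addressing mistake in the annotations themselves, they can be cross-checked mechanically. The following is a minimal sketch, not part of DANM: the pod names `pod1` to `pod4` and both helper functions are hypothetical, and the interface entries are copied from the list above. It checks that no static address is claimed twice and reports which pod owns each policy-route next hop.

```python
import ipaddress

# Interface entries copied from the issue; pod names are hypothetical labels.
PODS = {
    "pod1": {"clusterNetwork": "sriov-o1c-host0", "ip6": "fdfb:6442:1:673::1",
             "proutes6": {"::0/0": "fdfb:6442:1:673::7fff"}},
    "pod2": {"clusterNetwork": "sriov-o1c-host0", "ip6": "fdfb:6442:1:673::2",
             "proutes6": {"::0/0": "fdfb:6442:1:673::7fff"}},
    "pod3": {"clusterNetwork": "sriov-o1c-host0", "ip": "10.0.106.65",
             "proutes": {"11.0.0.0/20": "10.0.106.72"}},
    "pod4": {"clusterNetwork": "sriov-o1c-host0", "ip": "10.0.106.72",
             "ip6": "fdfb:6442:1:673::7fff"},
}


def owned_addresses(pods):
    """Map every statically requested address to the pods that claim it."""
    owners = {}
    for name, iface in pods.items():
        for key in ("ip", "ip6"):
            if key in iface:
                addr = ipaddress.ip_address(iface[key])
                owners.setdefault(addr, []).append(name)
    return owners


def next_hop_owners(pods):
    """For each policy route, list the pods that own its next-hop address."""
    owners = owned_addresses(pods)
    result = {}
    for name, iface in pods.items():
        for key in ("proutes", "proutes6"):
            for dest, hop in iface.get(key, {}).items():
                result[(name, dest)] = owners.get(ipaddress.ip_address(hop), [])
    return result


if __name__ == "__main__":
    duplicates = {a: p for a, p in owned_addresses(PODS).items() if len(p) > 1}
    print("duplicate addresses:", duplicates)
    for (pod, dest), hop_pods in next_hop_owners(PODS).items():
        print(pod, dest, "->", hop_pods)
```

With these annotations no address is requested twice, and every policy route in the first three pods points at an address owned by the 4th pod. This only validates the annotations; it does not explain the intermittent failure.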

But when we deploy the pods, one of them stays in the ContainerCreating state, and describing that pod shows the following error:

Warning FailedCreatePodSandBox 30m kubelet, controller-0 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "cd96d7ad7726d3ff73b4779eb98344eae19c95791cc26583652c5709922af75e": CNI network could not be set up: CNI operaton for network:sriov-o1c-host0 failed with:CNI delegation failed due to error:Error delegating ADD to CNI plugin:sriov because:OS exec call faild:netplugin failed with no error message
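The event nests several error layers in one line, so it helps to pull out which plugin was delegated to and what the innermost layer reported. Below is a minimal sketch, assuming the message keeps the shape shown in this issue. The `summarize` helper is hypothetical and not part of DANM or the CNI tooling; the message text is copied verbatim from the issue title.

```python
import re

# Tail of the event message, copied verbatim from the issue title.
MESSAGE = (
    "CNI delegation failed due to error:Error delegating ADD to CNI plugin:sriov "
    "because:OS exec call faild:netplugin failed with no error message"
)


def summarize(message):
    """Extract the delegated CNI operation, the plugin name, and the innermost error."""
    match = re.search(r"delegating (\w+) to CNI plugin:([\w-]+)", message)
    # The innermost layer is whatever follows the last colon in the chain.
    innermost = message.rsplit(":", 1)[-1].strip()
    return {
        "operation": match.group(1) if match else None,
        "plugin": match.group(2) if match else None,
        "innermost": innermost,
    }


if __name__ == "__main__":
    for key, value in summarize(MESSAGE).items():
        print(f"{key}: {value}")
```

Here the operation is ADD, the plugin is sriov, and the innermost text is "netplugin failed with no error message". As far as I know, the CNI plugin-invocation library emits that text when the plugin binary exits with an error without writing any output. If so, the failure would come from the delegated sriov binary rather than from DANM's own logic, but that is an assumption, not something confirmed in this issue.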

What you expected to happen:

All the pods should be up and running

How to reproduce it:

Deploy pods with the cluster network annotations shown above. One pod will not come up. The failure is intermittent.

Anything else we need to know?:

DANM cleaner pods are running in the setup:

kubectl get pods -A | grep -i danm-cleaner
kube-system               danm-cleaner-5dtgr                                                1/1     Running     0          99m

Environment:

- DANM configuration (K8s manifests, kubeconfig files, CNI config file):

cat /etc/cni/net.d/00-danm.conf
{
  "cniVersion": "0.3.1",
  "name": "danm_meta_cni",
  "type": "danm",
  "kubeconfig": "/etc/cni/net.d/danm-kubeconfig",
  "cniDir": "/etc/cni/net.d",
  "namingScheme": "legacy"
}

cat /etc/cni/net.d/danm-kubeconfig

apiVersion: v1
kind: Config
current-context: default
clusters:


- OS (e.g. from /etc/os-release):

cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"


- Kernel (e.g. `uname -a`):

uname -a
Linux controller-0 3.10.0-1160.15.2.rt56.1152.el7.tis.4.x86_64 #1 SMP PREEMPT RT Wed Jun 9 20:40:45 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux



- Others:
nknkgithub commented 2 years ago

Help required

nknkgithub commented 2 years ago

Any updates on this? We would like to know the cause of the issue.