Closed: peng-isi closed this issue 4 years ago
@peng-isi you might want to try ib-sriov-cni (the CNI for InfiniBand SR-IOV devices) instead of sriov-cni.
@zshi-redhat thanks. Your suggestion works!
Cool, glad to hear it worked!
What issue would you like to bring attention to?
I tried to create a pod with an SR-IOV net device (e.g. Mellanox IB), but the pod is stuck in ContainerCreating. I configured 4 VFs on the IB interface of the host, and I am running the device plugin pod and the Multus CNI meta-plugin.
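To double-check the VF configuration described above, the VFs can be inspected through sysfs on the host. The following is a minimal sketch, assuming the standard Linux SR-IOV sysfs layout (a `sriov_numvfs` file and `virtfnN` links under the PF's `device` directory). The helper name `list_vfs`, its optional sysfs-root argument, and the interface name `ib0` are hypothetical and do not come from the report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the configured VF count of one PF, then one line
# per VF with its index, PCI address, and any netdev names it exposes.
# The second argument (sysfs root) defaults to /sys; it exists only so the
# function can be pointed at a test directory.
list_vfs() {
    local pf="$1"
    local sysfs="${2:-/sys}"
    local dev="$sysfs/class/net/$pf/device"
    local vf idx pci netdevs

    if [ ! -r "$dev/sriov_numvfs" ]; then
        echo "no sriov_numvfs for $pf" >&2
        return 1
    fi
    echo "numvfs=$(cat "$dev/sriov_numvfs")"

    for vf in "$dev"/virtfn*; do
        [ -e "$vf" ] || continue
        # virtfnN is a symlink to the VF's PCI device directory
        idx="${vf##*virtfn}"
        pci="$(basename "$(readlink -f "$vf")")"
        netdevs=""
        if [ -d "$vf/net" ]; then
            netdevs="$(ls "$vf/net" | tr '\n' ' ')"
        fi
        echo "vf=$idx pci=$pci netdev=${netdevs% }"
    done
}

# Usage (ib0 is an assumed interface name):
#   list_vfs ib0
```

A VF line with an empty `netdev=` field would mean that the VF exposes no network interface to the host at that moment, which is one thing worth ruling out when a CNI plugin reports that it cannot find a VF.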
The device plugin detects the SR-IOV net device on the host (hp6 in my experiment), as shown in the following output:
# kubectl get node hp6 -o json | jq '.status.allocatable'
{
  "cpu": "16",
  "devices.kubevirt.io/kvm": "110",
  "devices.kubevirt.io/tun": "110",
  "devices.kubevirt.io/vhost-net": "110",
  "ephemeral-storage": "530610004728",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "mellanox.com/mlnx_sriov_netdevice": "4",
  "memory": "49382120Ki",
  "pods": "110"
}
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: mellanox.com/mlnx_sriov_netdevice
spec:
  config: '{
    "type": "sriov",
    "cniVersion": "0.3.1",
    "name": "sriov-network",
    "ipam": {
      "type": "host-local",
      "subnet": "10.56.217.0/24",
      "routes": [{ "dst": "0.0.0.0/0" }],
      "gateway": "10.56.217.1"
    }
  }'
The pod is created as follows:
apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-net1
spec:
  containers:
The kubectl describe output for the pod (testpod1) is as follows:
[hp]# kubectl describe pod testpod
Name: testpod1
Namespace: default
Priority: 0
Node: hp6/10.13.206.66
Start Time: Thu, 06 Aug 2020 23:32:03 -0400
Labels:
Annotations: k8s.v1.cni.cncf.io/networks: sriov-net1
Status: Pending
IP:
IPs:
Containers:
  appcntr1:
    Container ID:
    Image: centos/tools
    Image ID:
    Port:
    Host Port:
    Command:
      /bin/bash
      -c
Conditions:
  Type Status
  Initialized True
  Ready False
  ContainersReady False
  PodScheduled True
Volumes:
  default-token-p7lqf:
    Type: Secret (a volume populated by a Secret)
    SecretName: default-token-p7lqf
    Optional: false
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
Normal Scheduled 5s default-scheduler Successfully assigned default/testpod1 to hp6
Warning FailedCreatePodSandBox kubelet, hp6 Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "32e2882a38d1fd4bfbc80effa237a4f3fed390597fd1148524db7775f0036fd7" network for pod "testpod1": networkPlugin cni failed to set up pod "testpod1_default" network: Multus: [default/testpod1]: error adding container to network "sriov-network": delegateAdd: error invoking DelegateAdd - "sriov": error in getting result from AddNetwork: SRIOV-CNI failed to configure VF "failed to find vf 3", failed to clean up sandbox container "32e2882a38d1fd4bfbc80effa237a4f3fed390597fd1148524db7775f0036fd7" network for pod "testpod1": networkPlugin cni failed to teardown pod "testpod1_default" network: delegateDel: error invoking DelegateDel - "sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/sriov with name 32e2882a38d1fd4bfbc80effa237a4f3fed390597fd1148524db7775f0036fd7-net1]
Warning FailedCreatePodSandBox kubelet, hp6 Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "a45d61595967926346dc3853301d6f5a1aeaffbe2ecb9bbd026063bd13fff643" network for pod "testpod1": networkPlugin cni failed to set up pod "testpod1_default" network: Multus: [default/testpod1]: error adding container to network "sriov-network": delegateAdd: error invoking DelegateAdd - "sriov": error in getting result from AddNetwork: SRIOV-CNI failed to configure VF "failed to find vf 3", failed to clean up sandbox container "a45d61595967926346dc3853301d6f5a1aeaffbe2ecb9bbd026063bd13fff643" network for pod "testpod1": networkPlugin cni failed to teardown pod "testpod1_default" network: delegateDel: error invoking DelegateDel - "sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/sriov with name a45d61595967926346dc3853301d6f5a1aeaffbe2ecb9bbd026063bd13fff643-net1]
Normal SandboxChanged (x2 over ) kubelet, hp6 Pod sandbox changed, it will be killed and re-created.
What is the impact of this issue?
Do you have a proposed response or remediation for the issue?
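The remediation that worked in this thread was switching from sriov-cni to ib-sriov-cni. The exact configuration the reporter ended up with is not shown, so the following is only a sketch of what the adjusted NetworkAttachmentDefinition might look like. It assumes the plugin is selected with `"type": "ib-sriov"`, as in the ib-sriov-cni README, and that the plugin binary is installed on the node. The resource name and IPAM settings are copied unchanged from the definition in the report; the file name `ib-sriov-net1.yaml` is hypothetical.

```shell
#!/usr/bin/env bash
# Write an assumed ib-sriov variant of the NetworkAttachmentDefinition.
# Only "type" differs from the definition in the report; "ib-sriov" is
# assumed to be the plugin name provided by ib-sriov-cni.
manifest="ib-sriov-net1.yaml"   # hypothetical file name

cat > "$manifest" <<'EOF'
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: mellanox.com/mlnx_sriov_netdevice
spec:
  config: '{
    "type": "ib-sriov",
    "cniVersion": "0.3.1",
    "name": "sriov-network",
    "ipam": {
      "type": "host-local",
      "subnet": "10.56.217.0/24",
      "routes": [{ "dst": "0.0.0.0/0" }],
      "gateway": "10.56.217.1"
    }
  }'
EOF

# Apply it on a cluster where Multus and ib-sriov-cni are installed:
#   kubectl apply -f "$manifest"
```

Because the definition keeps the name sriov-net1, the pod annotation `k8s.v1.cni.cncf.io/networks: sriov-net1` would not need to change; the pod only has to be recreated after the definition is replaced.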