Open jason-gideon opened 2 years ago
When I print the GUID, it shows all zeros (00:00:00:00:00:00:00:00). How can I fix this?
n-MacBookPro:~/20-k8s-rdma-sriov/ib-sriov-cni/deployment/examples$ kubectl describe pod my-test-pod
Name: my-test-pod
Namespace: default
Priority: 0
Node: s-113-2-35/10.113.2.35
Start Time: Tue, 22 Nov 2022 22:02:12 +0800
Labels: <none>
Annotations: cni.projectcalico.org/containerID: dc4a26cafbe5e8d9ab86f863ec42735061cf67593330b8cdf54eac56451f3bfd
cni.projectcalico.org/podIP:
cni.projectcalico.org/podIPs:
k8s.v1.cni.cncf.io/networks: [{"name": "ib-sriov-network"}]
Status: Pending
IP:
IPs: <none>
Containers:
my-test-ctr:
Container ID:
Image: mellanox/rping-test
Image ID:
Port: <none>
Host Port: <none>
Command:
sh
-c
sleep 1000000
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
mellanox.com/mlnx_sriov_rdma_ib: 1
Requests:
mellanox.com/mlnx_sriov_rdma_ib: 1
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jw2sr (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-jw2sr:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <invalid> default-scheduler Successfully assigned default/my-test-pod to s-113-2-35
Normal AddedInterface <invalid> multus Add eth0 [10.42.0.219/32] from k8s-pod-network
Warning FailedCreatePodSandBox <invalid> kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "dc4a26cafbe5e8d9ab86f863ec42735061cf67593330b8cdf54eac56451f3bfd" network for pod "my-test-pod": networkPlugin cni failed to set up pod "my-test-pod_default" network: [default/my-test-pod/:sriov-network]: error adding container to network "sriov-network": infiniBand SRI-OV CNI failed to configure VF "VF ib2 GUID is not valid, HardwareAddr:00:00:00:e7:fe:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00, guid:00:00:00:00:00:00:00:00", failed to clean up sandbox container "dc4a26cafbe5e8d9ab86f863ec42735061cf67593330b8cdf54eac56451f3bfd" network for pod "my-test-pod": networkPlugin cni failed to teardown pod "my-test-pod_default" network: delegateDel: error invoking DelegateDel - "ib-sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/ib-sriov with name dc4a26cafbe5e8d9ab86f863ec42735061cf67593330b8cdf54eac56451f3bfd-net1]
Normal SandboxChanged <invalid> kubelet Pod sandbox changed, it will be killed and re-created.
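To make the failure in the event above concrete, here is a minimal Python sketch of the kind of check the error message implies. It assumes the GUID is the last 8 bytes of the 20-byte IPoIB hardware address, and that an all-zero or all-FF GUID counts as invalid. Both are assumptions inferred from the log line, not taken from the CNI source, and the function names are hypothetical.

```python
import string


def guid_from_hwaddr(hwaddr: str) -> str:
    """Return the last 8 octets of a 20-octet IPoIB hardware address.

    Assumption: the GUID occupies the trailing 8 bytes of the address.
    """
    octets = hwaddr.lower().split(":")
    if len(octets) != 20:
        raise ValueError(f"expected 20 octets, got {len(octets)}")
    return ":".join(octets[-8:])


def is_valid_guid(guid: str) -> bool:
    """Reject malformed GUIDs and the all-zero / all-FF patterns (assumed invalid)."""
    octets = guid.lower().split(":")
    if len(octets) != 8:
        return False
    for octet in octets:
        if len(octet) != 2 or any(c not in string.hexdigits for c in octet):
            return False
    values = [int(octet, 16) for octet in octets]
    all_zero = all(v == 0x00 for v in values)
    all_ff = all(v == 0xFF for v in values)
    return not (all_zero or all_ff)


# Hardware address copied from the FailedCreatePodSandBox event above.
hwaddr = "00:00:00:e7:fe:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00"
guid = guid_from_hwaddr(hwaddr)
print(guid, is_valid_guid(guid))  # prints: 00:00:00:00:00:00:00:00 False
```

The VF here has never been assigned a GUID, so the trailing 8 bytes are still zero and the check fails.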
I ran into the same problem. You first need to configure the VF's node GUID and port GUID, then run ibdev2netdev -v
to check that the VF's state shows Up. After that you can use the VF normally.
Hey @zhutong196, could you tell me how to configure the VF node GUID and port GUID?
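One possible way to set the GUIDs is the `node_guid` and `port_guid` options of `ip link set ... vf`, which iproute2 provides for InfiniBand VFs. The Python sketch below only builds and prints the commands (a dry run). The PF name `ib0`, the VF index, and the GUID values are placeholders, not values from this thread. Substitute the ones for your host and run the printed commands as root.

```python
import shlex


def vf_guid_commands(pf: str, vf: int, node_guid: str, port_guid: str) -> list[list[str]]:
    """Build the iproute2 commands that assign a node GUID and a port GUID to one VF."""
    base = ["ip", "link", "set", "dev", pf, "vf", str(vf)]
    return [
        base + ["node_guid", node_guid],
        base + ["port_guid", port_guid],
    ]


# Placeholder values: PF "ib0", VF 0, and an arbitrary non-zero GUID.
cmds = vf_guid_commands("ib0", 0, "11:22:33:44:77:66:77:90", "11:22:33:44:77:66:77:90")
for cmd in cmds:
    print(shlex.join(cmd))
```

Each VF needs its own unique GUID. After applying the commands, `ibdev2netdev -v` (as suggested above) can be used to confirm that the VF is up before recreating the pod.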
I tried to create a pod with an SR-IOV net device (e.g. Mellanox IB), but the pod is stuck in ContainerCreating. I configured 4 VFs on the host's IB interface, and I am running the device plugin pod and the Multus CNI meta-plugin, but the SR-IOV demo pod shows an error.
multus
./multus-daemonset-thick-plugin.yml:125: image: ghcr.io/k8snetworkplumbingwg/multus-cni:v3.9.2-thick-amd64
ERROR
The device plugin can detect the SR-IOV net device on the host (node s-113-2-35 in my experiment); the output is shown below:
NAD
multus configmap
sriov device plugin