k8snetworkplumbingwg / multus-cni

A CNI meta-plugin for multi-homed pods in Kubernetes
Apache License 2.0

SR-IOV & Bond CNI Fails to start and terminate pod #1303

Closed itsalexjones closed 1 month ago

itsalexjones commented 5 months ago

Hi Everyone,

I have deployed the SR-IOV CNI via the SR-IOV Network Device Plugin (v3.7.0) and installed the bond CNI manually (from master, as the latest release is very old). I am trying to create a bond interface from two VFs in the pod. I followed the examples in the bond-cni and SR-IOV CNI documentation, and have previously had single SR-IOV interfaces working correctly.

What happened: When the pod is started, the following event is logged and the pod fails to start:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "<snip>": plugin type ="multus" name="multus-cni-network" failed (add): [default/test-pod:sriov-network]: error adding container to network "sriov-network": cannot convert: no valid IP addresses

When the pod is terminated, the following event is logged and the pod fails to be deleted:

error killing pod: failed to "KillPodSandbox" for "<snip>" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"<snip>\": plugin type=\"multus\" name=\"multus-cni-network\" failed (delete): delegateDel: error invoking DelegateDel - \"sriov\": error in getting result from DelNetwork: invalid version \"\": the version is empty / delegateDel: error invoking DelegateDel - \"sriov\": error in getting result from DelNetwork: invalid version \"\": the version is empty"

What you expected to happen: All the documentation suggests the pod should start with the four interfaces as configured.

How to reproduce it (as minimally and precisely as possible): Deploy the following three Network Attachment Definitions (assume the resources are already created):

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_PF_1
spec:
  config: '{
  "type": "sriov",
  "name": "sriov-network",
  "spoofchk":"off"
}'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net2
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_PF_2
spec:
  config: '{
  "type": "sriov",
  "name": "sriov-network",
  "spoofchk":"off"
}'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: bond-net1
spec:
  config: '{
  "type": "bond",
  "cniVersion": "0.3.1",
  "name": "bond-net1",
  "mode": "active-backup",
  "failOverMac": 1,
  "linksInContainer": true,
  "miimon": "100",
  "mtu": 1500,
  "links": [
     {"name": "net1"},
     {"name": "net2"}
  ],
  "ipam": {
    "type": "host-local",
    "subnet": "10.72.0.0/16",
    "rangeStart": "10.72.61.192",
    "rangeEnd": "10.72.61.255"
  }
}'

and the following pod:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
      {"name": "sriov-net1", "interface": "net1"},
      {"name": "sriov-net2", "interface": "net2"},
      {"name": "bond-net1", "interface": "bond0"}
    ]'
spec:
  restartPolicy: Never
  containers:
  - name: bond-test
    image: alpine:latest
    command:
      - /bin/sh
      - "-c"
      - "sleep 60m"
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        intel.com/intel_sriov_PF_1: '1'
        intel.com/intel_sriov_PF_2: '1'
      limits:
        intel.com/intel_sriov_PF_1: '1'
        intel.com/intel_sriov_PF_2: '1'

Anything else we need to know?: If you assign an address to each of the two SR-IOV interfaces (a static address is fine), the pod is created correctly (although the bond slaves end up with two extra addresses), but the pod still fails to terminate.
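
A possible lead, not confirmed anywhere in this thread: the delete error reports an empty version (invalid version ""), and neither of the two sriov configs above sets "cniVersion", while bond-net1 does. The add error (cannot convert: no valid IP addresses) would also be consistent with the SR-IOV result being converted to an older result format that expects an IP address, which would fit the observation that assigning addresses lets the pod start. If that is the cause, declaring the version explicitly may help. A minimal sketch of sriov-net1 with the field added, assuming "0.3.1" (copied from bond-net1) is supported by the installed SR-IOV CNI; the block-scalar form of config is a stylistic choice only:

```yaml
# Sketch only: sriov-net1 from the reproduction, plus an explicit cniVersion.
# "0.3.1" is an assumption taken from bond-net1, not a value confirmed in this issue.
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_PF_1
spec:
  config: |-
    {
      "type": "sriov",
      "cniVersion": "0.3.1",
      "name": "sriov-network",
      "spoofchk": "off"
    }
```

The same change would apply to sriov-net2. Whether this also resolves the termination failure is untested here.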

Environment:

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.