openshift / sriov-network-operator

SR-IOV Network Operator
Apache License 2.0

SR-IOV Network Operator 4.15.0-202410010035 | When setting linkType: IB, the NIC gets filtered out | #1027

Open bbenshab opened 1 month ago

bbenshab commented 1 month ago

When setting linkType: IB on a SriovNetworkNodePolicy, as in this example:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: mlnx-port-1
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  ibverbs: true
  isRdma: true
  linkType: IB
  nicSelector:
    pfNames:
    - ibs3f0
    vendor: 15b3
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: 'true'
  numVfs: 1
  priority: 99
  rdma: true
  resourceName: port1

the NIC gets filtered out, and the node advertises 0 allocatable openshift.io/port1 resources, as shown below:

oc get node intel-perf-27.perf.eng.bos2.dc.redhat.com -o json | jq .status.allocatable
{
  "cpu": "127500m",
  "ephemeral-storage": "213881594729",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "memory": "526694060Ki",
  "nvidia.com/gpu": "2",
  "openshift.io/port1": "0",
  "pods": "250",
  "rdma/rdma_shared_device_a": "63"
}
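To spot this symptom without reading the whole map by eye, a minimal Python sketch can parse the allocatable output and list SR-IOV resources whose count is zero. It assumes the SR-IOV resources live under the openshift.io/ prefix (as port1 does here) and are advertised as plain integer strings; the helper name zero_sriov_resources is hypothetical.

```python
import json

# Allocatable map copied from the `oc get node ... | jq .status.allocatable` output above.
ALLOCATABLE_JSON = """
{
  "cpu": "127500m",
  "ephemeral-storage": "213881594729",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "memory": "526694060Ki",
  "nvidia.com/gpu": "2",
  "openshift.io/port1": "0",
  "pods": "250",
  "rdma/rdma_shared_device_a": "63"
}
"""


def zero_sriov_resources(allocatable, prefix="openshift.io/"):
    """Return the sorted names of resources under `prefix` with 0 allocatable.

    Hypothetical helper. Only keys under `prefix` are passed to int(), because
    other quantities (e.g. "127500m", "526694060Ki") are not plain integers.
    """
    return sorted(
        name
        for name, value in allocatable.items()
        if name.startswith(prefix) and int(value) == 0
    )


allocatable = json.loads(ALLOCATABLE_JSON)
print(zero_sriov_resources(allocatable))  # ['openshift.io/port1']
```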

The only workaround I found is to edit the ConfigMap:

oc edit configmap -n openshift-sriov-network-operator device-plugin-config

and then remove "linkTypes":["infiniband"],

from:

apiVersion: v1
data:
  intel-perf-27.perf.eng.bos2.dc.redhat.com: '{"resourceList":[{"resourceName":"port1","selectors":{"vendors":["15b3"],"pfNames":["ibs3f0"],"linkTypes":["infiniband"],"IsRdma":true,"NeedVhostNet":false},"SelectorObj":null},{"resourceName":"port2","selectors":{"vendors":["15b3"],"pfNames":["ibs3f1"],"linkTypes":["infiniband"],"IsRdma":true,"NeedVhostNet":false},"SelectorObj":null}]}'
  perf-intel-6.perf.eng.bos2.dc.redhat.com: '{"resourceList":[{"resourceName":"port1","selectors":{"vendors":["15b3"],"pfNames":["ibs3f0"],"linkTypes":["infiniband"],"IsRdma":true,"NeedVhostNet":false},"SelectorObj":null},{"resourceName":"port2","selectors":{"vendors":["15b3"],"pfNames":["ibs3f1"],"linkTypes":["infiniband"],"IsRdma":true,"NeedVhostNet":false},"SelectorObj":null}]}'
kind: ConfigMap
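Removing that key helps because device-plugin selectors act as filters. The sketch below models the usual selector semantics (keys are ANDed, values within one key are ORed, and a missing key applies no filter) using the port1 selectors from the ConfigMap above. The Device type, the matches() helper, and the "ether" link type are hypothetical: the issue does not say which link type the plugin actually read for ibs3f0, and keys such as IsRdma are ignored here.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical view of a NIC function as a device plugin might see it."""
    vendor: str
    pf_name: str
    link_type: str


def matches(device, selectors):
    """Return True if `device` passes every selector present in `selectors`.

    Keys are ANDed, values within a key are ORed, and an absent or empty
    key applies no filter. Key names mirror the ConfigMap above.
    """
    checks = {
        "vendors": device.vendor,
        "pfNames": device.pf_name,
        "linkTypes": device.link_type,
    }
    for key, actual in checks.items():
        allowed = selectors.get(key)
        if allowed and actual not in allowed:
            return False
    return True


# Selectors for resource "port1", copied from the ConfigMap above.
port1 = {"vendors": ["15b3"], "pfNames": ["ibs3f0"], "linkTypes": ["infiniband"]}

# Hypothetical device whose reported link type does not match "infiniband".
dev = Device(vendor="15b3", pf_name="ibs3f0", link_type="ether")

print(matches(dev, port1))  # False: filtered out, so the resource count stays 0
without_link = {k: v for k, v in port1.items() if k != "linkTypes"}
print(matches(dev, without_link))  # True: the workaround removes the failing check
```

Under this model, a device that fails the linkTypes check never reaches the resource pool, which would be consistent with an allocatable count of 0; dropping the key removes that check entirely.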

However, the change gets reset every 300 seconds.
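The reset is what you would expect if the operator regenerates device-plugin-config from the SriovNetworkNodePolicy objects on every resync. The sketch below is a generic level-triggered reconcile, not the operator's actual code: render_resource(), reconcile(), and the LINK_TYPE_SELECTOR table are hypothetical (the table only mirrors what the ConfigMap above shows for linkType: IB), and the 300-second interval is simply what was observed here.

```python
import copy

# Hypothetical mapping from the policy's linkType to the selector value
# seen in the ConfigMap above (linkType: IB renders as "infiniband").
LINK_TYPE_SELECTOR = {"IB": "infiniband"}


def render_resource(policy):
    """Hypothetical renderer: derive one device-plugin resource entry from a policy."""
    selectors = {
        "vendors": [policy["nicSelector"]["vendor"]],
        "pfNames": list(policy["nicSelector"]["pfNames"]),
    }
    link = LINK_TYPE_SELECTOR.get(policy.get("linkType", ""))
    if link:
        selectors["linkTypes"] = [link]
    return {"resourceName": policy["resourceName"], "selectors": selectors}


def reconcile(live, policies):
    """One resync pass: return (desired_config, changed).

    The desired state is recomputed from the policies every time, so any
    manual edit to `live` is discarded when the result is written back.
    """
    desired = {"resourceList": [render_resource(p) for p in policies]}
    return desired, desired != live


policy = {
    "resourceName": "port1",
    "linkType": "IB",
    "nicSelector": {"vendor": "15b3", "pfNames": ["ibs3f0"]},
}

live, _ = reconcile({}, [policy])

# Manual workaround: drop "linkTypes" from the live config.
edited = copy.deepcopy(live)
del edited["resourceList"][0]["selectors"]["linkTypes"]

restored, changed = reconcile(edited, [policy])
print(changed)  # True
print(restored["resourceList"][0]["selectors"]["linkTypes"])  # ['infiniband']
```

Under this model, hand edits to the generated ConfigMap are always overwritten on the next pass, so a lasting fix would have to come from the policy or from how the operator renders it.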

For reference:
NetworkAttachmentDefinition:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    k8s.v1.cni.cncf.io/resourceName: openshift.io/port1
  name: network-port-1
  namespace: default
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "network-port-1",
      "type": "ib-sriov",
      "logLevel": "info",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.1.2/24",
        "exclude": [
          "192.168.1.1",
          "192.168.1.2",
          "192.168.1.254",
          "192.168.1.255"
        ],
        "routes": [
          {
            "dst": "192.168.1.0/24"
          }
        ]
      }
    }
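The NetworkAttachmentDefinition ties the network to the device-plugin resource through its k8s.v1.cni.cncf.io/resourceName annotation, so it depends on the same openshift.io/port1 resource that shows 0 allocatable. A small sketch that parses the embedded CNI config and sanity-checks it: summarize() is a hypothetical helper, and it only computes the /24 that the ipam range falls in, without modeling how whereabouts actually allocates addresses.

```python
import ipaddress
import json

# CNI config string from spec.config of the NetworkAttachmentDefinition above.
NAD_CONFIG = """
{
  "cniVersion": "0.3.1",
  "name": "network-port-1",
  "type": "ib-sriov",
  "logLevel": "info",
  "ipam": {
    "type": "whereabouts",
    "range": "192.168.1.2/24",
    "exclude": ["192.168.1.1", "192.168.1.2", "192.168.1.254", "192.168.1.255"],
    "routes": [{"dst": "192.168.1.0/24"}]
  }
}
"""

# Annotation from metadata.annotations that names the device-plugin resource.
ANNOTATIONS = {"k8s.v1.cni.cncf.io/resourceName": "openshift.io/port1"}


def summarize(config_text, annotations):
    """Return the CNI type, requested resource, enclosing network, and any
    excluded addresses that fall outside that network (should be none)."""
    cfg = json.loads(config_text)
    network = ipaddress.ip_interface(cfg["ipam"]["range"]).network
    outside = [
        ip for ip in cfg["ipam"]["exclude"]
        if ipaddress.ip_address(ip) not in network
    ]
    return {
        "cni_type": cfg["type"],
        "resource": annotations["k8s.v1.cni.cncf.io/resourceName"],
        "network": str(network),
        "excludes_outside_network": outside,
    }


print(summarize(NAD_CONFIG, ANNOTATIONS))
```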


SriovIBNetwork:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: sriov-ib-network-port-1
  namespace: openshift-sriov-network-operator
spec:
  pfNames:

zeeke commented 1 month ago

Hi @bbenshab, please raise the issue in the upstream repository https://github.com/k8snetworkplumbingwg/sriov-network-operator/ so we can discuss it with the community. This repo is the OpenShift downstream fork; we use it to handle OpenShift release code.

bbenshab commented 4 weeks ago

Sure, it's now open at https://github.com/k8snetworkplumbingwg/sriov-network-operator/issues/795. Also tagging @ashishkamra @zshi-redhat @egallen FYI.