k8snetworkplumbingwg / sriov-network-operator

Operator for provisioning and configuring SR-IOV CNI plugin and device plugin
Apache License 2.0

Unable to Retrieve Default Driver for VF After Unbinding in SR-IOV Configuration Process #802

Open popsiclexu opened 3 weeks ago

popsiclexu commented 3 weeks ago

Hi,

I ran into a problem configuring the virtual function (VF) driver in an SR-IOV setup. During configuration, the drivers of both the physical function (PF) and the VF are unbound. After the unbind, binding the VF to its default driver fails with "write /sys/bus/pci/drivers_probe: no such device", so the VF driver cannot be configured any further and the interface is reset.

Environment Details:

OS: Ubuntu 22.04

Network Card: Mellanox Technologies MT28908 Family [ConnectX-6]

Could you please explain why the driver unbinding is necessary? Also, are there any recommended solutions or workarounds for this issue?

Thanks for your assistance!

Error Log:

2024-10-30T09:33:12.725660712Z  INFO    daemon/daemon.go:485    mellanox plugin Apply()
2024-10-30T09:33:12.725667977Z  INFO    mellanox/mellanox_plugin.go:212 mellanox-plugin configFW()
2024-10-30T09:33:12.725673739Z  INFO    daemon/daemon.go:485    k8s plugin Apply()
2024-10-30T09:33:12.72568048Z   INFO    daemon/daemon.go:500    generic plugin Apply()  {"desiredState": {"interfaces":[{"pciAddress":"0000:12:00.0","numVfs":8,"name":"ibs2","linkType":"IB","vfGroups":[{"resourceName":"ibcx6vfnuma0","deviceType":"netdevice","vfRange":"0-7","policyName":"ibcx6vfnuma0","isRdma":true}]},{"pciAddress":"0000:33:00.0","numVfs":8,"name":"ibs3","linkType":"IB","vfGroups":[{"resourceName":"ibcx6vfnuma1","deviceType":"netdevice","vfRange":"0-7","policyName":"ibcx6vfnuma1","isRdma":true}]}],"bridges":{}}}
2024-10-30T09:33:12.725711244Z  LEVEL(-2)       sriovnetwork    sriov/sriov.go:773      NeedToUpdateSriov(): NumVfs needs update        {"desired": 8, "current": 0}
2024-10-30T09:33:12.725723274Z  LEVEL(-2)       sriov/sriov.go:606      configSriovInterfaces(): start sriov configuration
2024-10-30T09:33:12.7257294Z    LEVEL(-2)       sriov/sriov.go:737      configSriovDevice(): configure sriov device     {"device": "0000:33:00.0", "config": {"pciAddress":"0000:33:00.0","numVfs":8,"name":"ibs3","linkType":"IB","vfGroups":[{"resourceName":"ibcx6vfnuma1","deviceType":"netdevice","vfRange":"0-7","policyName":"ibcx6vfnuma1","isRdma":true}]}, "skipVFConfiguration": false}
2024-10-30T09:33:12.725739966Z  LEVEL(-2)       sriov/sriov.go:557      configSriovPFDevice(): configure PF sriov device        {"device": "0000:33:00.0"}
2024-10-30T09:33:12.725781989Z  LEVEL(-2)       sriov/sriov.go:323      configureHWOptionsForSwitchdev(): configure HW options for device       {"device": "0000:33:00.0"}
2024-10-30T09:33:12.725789052Z  LEVEL(-2)       sriov/sriov.go:329      removeUdevRules(): remove udev rules for device {"device": "0000:33:00.0"}
2024-10-30T09:33:12.725794729Z  LEVEL(-2)       sriov/sriov.go:967      RemoveDisableNMUdevRule()       {"device": "0000:33:00.0"}
2024-10-30T09:33:12.725801648Z  LEVEL(-2)       udev/udev.go:82 removeUdevRule()        {"device": "0000:33:00.0", "rule": "10-nm-disable"}
2024-10-30T09:33:12.725868876Z  LEVEL(-2)       sriov/sriov.go:970      RemoveVfRepresentorUdevRule()   {"device": "0000:33:00.0"}
2024-10-30T09:33:12.725875222Z  LEVEL(-2)       udev/udev.go:109        removeUdevRule()        {"device": "0000:33:00.0", "rule": "20-switchdev"}
2024-10-30T09:33:12.725887067Z  LEVEL(-2)       sriov/sriov.go:973      RemovePersistPFNameUdevRule()   {"device": "0000:33:00.0"}
2024-10-30T09:33:12.725893539Z  LEVEL(-2)       udev/udev.go:95 removeUdevRule()        {"device": "0000:33:00.0", "rule": "10-pf-name"}
2024-10-30T09:33:12.72590496Z   LEVEL(-2)       sriov/sriov.go:333      addUdevRules(): add udev rules for device       {"device": "0000:33:00.0"}
2024-10-30T09:33:12.725911824Z  LEVEL(-2)       sriov/sriov.go:931      AddDisableNMUdevRule()  {"device": "0000:33:00.0"}
2024-10-30T09:33:12.725923128Z  LEVEL(-2)       udev/udev.go:76 addUdevRule()   {"device": "0000:33:00.0", "rule": "10-nm-disable"}
2024-10-30T09:33:12.725964652Z  LEVEL(-2)       sriov/sriov.go:338      createVFs(): configure VFs for device   {"device": "0000:33:00.0", "count": 8, "mode": "legacy"}
2024-10-30T09:33:12.725987641Z  LEVEL(-2)       sriov/sriov.go:989      setEswitchModeAndNumVFs(): configure VFs for device     {"device": "0000:33:00.0", "count": 8, "mode": "legacy"}
2024-10-30T09:33:12.725995444Z  LEVEL(-2)       sriov/sriov.go:1012     GetNicSriovMode()       {"device": "0000:33:00.0"}
2024-10-30T09:33:12.726153137Z  LEVEL(-2)       sriov/sriov.go:1022     SetSriovNumVfs(): set NumVfs    {"device": "0000:33:00.0", "numVfs": 8}
2024-10-30T09:33:15.242141245Z  LEVEL(-2)       sriov/sriov.go:578      configSriovVFDevices(): configure PF sriov device       {"device": "0000:33:00.0"}
2024-10-30T09:33:15.245870083Z  LEVEL(-2)       kernel/kernel.go:240    getDriverByBusAndDevice(): driver for device    {"bus": "pci", "device": "0000:33:00.1", "driver": "../../../../../../../../bus/pci/drivers/mlx5_core"}
2024-10-30T09:33:15.245889695Z  LEVEL(-2)       sriov/sriov.go:441      HasDriver(): device driver for device   {"device": "0000:33:00.1", "driver": "mlx5_core"}
2024-10-30T09:33:15.246058046Z  LEVEL(-2)       kernel/kernel.go:240    getDriverByBusAndDevice(): driver for device    {"bus": "pci", "device": "0000:33:00.1", "driver": "../../../../../../../../bus/pci/drivers/mlx5_core"}
2024-10-30T09:33:15.246066796Z  LEVEL(-2)       sriov/sriov.go:471      HasDriver(): device driver for device   {"device": "0000:33:00.1", "driver": "mlx5_core"}
2024-10-30T09:33:15.246075161Z  INFO    sriov/sriov.go:479      ConfigureVfGUID(): configure vf guid    {"vfAddr": "0000:33:00.1", "pfAddr": "0000:33:00.0", "vfID": 0}
2024-10-30T09:33:15.246089217Z  INFO    sriov/sriov.go:479      ConfigureVfGUID(): set vf guid  {"address": "0000:33:00.1", "guid": "44:14:25:ac:32:de:6d:59"}
2024-10-30T09:33:15.31327028Z   LEVEL(-2)       sriov/sriov.go:482      Unbind(): unbind device driver for device       {"device": "0000:33:00.0"}
2024-10-30T09:33:15.313282083Z  LEVEL(-2)       kernel/kernel.go:116    UnbindDriverByBusAndDevice(): unbind device driver for device   {"bus": "pci", "device": "0000:33:00.0"}
2024-10-30T09:33:15.313305237Z  LEVEL(-2)       kernel/kernel.go:228    getDriverByBusAndDevice(): driver for device    {"bus": "pci", "device": "0000:33:00.0", "driver": "../../../../../../../../bus/pci/drivers/mlx5_core"}
2024-10-30T09:33:15.313315489Z  LEVEL(-2)       kernel/kernel.go:236    unbindDriver(): unbind from driver      {"bus": "pci", "device": "0000:33:00.0", "driver": "mlx5_core"}
2024-10-30T09:33:29.914343742Z  INFO    sriov/sriov.go:509      UnbindDriverIfNeeded(): unbinding driver        {"device": "0000:33:00.1"}
2024-10-30T09:33:29.91439288Z   LEVEL(-2)       kernel/kernel.go:215    Unbind(): unbind device driver for device       {"device": "0000:33:00.1"}
2024-10-30T09:33:29.914409467Z  LEVEL(-2)       kernel/kernel.go:116    UnbindDriverByBusAndDevice(): unbind device driver for device   {"bus": "pci", "device": "0000:33:00.1"}
2024-10-30T09:33:29.914445668Z  LEVEL(-2)       kernel/kernel.go:228    getDriverByBusAndDevice(): driver path for device not exist     {"bus": "pci", "device": "0000:33:00.1", "driver": ""}
2024-10-30T09:33:29.914470673Z  LEVEL(-2)       kernel/kernel.go:116    UnbindDriverByBusAndDevice(): device has no driver      {"bus": "pci", "device": "0000:33:00.1"}
2024-10-30T09:33:29.914478631Z  INFO    sriov/sriov.go:509      UnbindDriverIfNeeded(): unbounded driver        {"device": "0000:33:00.1"}
2024-10-30T09:33:29.914485249Z  LEVEL(-2)       sriov/sriov.go:523      BindDefaultDriver(): bind device to default driver      {"device": "0000:33:00.1"}
2024-10-30T09:33:29.914494293Z  LEVEL(-2)       kernel/kernel.go:141    getDriverByBusAndDevice(): driver path for device not exist     {"bus": "pci", "device": "0000:33:00.1", "driver": ""}
2024-10-30T09:33:29.914505595Z  LEVEL(-2)       kernel/kernel.go:155    setDriverOverride(): device doesn't support driver override, skip       {"bus": "pci", "device": "0000:33:00.1"}
2024-10-30T09:33:29.914512597Z  LEVEL(-2)       kernel/kernel.go:158    probeDriver(): drivers probe    {"bus": "pci", "device": "0000:33:00.1"}
2024-10-30T09:33:29.914669852Z  ERROR   kernel/kernel.go:158    probeDriver(): failed to trigger driver probe   {"bus": "pci", "device": "0000:33:00.1", "error": "write /sys/bus/pci/drivers_probe: no such device"}
2024-10-30T09:33:29.914680847Z  ERROR   sriov/sriov.go:578      configSriovVFDevices(): fail to bind default driver for device  {"device": "0000:33:00.1", "error": "write /sys/bus/pci/drivers_probe: no such device"}
2024-10-30T09:33:29.914687956Z  ERROR   sriov/sriov.go:606      configSriovInterfaces(): fail to configure sriov interface. resetting interface.        {"address": "0000:33:00.0", "error": "write /sys/bus/pci/drivers_probe: no such device"}
2024-10-30T09:33:29.914696162Z  LEVEL(-2)       sriov/sriov.go:742      ResetSriovDevice(): reset SRIOV device  {"address": "0000:33:00.0"}
2024-10-30T09:33:29.914702892Z  LEVEL(-2)       sriov/sriov.go:115      SetSriovNumVfs(): set NumVfs    {"device": "0000:33:00.0", "numVfs": 0}
2024-10-30T09:33:29.91474458Z   LEVEL(-2)       sriov/sriov.go:118      SetNetdevMTU(): set MTU {"device": "0000:33:00.0", "mtu": 2048}
2024-10-30T09:33:29.914771797Z  ERROR   network/network.go:183  TryGetInterfaceName(): failed to get interface name     {"error": "GetNetName(): no net directory under pci device 0000:33:00.0: \"lstat /sys/bus/pci/devices/0000:33:00.0/net: no such file or directory\""}
2024-10-30T09:33:29.914780828Z  ERROR   backoff@v2.2.1+incompatible/retry.go:37 SetNetdevMTU(): fail to get interface name      {"device": "0000:33:00.0"}
2024-10-30T09:33:30.91582833Z   ERROR   network/network.go:183  TryGetInterfaceName(): failed to get interface name     {"error": "GetNetName(): no net directory under pci device 0000:33:00.0: \"lstat /sys/bus/pci/devices/0000:33:00.0/net: no such file or directory\""}
2024-10-30T09:33:30.915866955Z  ERROR   backoff@v2.2.1+incompatible/retry.go:37 SetNetdevMTU(): fail to get interface name      {"device": "0000:33:00.0"}
2024-10-30T09:33:31.916007202Z  ERROR   network/network.go:183  TryGetInterfaceName(): failed to get interface name     {"error": "GetNetName(): no net directory under pci device 0000:33:00.0: \"lstat /sys/bus/pci/devices/0000:33:00.0/net: no such file or directory\""}
2024-10-30T09:33:31.916041532Z  ERROR   backoff@v2.2.1+incompatible/retry.go:37 SetNetdevMTU(): fail to get interface name      {"device": "0000:33:00.0"}
2024-10-30T09:33:32.916217962Z  ERROR   network/network.go:183  TryGetInterfaceName(): failed to get interface name     {"error": "GetNetName(): no net directory under pci device 0000:33:00.0: \"lstat /sys/bus/pci/devices/0000:33:00.0/net: no such file or directory\""}
2024-10-30T09:33:32.916245767Z  ERROR   backoff@v2.2.1+incompatible/retry.go:37 SetNetdevMTU(): fail to get interface name      {"device": "0000:33:00.0"}
2024-10-30T09:33:33.916340437Z  ERROR   network/network.go:183  TryGetInterfaceName(): failed to get interface name     {"error": "GetNetName(): no net directory under pci device 0000:33:00.0: \"lstat /sys/bus/pci/devices/0000:33:00.0/net: no such file or directory\""}
2024-10-30T09:33:33.916387635Z  ERROR   backoff@v2.2.1+incompatible/retry.go:37 SetNetdevMTU(): fail to get interface name      {"device": "0000:33:00.0"}

rollandf commented 3 weeks ago

It may be related to #797.