I ran into a problem configuring the virtual function (VF) driver in an SR-IOV setup. During configuration, the drivers of both the physical function (PF) and the VF are unbound. After the unbind, the VF cannot be bound back to its default driver: the driver probe fails with "write /sys/bus/pci/drivers_probe: no such device" (see the error log below), so VF configuration cannot continue.
Environment Details:
OS: Ubuntu 22.04
Network Card: Mellanox Technologies MT28908 Family [ConnectX-6]
Could you please explain why the driver unbinding is necessary? Also, are there any recommended solutions or workarounds for this issue?
Thanks for your assistance!
Error Log:
2024-10-30T09:33:12.725660712Z INFO daemon/daemon.go:485 mellanox plugin Apply()
2024-10-30T09:33:12.725667977Z INFO mellanox/mellanox_plugin.go:212 mellanox-plugin configFW()
2024-10-30T09:33:12.725673739Z INFO daemon/daemon.go:485 k8s plugin Apply()
2024-10-30T09:33:12.72568048Z INFO daemon/daemon.go:500 generic plugin Apply() {"desiredState": {"interfaces":[{"pciAddress":"0000:12:00.0","numVfs":8,"name":"ibs2","linkType":"IB","vfGroups":[{"resourceName":"ibcx6vfnuma0","deviceType":"netdevice","vfRange":"0-7","policyName":"ibcx6vfnuma0","isRdma":true}]},{"pciAddress":"0000:33:00.0","numVfs":8,"name":"ibs3","linkType":"IB","vfGroups":[{"resourceName":"ibcx6vfnuma1","deviceType":"netdevice","vfRange":"0-7","policyName":"ibcx6vfnuma1","isRdma":true}]}],"bridges":{}}}
2024-10-30T09:33:12.725711244Z LEVEL(-2) sriovnetwork sriov/sriov.go:773 NeedToUpdateSriov(): NumVfs needs update {"desired": 8, "current": 0}
2024-10-30T09:33:12.725723274Z LEVEL(-2) sriov/sriov.go:606 configSriovInterfaces(): start sriov configuration
2024-10-30T09:33:12.7257294Z LEVEL(-2) sriov/sriov.go:737 configSriovDevice(): configure sriov device {"device": "0000:33:00.0", "config": {"pciAddress":"0000:33:00.0","numVfs":8,"name":"ibs3","linkType":"IB","vfGroups":[{"resourceName":"ibcx6vfnuma1","deviceType":"netdevice","vfRange":"0-7","policyName":"ibcx6vfnuma1","isRdma":true}]}, "skipVFConfiguration": false}
2024-10-30T09:33:12.725739966Z LEVEL(-2) sriov/sriov.go:557 configSriovPFDevice(): configure PF sriov device {"device": "0000:33:00.0"}
2024-10-30T09:33:12.725781989Z LEVEL(-2) sriov/sriov.go:323 configureHWOptionsForSwitchdev(): configure HW options for device {"device": "0000:33:00.0"}
2024-10-30T09:33:12.725789052Z LEVEL(-2) sriov/sriov.go:329 removeUdevRules(): remove udev rules for device {"device": "0000:33:00.0"}
2024-10-30T09:33:12.725794729Z LEVEL(-2) sriov/sriov.go:967 RemoveDisableNMUdevRule() {"device": "0000:33:00.0"}
2024-10-30T09:33:12.725801648Z LEVEL(-2) udev/udev.go:82 removeUdevRule() {"device": "0000:33:00.0", "rule": "10-nm-disable"}
2024-10-30T09:33:12.725868876Z LEVEL(-2) sriov/sriov.go:970 RemoveVfRepresentorUdevRule() {"device": "0000:33:00.0"}
2024-10-30T09:33:12.725875222Z LEVEL(-2) udev/udev.go:109 removeUdevRule() {"device": "0000:33:00.0", "rule": "20-switchdev"}
2024-10-30T09:33:12.725887067Z LEVEL(-2) sriov/sriov.go:973 RemovePersistPFNameUdevRule() {"device": "0000:33:00.0"}
2024-10-30T09:33:12.725893539Z LEVEL(-2) udev/udev.go:95 removeUdevRule() {"device": "0000:33:00.0", "rule": "10-pf-name"}
2024-10-30T09:33:12.72590496Z LEVEL(-2) sriov/sriov.go:333 addUdevRules(): add udev rules for device {"device": "0000:33:00.0"}
2024-10-30T09:33:12.725911824Z LEVEL(-2) sriov/sriov.go:931 AddDisableNMUdevRule() {"device": "0000:33:00.0"}
2024-10-30T09:33:12.725923128Z LEVEL(-2) udev/udev.go:76 addUdevRule() {"device": "0000:33:00.0", "rule": "10-nm-disable"}
2024-10-30T09:33:12.725964652Z LEVEL(-2) sriov/sriov.go:338 createVFs(): configure VFs for device {"device": "0000:33:00.0", "count": 8, "mode": "legacy"}
2024-10-30T09:33:12.725987641Z LEVEL(-2) sriov/sriov.go:989 setEswitchModeAndNumVFs(): configure VFs for device {"device": "0000:33:00.0", "count": 8, "mode": "legacy"}
2024-10-30T09:33:12.725995444Z LEVEL(-2) sriov/sriov.go:1012 GetNicSriovMode() {"device": "0000:33:00.0"}
2024-10-30T09:33:12.726153137Z LEVEL(-2) sriov/sriov.go:1022 SetSriovNumVfs(): set NumVfs {"device": "0000:33:00.0", "numVfs": 8}
2024-10-30T09:33:15.242141245Z LEVEL(-2) sriov/sriov.go:578 configSriovVFDevices(): configure PF sriov device {"device": "0000:33:00.0"}
2024-10-30T09:33:15.245870083Z LEVEL(-2) kernel/kernel.go:240 getDriverByBusAndDevice(): driver for device {"bus": "pci", "device": "0000:33:00.1", "driver": "../../../../../../../../bus/pci/drivers/mlx5_core"}
2024-10-30T09:33:15.245889695Z LEVEL(-2) sriov/sriov.go:441 HasDriver(): device driver for device {"device": "0000:33:00.1", "driver": "mlx5_core"}
2024-10-30T09:33:15.246058046Z LEVEL(-2) kernel/kernel.go:240 getDriverByBusAndDevice(): driver for device {"bus": "pci", "device": "0000:33:00.1", "driver": "../../../../../../../../bus/pci/drivers/mlx5_core"}
2024-10-30T09:33:15.246066796Z LEVEL(-2) sriov/sriov.go:471 HasDriver(): device driver for device {"device": "0000:33:00.1", "driver": "mlx5_core"}
2024-10-30T09:33:15.246075161Z INFO sriov/sriov.go:479 ConfigureVfGUID(): configure vf guid {"vfAddr": "0000:33:00.1", "pfAddr": "0000:33:00.0", "vfID": 0}
2024-10-30T09:33:15.246089217Z INFO sriov/sriov.go:479 ConfigureVfGUID(): set vf guid {"address": "0000:33:00.1", "guid": "44:14:25:ac:32:de:6d:59"}
2024-10-30T09:33:15.31327028Z LEVEL(-2) sriov/sriov.go:482 Unbind(): unbind device driver for device {"device": "0000:33:00.0"}
2024-10-30T09:33:15.313282083Z LEVEL(-2) kernel/kernel.go:116 UnbindDriverByBusAndDevice(): unbind device driver for device {"bus": "pci", "device": "0000:33:00.0"}
2024-10-30T09:33:15.313305237Z LEVEL(-2) kernel/kernel.go:228 getDriverByBusAndDevice(): driver for device {"bus": "pci", "device": "0000:33:00.0", "driver": "../../../../../../../../bus/pci/drivers/mlx5_core"}
2024-10-30T09:33:15.313315489Z LEVEL(-2) kernel/kernel.go:236 unbindDriver(): unbind from driver {"bus": "pci", "device": "0000:33:00.0", "driver": "mlx5_core"}
2024-10-30T09:33:29.914343742Z INFO sriov/sriov.go:509 UnbindDriverIfNeeded(): unbinding driver {"device": "0000:33:00.1"}
2024-10-30T09:33:29.91439288Z LEVEL(-2) kernel/kernel.go:215 Unbind(): unbind device driver for device {"device": "0000:33:00.1"}
2024-10-30T09:33:29.914409467Z LEVEL(-2) kernel/kernel.go:116 UnbindDriverByBusAndDevice(): unbind device driver for device {"bus": "pci", "device": "0000:33:00.1"}
2024-10-30T09:33:29.914445668Z LEVEL(-2) kernel/kernel.go:228 getDriverByBusAndDevice(): driver path for device not exist {"bus": "pci", "device": "0000:33:00.1", "driver": ""}
2024-10-30T09:33:29.914470673Z LEVEL(-2) kernel/kernel.go:116 UnbindDriverByBusAndDevice(): device has no driver {"bus": "pci", "device": "0000:33:00.1"}
2024-10-30T09:33:29.914478631Z INFO sriov/sriov.go:509 UnbindDriverIfNeeded(): unbounded driver {"device": "0000:33:00.1"}
2024-10-30T09:33:29.914485249Z LEVEL(-2) sriov/sriov.go:523 BindDefaultDriver(): bind device to default driver {"device": "0000:33:00.1"}
2024-10-30T09:33:29.914494293Z LEVEL(-2) kernel/kernel.go:141 getDriverByBusAndDevice(): driver path for device not exist {"bus": "pci", "device": "0000:33:00.1", "driver": ""}
2024-10-30T09:33:29.914505595Z LEVEL(-2) kernel/kernel.go:155 setDriverOverride(): device doesn't support driver override, skip {"bus": "pci", "device": "0000:33:00.1"}
2024-10-30T09:33:29.914512597Z LEVEL(-2) kernel/kernel.go:158 probeDriver(): drivers probe {"bus": "pci", "device": "0000:33:00.1"}
2024-10-30T09:33:29.914669852Z ERROR kernel/kernel.go:158 probeDriver(): failed to trigger driver probe {"bus": "pci", "device": "0000:33:00.1", "error": "write /sys/bus/pci/drivers_probe: no such device"}
2024-10-30T09:33:29.914680847Z ERROR sriov/sriov.go:578 configSriovVFDevices(): fail to bind default driver for device {"device": "0000:33:00.1", "error": "write /sys/bus/pci/drivers_probe: no such device"}
2024-10-30T09:33:29.914687956Z ERROR sriov/sriov.go:606 configSriovInterfaces(): fail to configure sriov interface. resetting interface. {"address": "0000:33:00.0", "error": "write /sys/bus/pci/drivers_probe: no such device"}
2024-10-30T09:33:29.914696162Z LEVEL(-2) sriov/sriov.go:742 ResetSriovDevice(): reset SRIOV device {"address": "0000:33:00.0"}
2024-10-30T09:33:29.914702892Z LEVEL(-2) sriov/sriov.go:115 SetSriovNumVfs(): set NumVfs {"device": "0000:33:00.0", "numVfs": 0}
2024-10-30T09:33:29.91474458Z LEVEL(-2) sriov/sriov.go:118 SetNetdevMTU(): set MTU {"device": "0000:33:00.0", "mtu": 2048}
2024-10-30T09:33:29.914771797Z ERROR network/network.go:183 TryGetInterfaceName(): failed to get interface name {"error": "GetNetName(): no net directory under pci device 0000:33:00.0: \"lstat /sys/bus/pci/devices/0000:33:00.0/net: no such file or directory\""}
2024-10-30T09:33:29.914780828Z ERROR backoff@v2.2.1+incompatible/retry.go:37 SetNetdevMTU(): fail to get interface name {"device": "0000:33:00.0"}
2024-10-30T09:33:30.91582833Z ERROR network/network.go:183 TryGetInterfaceName(): failed to get interface name {"error": "GetNetName(): no net directory under pci device 0000:33:00.0: \"lstat /sys/bus/pci/devices/0000:33:00.0/net: no such file or directory\""}
2024-10-30T09:33:30.915866955Z ERROR backoff@v2.2.1+incompatible/retry.go:37 SetNetdevMTU(): fail to get interface name {"device": "0000:33:00.0"}
2024-10-30T09:33:31.916007202Z ERROR network/network.go:183 TryGetInterfaceName(): failed to get interface name {"error": "GetNetName(): no net directory under pci device 0000:33:00.0: \"lstat /sys/bus/pci/devices/0000:33:00.0/net: no such file or directory\""}
2024-10-30T09:33:31.916041532Z ERROR backoff@v2.2.1+incompatible/retry.go:37 SetNetdevMTU(): fail to get interface name {"device": "0000:33:00.0"}
2024-10-30T09:33:32.916217962Z ERROR network/network.go:183 TryGetInterfaceName(): failed to get interface name {"error": "GetNetName(): no net directory under pci device 0000:33:00.0: \"lstat /sys/bus/pci/devices/0000:33:00.0/net: no such file or directory\""}
2024-10-30T09:33:32.916245767Z ERROR backoff@v2.2.1+incompatible/retry.go:37 SetNetdevMTU(): fail to get interface name {"device": "0000:33:00.0"}
2024-10-30T09:33:33.916340437Z ERROR network/network.go:183 TryGetInterfaceName(): failed to get interface name {"error": "GetNetName(): no net directory under pci device 0000:33:00.0: \"lstat /sys/bus/pci/devices/0000:33:00.0/net: no such file or directory\""}
2024-10-30T09:33:33.916387635Z ERROR backoff@v2.2.1+incompatible/retry.go:37 SetNetdevMTU(): fail to get interface name {"device": "0000:33:00.0"}
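
The log above shows the PF (0000:33:00.0) being unbound from mlx5_core roughly 14 seconds before the VF (0000:33:00.1) probe fails, so one thing worth checking by hand is whether the VF's PCI device still exists in sysfs at that point, and whether it has a driver and a driver_override attribute. The following is a minimal diagnostic sketch, not the operator's own code: `inspect_pci_device` and its `sysfs_root` parameter are hypothetical names introduced here, and it only reads the standard sysfs paths under /sys/bus/pci/devices.

```python
import os


def inspect_pci_device(pci_addr, sysfs_root="/sys"):
    """Report basic sysfs state for a PCI device (hypothetical helper).

    sysfs_root is an assumption added so the function can be pointed at a
    fake directory tree for testing; on a real host it stays "/sys".
    """
    dev_dir = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr)

    # If this directory is missing, the kernel no longer knows the device,
    # and a write to drivers_probe for it is expected to fail.
    exists = os.path.isdir(dev_dir)

    # "driver" is a symlink to the bound driver; it is absent when unbound.
    driver = None
    driver_link = os.path.join(dev_dir, "driver")
    if os.path.islink(driver_link):
        driver = os.path.basename(os.readlink(driver_link))

    # "driver_override" is the attribute the log reports as unsupported.
    has_override = os.path.exists(os.path.join(dev_dir, "driver_override"))

    return {
        "exists": exists,
        "driver": driver,
        "has_driver_override": has_override,
    }


if __name__ == "__main__":
    # Addresses taken from the log: the PF and its first VF.
    for addr in ("0000:33:00.0", "0000:33:00.1"):
        print(addr, inspect_pci_device(addr))
```

Running this on the node right after the failure, and again after setting the VF count manually, should show whether 0000:33:00.1 is present at all when the probe is attempted.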