Open ik5pvx opened 11 months ago
I don't think that depending on an all-zeros node_guid is a reliable method, since the VF node_guid can be set by the user, e.g.
echo 00:11:22:33:44:55:1:0 > /sys/class/infiniband/mlx5_0/device/sriov/0/node
How about using the existence of the sriov directory as an indicator of whether the device is a VF or a PF?
(cf. https://docs.nvidia.com/networking/display/ofedv521040/single+root+io+virtualization+(sr-iov))
> I don't think that depending on an all-zeros node_guid is a reliable method, since the VF node_guid can be set by the user, e.g.
> echo 00:11:22:33:44:55:1:0 > /sys/class/infiniband/mlx5_0/device/sriov/0/node
You are indeed right, this is not a reliable method. It also seems to depend on which driver is in use.
> How about using the existence of the sriov directory as an indicator of whether the device is a VF or a PF?
I'm using just the in-kernel mlx5 driver (I didn't install or compile OFED), and there's no sriov directory. There are some sriov_* files in the PF's device directory, though:
root@penny:/sys/class/infiniband/mlx5_0/device # ls sriov*
sriov_drivers_autoprobe sriov_numvfs sriov_offset sriov_stride sriov_totalvfs sriov_vf_device sriov_vf_total_msix
root@penny:/sys/class/infiniband/mlx5_0/device # cat sriov_*
1
4
2
1
32
1018
0
The VF device directory, by contrast, has only one such file, and it is write-only:
root@penny:/sys/class/infiniband/mlx5_2/device # ls -als sriov_vf_msix_count
0 --w------- 1 root root 4096 Dec 24 11:30 sriov_vf_msix_count
So, I guess we could use device/sriov_numvfs as a check, but I'm not sure this is valid across different driver combinations, and I'm also not sure what happens if one boots with SR-IOV disabled in the BIOS. I can't test this now.
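A rough sketch of what such a check could look like, in case it helps the discussion. The function name ib_dev_kind is made up for illustration. The sriov_numvfs test relies only on the PF listing shown above. The physfn test is an extra assumption not taken from this thread: the kernel's PCI core normally adds a physfn link in a VF's device directory pointing back to its PF. Neither test has been verified with OFED or with SR-IOV disabled in the BIOS.

```shell
# Hypothetical helper: classify a device directory as "vf", "pf" or "unknown".
# $1 is a path such as /sys/class/infiniband/mlx5_0/device
ib_dev_kind() {
    dev=$1
    if [ -e "$dev/physfn" ] || [ -L "$dev/physfn" ]; then
        # Assumption: a VF carries a "physfn" link back to its PF.
        echo vf
    elif [ -e "$dev/sriov_numvfs" ]; then
        # Seen only in the PF's device directory in the listing above.
        echo pf
    else
        # No SR-IOV attributes found; do not guess.
        echo unknown
    fi
}

# Example use (left as a comment so that sourcing this file has no side effects):
#   for d in /sys/class/infiniband/*/device; do
#       printf '%s: %s\n' "$d" "$(ib_dev_kind "$d")"
#   done
```

Devices reported as "vf" could then be skipped when reading temperatures, while "unknown" would presumably be treated like a plain physical device.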
VFs appear as additional infiniband devices, but obviously don't report temperatures. The logs are then flooded with:
and so on.
The only clue I could find to recognise them as virtual is that node_guid is 0000:0000:0000:0000. I'm not sure whether this is supposed to change when a MAC address is set on the interfaces.
So far, with the virtual function interfaces unconfigured, the following patch suppresses the errors for me: