prometheus-community / node-exporter-textfile-collector-scripts

Scripts for node-exporter's textfile collector
Apache License 2.0

mellanox_hca_temp: suppress errors when virtual functions present #199

Open ik5pvx opened 7 months ago

ik5pvx commented 7 months ago

VFs appear as additional infiniband devices, but obviously don't report temperatures. The logs are then flooded with:

Dec 17 00:00:53 penny sh[1151952]: mopen: Operation not supported
Dec 17 00:00:53 penny sh[1151947]: mellanox_hca_temp: Failed to get temperature from InfiniBand HCA 'mlx5_2'!
Dec 17 00:00:53 penny sh[1151953]: mopen: Operation not supported
Dec 17 00:00:53 penny sh[1151947]: mellanox_hca_temp: Failed to get temperature from InfiniBand HCA 'mlx5_3'!

and so on.

The only clue I could find to recognise them as virtual is that node_guid is 0000:0000:0000:0000. I'm not sure whether this changes when a MAC address is set on the interfaces.

So far, with the virtual function interfaces unconfigured, the following patch suppresses the errors for me:

--- mellanox_hca_temp.orig      2021-06-27 08:55:33.406292246 +0200
+++ mellanox_hca_temp   2023-12-22 15:18:46.072149247 +0100
@@ -41,6 +41,10 @@
     if test ! -d "$dev"; then
         continue
     fi
+    # node_guid is all zeros for Virtual Functions, which report no temp.
+    if [ "$(cat "$dev/node_guid")" = "0000:0000:0000:0000" ]; then
+        continue
+    fi
     device="${dev##*/}"

     # get temperature

dswarbrick commented 7 months ago

I don't think that depending on an all-zeros node_guid is a reliable method, since the VF node_guid can be set by the user, e.g. echo 00:11:22:33:44:55:1:0 > /sys/class/infiniband/mlx5_0/device/sriov/0/node.

How about using the existence of the sriov directory as an indicator of whether the device is a VF or a PF?

(cf. https://docs.nvidia.com/networking/display/ofedv521040/single+root+io+virtualization+(sr-iov))

ik5pvx commented 7 months ago

> I don't think that depending on an all-zeros node_guid is a reliable method, since the VF node_guid can be set by the user, e.g. echo 00:11:22:33:44:55:1:0 > /sys/class/infiniband/mlx5_0/device/sriov/0/node.

You are indeed right, this is not a reliable method. It also seems to depend on which driver is in use.

> How about using the existence of the sriov directory as an indicator of whether the device is a VF or a PF?

I'm using only the in-kernel mlx5 driver (I didn't install and compile OFED), and there's no sriov directory. There are some sriov_* files there, though:

root@penny:/sys/class/infiniband/mlx5_0/device # ls sriov*
sriov_drivers_autoprobe  sriov_numvfs  sriov_offset  sriov_stride  sriov_totalvfs  sriov_vf_device  sriov_vf_total_msix
root@penny:/sys/class/infiniband/mlx5_0/device # cat sriov_*
1
4
2
1
32
1018
0

The VF's device directory, by contrast, has only one such file, and it is write-only:

root@penny:/sys/class/infiniband/mlx5_2/device # ls -als sriov_vf_msix_count
0 --w------- 1 root root 4096 Dec 24 11:30 sriov_vf_msix_count

So, I guess we could use the presence of device/sriov_numvfs as a check, but I'm not sure this is valid across different driver combinations, and I'm also not sure what happens if one boots with SR-IOV disabled in the BIOS. I can't test this now.
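
For illustration, here is a minimal sketch of what such a check might look like. It assumes, based only on the listings above from a single in-kernel mlx5 setup, that a PF's device directory contains sriov_numvfs and a VF's does not. The function name is_physical_function is hypothetical and not part of mellanox_hca_temp, and the demo runs against a fake directory tree rather than the real /sys so it can be tried anywhere.

```shell
#!/bin/sh
# Sketch only, not the script's actual code.
# Assumption (taken from the listings in this thread, in-kernel mlx5 only):
# a PF has device/sriov_numvfs, a VF does not.

# Hypothetical helper: succeeds if the given InfiniBand device directory
# (e.g. /sys/class/infiniband/mlx5_0) looks like a physical function.
is_physical_function() {
    test -e "$1/device/sriov_numvfs"
}

# Build a fake sysfs-like tree so the demo does not need real hardware.
root="$(mktemp -d)"
mkdir -p "$root/mlx5_0/device" "$root/mlx5_2/device"
echo 4 > "$root/mlx5_0/device/sriov_numvfs"       # PF-like entry
: > "$root/mlx5_2/device/sriov_vf_msix_count"     # VF-like entry

for dev in "$root"/*; do
    if is_physical_function "$dev"; then
        echo "${dev##*/}: query temperature"
    else
        echo "${dev##*/}: skip"
    fi
done

rm -rf "$root"
```

The same caveats apply as in the comment above: a device that exposes no sriov_* files at all (for example with SR-IOV disabled in the BIOS, which is untested here) would also be skipped by this check, so it may need a fallback before it could replace the node_guid test.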