Open mtds opened 7 months ago
If the collector returns no data, the FS.InfiniBandClass function in procfs must be returning os.ErrNotExist:
func (c *infinibandCollector) Update(ch chan<- prometheus.Metric) error {
    devices, err := c.fs.InfiniBandClass()
    if err != nil {
        if errors.Is(err, os.ErrNotExist) {
            level.Debug(c.logger).Log("msg", "infiniband statistics not found, skipping")
            return ErrNoData
        }
        ...
There are multiple places in the InfiniBandClass procfs collector which could potentially return os.ErrNotExist.
Can you please paste a recursive directory listing of your /sys/class/infiniband? It seems that the collector may still be assuming the presence of certain files that are not present with the irdma module.
~$ ls -lR /sys/class/infiniband
/sys/class/infiniband:
total 0
lrwxrwxrwx. 1 root root 0 Nov 1 15:37 irdma0 -> ../../devices/pci0000:17/0000:17:00.0/0000:18:00.0/0000:19:03.0/0000:1a:00.0/infiniband/irdma0
lrwxrwxrwx. 1 root root 0 Nov 1 15:37 irdma1 -> ../../devices/pci0000:17/0000:17:00.0/0000:18:00.0/0000:19:03.0/0000:1a:00.3/infiniband/irdma1
lrwxrwxrwx. 1 root root 0 Nov 1 15:37 irdma2 -> ../../devices/pci0000:17/0000:17:00.0/0000:18:00.0/0000:19:03.0/0000:1a:00.1/infiniband/irdma2
lrwxrwxrwx. 1 root root 0 Nov 1 15:37 irdma3 -> ../../devices/pci0000:17/0000:17:00.0/0000:18:00.0/0000:19:03.0/0000:1a:00.2/infiniband/irdma3
lrwxrwxrwx. 1 root root 0 Nov 1 15:37 mlx5_0 -> ../../devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/infiniband/mlx5_0
In comparison, when the irdma module is unloaded, there's only one symbolic link:
~$ ls -lR /sys/class/infiniband
/sys/class/infiniband:
total 0
lrwxrwxrwx. 1 root root 0 Nov 8 12:58 mlx5_0 -> ../../devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/infiniband/mlx5_0
Content of the directory related to the IB driver:
~$ ls -l /sys/class/infiniband/mlx5_0/
total 0
-r--r--r--. 1 root root 4096 Nov 14 11:55 board_id
lrwxrwxrwx. 1 root root 0 Nov 8 17:02 device -> ../../../0000:3b:00.0
-r--r--r--. 1 root root 4096 Nov 16 12:46 fw_pages
-r--r--r--. 1 root root 4096 Nov 14 11:55 fw_ver
-r--r--r--. 1 root root 4096 Nov 14 11:55 hca_type
-r--r--r--. 1 root root 4096 Nov 16 12:46 hw_rev
-rw-r--r--. 1 root root 4096 Nov 16 12:46 node_desc
-r--r--r--. 1 root root 4096 Nov 13 11:02 node_guid
-r--r--r--. 1 root root 4096 Nov 8 17:02 node_type
drwxr-xr-x. 3 root root 0 Nov 8 12:58 ports
drwxr-xr-x. 2 root root 0 Nov 16 12:46 power
-r--r--r--. 1 root root 4096 Nov 16 12:46 reg_pages
lrwxrwxrwx. 1 root root 0 Nov 10 08:25 subsystem -> ../../../../../../class/infiniband
-r--r--r--. 1 root root 4096 Nov 13 11:02 sys_image_guid
-rw-r--r--. 1 root root 4096 Nov 10 08:25 uevent
The irdmaX sub-directories show fewer files:
~$ ls -la /sys/class/infiniband/irdma0/
total 0
drwxr-xr-x. 4 root root 0 Nov 8 15:06 .
drwxr-xr-x. 3 root root 0 Nov 8 15:06 ..
lrwxrwxrwx. 1 root root 0 Nov 8 17:02 device -> ../../../0000:1a:00.0
-r--r--r--. 1 root root 4096 Nov 16 12:46 fw_ver
-rw-r--r--. 1 root root 4096 Nov 16 12:46 node_desc
-r--r--r--. 1 root root 4096 Nov 8 17:02 node_guid
-r--r--r--. 1 root root 4096 Nov 8 17:02 node_type
drwxr-xr-x. 3 root root 0 Nov 8 15:18 ports
drwxr-xr-x. 2 root root 0 Nov 16 12:46 power
lrwxrwxrwx. 1 root root 0 Nov 13 19:51 subsystem -> ../../../../../../../../class/infiniband
-r--r--r--. 1 root root 4096 Nov 8 17:02 sys_image_guid
-rw-r--r--. 1 root root 4096 Nov 13 19:51 uevent
board_id and hca_type are absent for irdmaX devices, but that's fine because the procfs package tolerates that and continues (cf. https://github.com/prometheus/procfs/pull/556).
Can you also dig a bit deeper into the ports directory? The collector looks for state, phys_state and rate files in the enumerated port subdirectories. Can you also list the contents of the counters directory of one of those port subdirectories?
There is one other bit of code in the procfs collector that might be bailing out:
// Parse legacy counters
path = filepath.Join(portPath, "counters_ext")
files, err = os.ReadDir(path)
if err != nil && !os.IsNotExist(err) {
    return nil, err
}
There is a good chance that the irdma module does not implement these legacy counters, since it was a ground-up rewrite relatively recently. From a quick peek at the IB module source in kernel 6.6, it seems that only the qib, mlx4, mlx5 and hfi1 drivers expose counters_ext.
Here are the listings of the ports directories:
mlx5_0:
cd /sys/class/infiniband
# ls -la mlx5_0/ports/1/
total 0
drwxr-xr-x. 11 root root 0 Nov 8 15:18 .
drwxr-xr-x. 3 root root 0 Nov 8 15:18 ..
-r--r--r--. 1 root root 4096 Nov 16 17:43 cap_mask
drwxr-xr-x. 2 root root 0 Nov 16 17:43 cm_rx_duplicates
drwxr-xr-x. 2 root root 0 Nov 16 17:43 cm_rx_msgs
drwxr-xr-x. 2 root root 0 Nov 16 17:43 cm_tx_msgs
drwxr-xr-x. 2 root root 0 Nov 16 17:43 cm_tx_retries
drwxr-xr-x. 2 root root 0 Nov 16 17:43 counters
drwxr-xr-x. 4 root root 0 Nov 8 17:02 gid_attrs
drwxr-xr-x. 2 root root 0 Nov 8 17:02 gids
drwxr-xr-x. 2 root root 0 Nov 16 17:43 hw_counters
-r--r--r--. 1 root root 4096 Nov 8 17:02 lid
-r--r--r--. 1 root root 4096 Nov 8 17:02 lid_mask_count
-r--r--r--. 1 root root 4096 Nov 16 17:43 link_layer
-r--r--r--. 1 root root 4096 Nov 8 15:18 phys_state
drwxr-xr-x. 2 root root 0 Nov 8 17:02 pkeys
-r--r--r--. 1 root root 4096 Nov 8 15:18 rate
-r--r--r--. 1 root root 4096 Nov 16 17:43 sm_lid
-r--r--r--. 1 root root 4096 Nov 16 17:43 sm_sl
-r--r--r--. 1 root root 4096 Nov 8 15:18 state
#] cd mlx5_0/ports/1/
#] cat state phys_state rate
4: ACTIVE
5: LinkUp
100 Gb/sec (2X HDR)
irdma0 (the other irdmaX devices expose the same structure):
#] ls -la irdma0/ports/1/
total 0
drwxr-xr-x. 5 root root 0 Nov 8 15:18 .
drwxr-xr-x. 3 root root 0 Nov 8 15:18 ..
-r--r--r--. 1 root root 4096 Nov 16 17:44 cap_mask
drwxr-xr-x. 4 root root 0 Nov 16 17:44 gid_attrs
drwxr-xr-x. 2 root root 0 Nov 8 17:02 gids
drwxr-xr-x. 2 root root 0 Nov 16 17:44 hw_counters
-r--r--r--. 1 root root 4096 Nov 8 17:02 lid
-r--r--r--. 1 root root 4096 Nov 8 17:02 lid_mask_count
-r--r--r--. 1 root root 4096 Nov 16 17:44 link_layer
-r--r--r--. 1 root root 4096 Nov 8 15:18 phys_state
-r--r--r--. 1 root root 4096 Nov 8 15:18 rate
-r--r--r--. 1 root root 4096 Nov 16 17:44 sm_lid
-r--r--r--. 1 root root 4096 Nov 16 17:44 sm_sl
-r--r--r--. 1 root root 4096 Nov 8 15:18 state
#] cd irdma0/ports/1/
#] cat state phys_state rate
1: DOWN
3: Disabled
100 Gb/sec (4X EDR)
Aha, I also misread the code I quoted in my previous comment, since it would tolerate os.ErrNotExist for the counters_ext directory.
However, this code will bail out on the irdma devices, since they do not expose a counters directory - only hw_counters (which is currently only parsed for mlx5 devices):
func parseInfiniBandCounters(portPath string) (*InfiniBandCounters, error) {
    var counters InfiniBandCounters
    path := filepath.Join(portPath, "counters")
    files, err := os.ReadDir(path)
    if err != nil {
        return nil, err
    }
    ...
I would have assumed that Node Exporter would go through all the paths under /sys/class/infiniband/<Name>, despite the fact that counters is not present for irdmaX cards (not configured in our case).
Why is the exporter giving up (seemingly) after its first try?
@mtds The behaviour is due to fairly generic error handling in the procfs code, whereby it bails out upon pretty much any error.
I suspect that the code was originally written by somebody who only had access to Mellanox HCAs, since they are (in my experience) by far the most common IB hardware in use for about the last 10 years. The Intel irdma driver has opted to only implement hw_counters, rather than the older counters described in https://www.kernel.org/doc/Documentation/ABI/stable/sysfs-class-infiniband.
This should be a fairly easy fix, but unfortunately will require another release cycle of both procfs and node_exporter.
@dswarbrick Thanks, it's clear now. For the time being, I guess we can easily implement the workaround on our side (unload the irdma module and put it into a blacklist).
Should I open a bug report on the procfs repository as well? The problem is indeed in that component and not in the node exporter itself. Or would that generate too much 'noise'?
@mtds I would recommend opening an issue on the procfs repository that references this one, and keeping this one open as a placeholder until a new node_exporter is released with a fix.
For reference: procfs#589 issue.
Just pulled and built master. Even with the procfs issue resolved, node_exporter still does not work if irdma is loaded.
@blixuga Can you please provide debug logs so that we can try to resolve this? The more info, the better.
Host operating system: output of uname -a
Host operating system: Rocky Linux 8.8
node_exporter version: output of node_exporter --version
node_exporter command line flags
node_exporter log output
Are you running node_exporter in Docker?
No.
What did you do that produced an error?
There's no error whatsoever: the exporter is just not able to collect IB metrics (see next section).
What did you expect to see?
When the irdma module is not loaded, Node Exporter correctly collects and reports IB metrics:
What did you see instead?
InfiniBand metrics are not collected when the irdma module is loaded:
Workaround
Explicitly unload the irdma module:
References
2769
https://github.com/prometheus/procfs/pull/556