prometheus / node_exporter

Exporter for machine metrics
Apache License 2.0

Infiniband metrics: still not collected when irdma is loaded (PE 1.7.0) #2846

Open mtds opened 7 months ago

mtds commented 7 months ago

Host operating system: output of uname -a

Linux (...) 4.18.0-477.27.1.el8_8.x86_64 #1 SMP Wed Sep 20 15:55:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Host operating system: Rocky Linux 8.8

node_exporter version: output of node_exporter --version

~$ node_exporter --version
node_exporter, version 1.7.0 (branch: HEAD, revision: 7333465abf9efba81876303bb57e6fadb946041b)
  build user:       root@35918982f6d8
  build date:       20231112-23:53:35
  go version:       go1.21.4
  platform:         linux/amd64
  tags:             netgo osusergo static_build

node_exporter command line flags

--no-collector.arp --collector.netdev.device-include=ib0 \ /var/lib/prometheus/node-exporter/textfile_collector \

node_exporter log output

Are you running node_exporter in Docker?


What did you do that produced an error?

There's no error whatsoever: the exporter is just not able to collect IB metrics (see next section).

What did you expect to see?

When the irdma module is not loaded, Node Exporter correctly collects and reports IB metrics:

ts=2023-11-14T10:56:03.868Z caller=node_exporter.go:78 level=debug msg="collect query:" filters="unsupported value type"
ts=2023-11-14T10:56:03.874Z caller=collector.go:173 level=debug msg="collector succeeded" name=infiniband duration_seconds=0.006788827

What did you see instead?

Infiniband metrics are not collected when the irdma module is loaded:

ts=2023-11-13T08:50:33.312Z caller=node_exporter.go:78 level=debug msg="collect query:" filters="unsupported value type"
ts=2023-11-13T08:50:33.312Z caller=infiniband_linux.go:119 level=debug collector=infiniband msg="infiniband statistics not found, skipping"
ts=2023-11-13T08:50:33.313Z caller=collector.go:167 level=debug msg="collector returned no data" name=infiniband duration_seconds=0.000573153 err="collector returned no data"


dswarbrick commented 7 months ago

For the collector to return no data, the FS.InfiniBandClass function in procfs must be returning os.ErrNotExist.

func (c *infinibandCollector) Update(ch chan<- prometheus.Metric) error {
    devices, err := c.fs.InfiniBandClass()
    if err != nil {
        if errors.Is(err, os.ErrNotExist) {
            level.Debug(c.logger).Log("msg", "infiniband statistics not found, skipping")
            return ErrNoData
        }
        // ...
    }
    // ...
}

There are multiple places in the InfiniBandClass procfs collector which could potentially return os.ErrNotExist.

Can you please paste a recursive directory listing of your /sys/class/infiniband? It seems that the collector may still be assuming the presence of certain files that are not present with the irdma module.
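
To illustrate how a missing sysfs path can end up as "collector returned no data", here is a minimal sketch. It is not the procfs or node_exporter code: readClass, update, demo and errNoData are hypothetical stand-ins, and wrapping the error with %w is an assumption about how a parser might add context. The only point is that errors.Is still matches os.ErrNotExist through such wrapping, so any missing file or directory deep in the tree can trigger the "skipping" branch.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// errNoData is a hypothetical stand-in for the collector's "no data" sentinel.
var errNoData = errors.New("collector returned no data")

// readClass is a hypothetical parser: it lists a directory and wraps any
// failure with extra context using %w.
func readClass(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, fmt.Errorf("failed to list %q: %w", dir, err)
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names, nil
}

// update mirrors the shape of the check quoted above: a not-exist error,
// even a wrapped one, is turned into the "no data" sentinel.
func update(dir string) error {
	if _, err := readClass(dir); err != nil {
		if errors.Is(err, os.ErrNotExist) {
			return errNoData
		}
		return err
	}
	return nil
}

// demo runs update against a path that does not exist and reports whether
// the result is the "no data" sentinel.
func demo() bool {
	tmp, err := os.MkdirTemp("", "ibdemo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(tmp)
	missing := filepath.Join(tmp, "irdma0", "ports", "1", "counters")
	return errors.Is(update(missing), errNoData)
}

func main() {
	fmt.Println("missing path reported as no data:", demo())
}
```

Because the sentinel check sits at the top of the call chain, it cannot tell which of the underlying paths was missing; the debug log looks the same in every case.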

mtds commented 7 months ago

~$ ls -lR /sys/class/infiniband
total 0
lrwxrwxrwx. 1 root root 0 Nov  1 15:37 irdma0 -> ../../devices/pci0000:17/0000:17:00.0/0000:18:00.0/0000:19:03.0/0000:1a:00.0/infiniband/irdma0
lrwxrwxrwx. 1 root root 0 Nov  1 15:37 irdma1 -> ../../devices/pci0000:17/0000:17:00.0/0000:18:00.0/0000:19:03.0/0000:1a:00.3/infiniband/irdma1
lrwxrwxrwx. 1 root root 0 Nov  1 15:37 irdma2 -> ../../devices/pci0000:17/0000:17:00.0/0000:18:00.0/0000:19:03.0/0000:1a:00.1/infiniband/irdma2
lrwxrwxrwx. 1 root root 0 Nov  1 15:37 irdma3 -> ../../devices/pci0000:17/0000:17:00.0/0000:18:00.0/0000:19:03.0/0000:1a:00.2/infiniband/irdma3
lrwxrwxrwx. 1 root root 0 Nov  1 15:37 mlx5_0 -> ../../devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/infiniband/mlx5_0

In comparison, when the irdma module is unloaded, there's only one symbolic link:

~$ ls -lR /sys/class/infiniband
total 0
lrwxrwxrwx. 1 root root 0 Nov  8 12:58 mlx5_0 -> ../../devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/infiniband/mlx5_0

Content of the directory related to the IB driver:

~$ ls -l /sys/class/infiniband/mlx5_0/
total 0
-r--r--r--. 1 root root 4096 Nov 14 11:55 board_id
lrwxrwxrwx. 1 root root    0 Nov  8 17:02 device -> ../../../0000:3b:00.0
-r--r--r--. 1 root root 4096 Nov 16 12:46 fw_pages
-r--r--r--. 1 root root 4096 Nov 14 11:55 fw_ver
-r--r--r--. 1 root root 4096 Nov 14 11:55 hca_type
-r--r--r--. 1 root root 4096 Nov 16 12:46 hw_rev
-rw-r--r--. 1 root root 4096 Nov 16 12:46 node_desc
-r--r--r--. 1 root root 4096 Nov 13 11:02 node_guid
-r--r--r--. 1 root root 4096 Nov  8 17:02 node_type
drwxr-xr-x. 3 root root    0 Nov  8 12:58 ports
drwxr-xr-x. 2 root root    0 Nov 16 12:46 power
-r--r--r--. 1 root root 4096 Nov 16 12:46 reg_pages
lrwxrwxrwx. 1 root root    0 Nov 10 08:25 subsystem -> ../../../../../../class/infiniband
-r--r--r--. 1 root root 4096 Nov 13 11:02 sys_image_guid
-rw-r--r--. 1 root root 4096 Nov 10 08:25 uevent

The irdmaX subdirectories show fewer files:

~$ ls -la /sys/class/infiniband/irdma0/
total 0
drwxr-xr-x. 4 root root    0 Nov  8 15:06 .
drwxr-xr-x. 3 root root    0 Nov  8 15:06 ..
lrwxrwxrwx. 1 root root    0 Nov  8 17:02 device -> ../../../0000:1a:00.0
-r--r--r--. 1 root root 4096 Nov 16 12:46 fw_ver
-rw-r--r--. 1 root root 4096 Nov 16 12:46 node_desc
-r--r--r--. 1 root root 4096 Nov  8 17:02 node_guid
-r--r--r--. 1 root root 4096 Nov  8 17:02 node_type
drwxr-xr-x. 3 root root    0 Nov  8 15:18 ports
drwxr-xr-x. 2 root root    0 Nov 16 12:46 power
lrwxrwxrwx. 1 root root    0 Nov 13 19:51 subsystem -> ../../../../../../../../class/infiniband
-r--r--r--. 1 root root 4096 Nov  8 17:02 sys_image_guid
-rw-r--r--. 1 root root 4096 Nov 13 19:51 uevent

dswarbrick commented 7 months ago

board_id and hca_type are absent for irdmaX devices, but that's fine because the procfs package tolerates that and continues (cf.

Can you also dig a bit deeper into the ports directory? The collector looks for state, phys_state and rate files in the enumerated port subdirectories. Can you also list the contents of the counters directory of one of those port subdirectories?

There is one other bit of code in the procfs collector that might be bailing out:

    // Parse legacy counters
    path = filepath.Join(portPath, "counters_ext")
    files, err = os.ReadDir(path)
    if err != nil && !os.IsNotExist(err) {
        return nil, err
    }

There is a good chance that the irdma module does not implement these legacy counters, since it was a ground-up rewrite relatively recently. From a quick peek at the IB module source in kernel 6.6, it seems that only the qib, mlx4, mlx5 and hfi1 drivers expose counters_ext.

mtds commented 7 months ago

Here is the listing of the ports directories:

dswarbrick commented 7 months ago

Aha, I also misread the code I quoted in my previous comment, since it would tolerate os.ErrNotExist for the counters_ext directory.

However, this code will bail out on the irdma devices, since they do not expose a counters directory - only hw_counters (which is currently only parsed for mlx5 devices):

func parseInfiniBandCounters(portPath string) (*InfiniBandCounters, error) {
    var counters InfiniBandCounters

    path := filepath.Join(portPath, "counters")
    files, err := os.ReadDir(path)
    if err != nil {
        return nil, err
    }
    // ...
}
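
The difference between the two guards quoted in this thread can be reproduced in isolation. The sketch below is an illustration only: tolerantGuard, strictGuard and demo are hypothetical names, and the temporary directory merely imitates a port directory that exposes hw_counters but neither counters nor counters_ext, as described for irdma above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// tolerantGuard mirrors the counters_ext check: a missing directory is
// not treated as an error.
func tolerantGuard(path string) error {
	_, err := os.ReadDir(path)
	if err != nil && !os.IsNotExist(err) {
		return err
	}
	return nil
}

// strictGuard mirrors the counters check: any error, including a missing
// directory, is returned to the caller.
func strictGuard(path string) error {
	_, err := os.ReadDir(path)
	if err != nil {
		return err
	}
	return nil
}

// demo builds a throwaway port directory containing only hw_counters and
// applies each guard to the directory it would look for.
func demo() (tolerantErr, strictErr error) {
	port, err := os.MkdirTemp("", "port")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(port)
	if err := os.Mkdir(filepath.Join(port, "hw_counters"), 0o755); err != nil {
		panic(err)
	}
	tolerantErr = tolerantGuard(filepath.Join(port, "counters_ext"))
	strictErr = strictGuard(filepath.Join(port, "counters"))
	return tolerantErr, strictErr
}

func main() {
	tolerantErr, strictErr := demo()
	fmt.Println("counters_ext guard:", tolerantErr)
	fmt.Println("counters guard:    ", strictErr)
}
```

Only the strict guard returns an error here, and that error satisfies os.IsNotExist, which is consistent with the "infiniband statistics not found, skipping" log line in the report.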

mtds commented 7 months ago

I would have assumed that Node Exporter would go through all the paths under /sys/class/infiniband/<Name>, even though counters is not present for irdmaX cards (not configured in our case).

Why is the exporter (seemingly) giving up after its first try?

dswarbrick commented 7 months ago

@mtds The behaviour is due to fairly generic error handling in the procfs code, whereby it bails out upon pretty much any error.

I suspect that the code was originally written by somebody who only had access to Mellanox HCAs, since they are (in my experience) by far the most common IB hardware in use for about the last 10 years. The Intel irdma driver has opted to only implement hw_counters, rather than the older counters described in

This should be a fairly easy fix, but unfortunately will require another release cycle of both procfs and node_exporter.
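
One shape such a fix could take is sketched below, purely as an assumption and not as the actual procfs change. listCounters, scanClass and demo are hypothetical names, the fake class directory imitates the layout from the listings above, and only port 1 is inspected to keep the example short. The idea is that a device without a counters directory yields an empty result instead of aborting the scan of every other device.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// listCounters returns the file names under <class>/<dev>/ports/1/counters.
// A missing counters directory yields an empty list rather than an error.
func listCounters(class, dev string) ([]string, error) {
	entries, err := os.ReadDir(filepath.Join(class, dev, "ports", "1", "counters"))
	if errors.Is(err, os.ErrNotExist) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names, nil
}

// scanClass visits every device in the class directory and keeps going
// when a device has no counters directory.
func scanClass(class string) (map[string][]string, error) {
	devices, err := os.ReadDir(class)
	if err != nil {
		return nil, err
	}
	result := make(map[string][]string, len(devices))
	for _, d := range devices {
		counters, err := listCounters(class, d.Name())
		if err != nil {
			return nil, err
		}
		result[d.Name()] = counters
	}
	return result, nil
}

// demo builds a fake class directory: mlx5_0 has a counters directory,
// irdma0 only has hw_counters.
func demo() (map[string][]string, error) {
	class, err := os.MkdirTemp("", "infiniband")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(class)
	dirs := []string{
		filepath.Join(class, "mlx5_0", "ports", "1", "counters"),
		filepath.Join(class, "irdma0", "ports", "1", "hw_counters"),
	}
	for _, d := range dirs {
		if err := os.MkdirAll(d, 0o755); err != nil {
			return nil, err
		}
	}
	if err := os.WriteFile(filepath.Join(dirs[0], "port_rcv_data"), []byte("0\n"), 0o644); err != nil {
		return nil, err
	}
	return scanClass(class)
}

func main() {
	res, err := demo()
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Println(res)
}
```

With this approach the mlx5_0 counters are still reported while irdma0 simply contributes none; errors other than a missing directory are still returned to the caller.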

mtds commented 7 months ago

@dswarbrick Thanks, it's clear now. For the time being, I guess we can easily implement the workaround on our side (unload the irdma module and put it into a blacklist).

Should I open a bug report on the procfs repository as well? The problem is indeed in that component and not in the node exporter itself. Or would that generate too much 'noise'?

dswarbrick commented 7 months ago

@mtds I would recommend opening an issue on the procfs repository that references this one, while keeping this issue open as a placeholder until a new node_exporter is released with a fix.

mtds commented 7 months ago

For reference: issue procfs#589.

blixuga commented 3 months ago

Just pulled and built master. Even with the procfs issue resolved, node_exporter still does not work if irdma is loaded.

dswarbrick commented 3 months ago

@blixuga Can you please provide debug logs so that we can try to resolve this? The more info, the better.