guilbaults / infiniband-exporter

Prometheus exporter for an InfiniBand fabric
Apache License 2.0

Gather HCA statistics #25

Closed: guilbaults closed this issue 3 years ago

guilbaults commented 3 years ago

At the moment, only the switch statistics are collected. The original idea was to limit interaction with the compute nodes: every packet goes through a switch, so the switch counters see all the traffic. However, this approach only sees errors reported on the switches and misses errors that are local to the HCA of a compute node.

ibqueryerrors is currently called with --switch. It should either run without that flag, or run twice, with the --ca flag on the second execution.
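
To make the two options concrete, here is a minimal sketch of the command lines each one would produce. It is not the exporter's actual code: the function name `ibqueryerrors_commands` and the strategy labels are hypothetical, and any other arguments the exporter may pass to ibqueryerrors are left out.

```python
def ibqueryerrors_commands(strategy, binary="ibqueryerrors"):
    """Return the argv lists to run for one collection cycle.

    `strategy` is a hypothetical label for the two options discussed
    in this issue; it is not an option of the exporter itself.
    """
    if strategy == "no_filter":
        # Option 1: drop --switch so the output is not limited to switches.
        return [[binary]]
    if strategy == "two_passes":
        # Option 2: keep the switch pass, then add a second pass for the HCAs.
        return [[binary, "--switch"], [binary, "--ca"]]
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    # Only print the planned commands; nothing is executed here.
    for name in ("no_filter", "two_passes"):
        for argv in ibqueryerrors_commands(name):
            print(name, "->", " ".join(argv))
```

Each returned list could be handed to something like `subprocess.run`, but the sketch only builds the arguments so the difference between the two options is easy to compare.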

gabrieleiannetti commented 3 years ago

Hi,

I would prefer to run ibqueryerrors just once, with both the --switch and --ca flags enabled.

That looks like a clearer approach to me.

For instance

guilbaults commented 3 years ago

Thanks, merged