Open perifaws opened 1 year ago
Can you please add some unit tests with examples of what the /sys structure looks like? Otherwise this code will be impossible to maintain with confidence.
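One way to make such a test concrete is a small fixture tree built in a temporary directory. The sketch below is illustrative only: the device name `efa_0`, the counter names, and the helpers `buildFixture` and `readCounters` are assumptions rather than code from this PR. The layout assumes EFA follows the usual `/sys/class/infiniband/<device>/ports/<port>/hw_counters/` convention.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// fixtureFiles maps hypothetical sysfs paths (relative to the fixture root)
// to file contents. Device and counter names are assumptions for illustration.
var fixtureFiles = map[string]string{
	"class/infiniband/efa_0/ports/1/hw_counters/tx_bytes": "1024\n",
	"class/infiniband/efa_0/ports/1/hw_counters/rx_bytes": "2048\n",
}

// buildFixture writes fixtureFiles under a fresh temporary directory and
// returns that directory, which stands in for /sys.
func buildFixture() (string, error) {
	root, err := os.MkdirTemp("", "sysfs-fixture")
	if err != nil {
		return "", err
	}
	for rel, content := range fixtureFiles {
		path := filepath.Join(root, filepath.FromSlash(rel))
		if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
			return "", err
		}
		if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
			return "", err
		}
	}
	return root, nil
}

// readCounters parses every file in one port's hw_counters directory as an
// unsigned integer, keyed by file name.
func readCounters(root, device, port string) (map[string]uint64, error) {
	dir := filepath.Join(root, "class", "infiniband", device, "ports", port, "hw_counters")
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	counters := make(map[string]uint64, len(entries))
	for _, e := range entries {
		raw, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		v, err := strconv.ParseUint(strings.TrimSpace(string(raw)), 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parse %s: %w", e.Name(), err)
		}
		counters[e.Name()] = v
	}
	return counters, nil
}

func main() {
	root, err := buildFixture()
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	counters, err := readCounters(root, "efa_0", "1")
	if err != nil {
		panic(err)
	}
	fmt.Println(counters["tx_bytes"], counters["rx_bytes"])
}
```

A checked-in fixture directory would serve the same purpose. Building the tree in code just keeps the expected layout visible next to the assertions.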
What's EFA-specific about the collector? I can't see anywhere that it checks the PCI vendor or device ID against an Amazon VID/PID. It looks like it just reads the normal InfiniBand directories?
E.g., if I have a random Mellanox IB device, will this collector ignore it?
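A vendor check along these lines might look like the sketch below. It assumes each entry under `/sys/class/infiniband` has a `device` link to its PCI device, whose `vendor` file holds the hexadecimal vendor ID (`0x1d0f` for Amazon, `0x15b3` for Mellanox). The helper names `isAmazonVendor` and `amazonDevices` are hypothetical and not part of this PR.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// amazonVendorID is the PCI vendor ID registered to Amazon.com, Inc.
const amazonVendorID = 0x1d0f

// isAmazonVendor reports whether the contents of a sysfs "vendor" file
// (for example "0x1d0f\n") name Amazon as the PCI vendor.
func isAmazonVendor(raw string) bool {
	s := strings.TrimPrefix(strings.TrimSpace(raw), "0x")
	id, err := strconv.ParseUint(s, 16, 16)
	if err != nil {
		return false
	}
	return id == amazonVendorID
}

// amazonDevices lists the entries under <root>/class/infiniband whose
// device/vendor file carries the Amazon vendor ID. Entries without a
// readable vendor file are skipped.
func amazonDevices(root string) ([]string, error) {
	base := filepath.Join(root, "class", "infiniband")
	entries, err := os.ReadDir(base)
	if err != nil {
		return nil, err
	}
	var devices []string
	for _, e := range entries {
		raw, err := os.ReadFile(filepath.Join(base, e.Name(), "device", "vendor"))
		if err != nil {
			continue
		}
		if isAmazonVendor(string(raw)) {
			devices = append(devices, e.Name())
		}
	}
	return devices, nil
}

func main() {
	devices, err := amazonDevices("/sys")
	if err != nil {
		fmt.Println("no infiniband class directory:", err)
		return
	}
	fmt.Println(devices)
}
```

With a filter like this, a Mellanox device would be skipped because its vendor file does not match. Whether to also match specific EFA device IDs is left open here.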
This change adds a new sysfs class to read metrics from Amazon Elastic Fabric Adapter (EFA). It is based on the existing InfiniBand class.
EFA is supported on a variety of Amazon EC2 instances (list here) and is relevant for HPC & distributed training (ML) applications in the same fashion as Infiniband.
There's an associated collector for the node_exporter generated for validation. Happy to provide a sample output as requested. Thanks!
Related to the Prometheus Google Groups thread: https://groups.google.com/g/prometheus-developers/c/MEal59mDebs/m/ZQBU1f0hCAAJ