tud-zih-energy / lo2s

Linux OTF2 Sampling - A Lightweight Node-Level Performance Monitoring Tool
https://tu-dresden.de/zih/forschung/projekte/lo2s?set_language=en
GNU General Public License v3.0

NEC Support Part 1: Sensors #251

Closed: cvonelm closed this issue 1 year ago

cvonelm commented 1 year ago

The NEC SX-Aurora TSUBASA exports a set of sensors to the outside world via sysfs.

The base path for all sensors is /sys/class/ve/ve[X]/sensor_[Y], where X is the ID of the accelerator card and Y is the ID of the sensor.
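A minimal C++ sketch of building that path and reading one sensor. It assumes each sensor file holds a single integer (the thread does not say what the file format is), and the function names and the extra `root` parameter (which lets the code be pointed away from /sys, e.g. for testing) are hypothetical, not part of lo2s.

```cpp
#include <cstdint>
#include <fstream>
#include <optional>
#include <string>

// Build the sysfs path of sensor <sensor> on VE card <card>:
// <root>/class/ve/ve<card>/sensor_<sensor>
// `root` is a hypothetical convenience parameter; on a real system it is "/sys".
std::string ve_sensor_path(int card, int sensor, const std::string& root = "/sys")
{
    return root + "/class/ve/ve" + std::to_string(card) + "/sensor_" + std::to_string(sensor);
}

// Read the raw value of one sensor.
// Assumption: the file contains a single integer. Returns std::nullopt if the
// file is missing or its contents do not parse as an integer.
std::optional<std::int64_t> read_ve_sensor_raw(int card, int sensor,
                                               const std::string& root = "/sys")
{
    std::ifstream in(ve_sensor_path(card, sensor, root));
    std::int64_t value;
    if (!(in >> value))
    {
        return std::nullopt;
    }
    return value;
}
```

For example, `ve_sensor_path(0, 15)` yields `/sys/class/ve/ve0/sensor_15`. The raw value is returned unconverted, since the units of most sensors are unknown.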

There are 38 sensors in total, but I could only find solid evidence of what 4 of them measure:

I could not find a decent explanation of what "voltage edge" and "current edge" mean in this context.

The only information I could find on how these counters relate to each other is this delightfully weird formula for calculating power:

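     // power = voltage * current + voltage_edge * current_edge + 5.0
     auto watts = powerVoltage() * powerCurrent();
     watts += powerVoltageEdge() * powerCurrentEdge();
     watts += 5.0;  // constant offset added on top of both products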
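The formula can be written as a self-contained function. This is only a restatement of the snippet above; the struct and function names are hypothetical, and which sysfs sensor feeds each of the four inputs (and in which units) is not established in this thread.

```cpp
// The four inputs of the power formula. Each field corresponds to one of the
// accessors in the original snippet; the mapping to sensor_[Y] files is unknown.
struct VePowerReadings
{
    double voltage;      // powerVoltage()
    double current;      // powerCurrent()
    double voltage_edge; // powerVoltageEdge()
    double current_edge; // powerCurrentEdge()
};

// Combine the readings exactly as in the snippet:
// voltage * current + voltage_edge * current_edge + 5.0
constexpr double ve_power_watts(const VePowerReadings& r)
{
    return r.voltage * r.current + r.voltage_edge * r.current_edge + 5.0;
}
```

With made-up readings `{12.0, 10.0, 1.0, 2.0}` this evaluates to 120 + 2 + 5 = 127, and with all inputs at zero it still reports the constant 5.0.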

Further, from what I've found, sensor_15 to sensor_28 seem to be temperature measurements of some sort. The values reported at least seem reasonable given the conversion factor I have found (all around 46 degrees Celsius), but I haven't found any information on which core, uncore unit, or other component each of them individually belongs to.
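A sketch of collecting the apparent temperature sensors 15 to 28 of one card. The conversion factor mentioned above is not stated in the thread, so it is a caller-supplied parameter here; the integer file format, the function name, and the `root` parameter are assumptions for illustration.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Sensor IDs that appear to report temperatures (sensor_15 .. sensor_28, inclusive).
constexpr int kFirstTempSensor = 15;
constexpr int kLastTempSensor = 28;

// Read the apparent temperature sensors of VE card `card` and multiply each raw
// value by `raw_to_celsius` (NOT known from this thread; the caller supplies it).
// Returns (sensor ID, converted value) pairs. Sensors whose file is missing or
// does not parse as an integer (assumed file format) are skipped.
std::vector<std::pair<int, double>> read_ve_temperatures(int card, double raw_to_celsius,
                                                         const std::string& root = "/sys")
{
    std::vector<std::pair<int, double>> temps;
    for (int sensor = kFirstTempSensor; sensor <= kLastTempSensor; ++sensor)
    {
        std::ifstream in(root + "/class/ve/ve" + std::to_string(card) + "/sensor_" +
                         std::to_string(sensor));
        std::int64_t raw;
        if (in >> raw)
        {
            temps.emplace_back(sensor, static_cast<double>(raw) * raw_to_celsius);
        }
    }
    return temps;
}
```

Averaging the returned values would give a single card-level figure, but since it is unclear which component each sensor belongs to, keeping the sensor ID next to each value seems the safer default.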

cvonelm commented 1 year ago

Since a Score-P plugin for this task is now available, and it is also usable in lo2s, implementing this directly in lo2s is no longer needed.

https://github.com/score-p/scorep_plugin_nec/