ml-energy / zeus

Deep Learning Energy Measurement and Optimization
https://ml.energy/zeus
Apache License 2.0

Cluster-wide energy metric aggregation #30

Open jaywonchung opened 8 months ago

jaywonchung commented 8 months ago

With multiple jobs tracking their energy consumption with the ZeusMonitor, it would be nice to be able to export time/energy metrics to Prometheus so they can be aggregated across the cluster. The metric name should be derived from the window name.

One way to do this is to add it as a feature of ZeusMonitor, but that would increase its complexity and potentially add a dependency even for people who aren't using Prometheus metric export. Another way is to augment the Measurement object with the name of the window and implement a simple adaptor that can be connected to a metric exporter library.
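
To make the second option more concrete, here is a rough sketch of what such an adaptor might look like. Everything in it is hypothetical: `NamedMeasurement` is a stand-in and not the actual Measurement class, `MetricSink`, `InMemorySink`, `MeasurementAdapter`, and `metric_prefix` are made-up names, and the `zeus_` prefix and the `job`/`gpu` labels are assumptions. The only outside fact it relies on is that Prometheus metric names may contain only letters, digits, underscores, and colons. The sink is a plain interface so that the core package would not need a Prometheus dependency; a real exporter library could be wrapped behind it.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple


@dataclass(frozen=True)
class NamedMeasurement:
    """Hypothetical stand-in: a measurement that also carries its window name."""
    window_name: str
    time_s: float                 # elapsed time of the window, in seconds
    energy_j: Dict[int, float]    # GPU index -> energy in Joules


class MetricSink(Protocol):
    """Anything that can accept a named, labeled sample (assumed interface)."""
    def observe(self, name: str, labels: Dict[str, str], value: float) -> None: ...


def metric_prefix(window_name: str) -> str:
    """Derive a Prometheus-safe metric prefix from a window name."""
    # Replace every character outside [a-zA-Z0-9_] with an underscore.
    sanitized = re.sub(r"[^a-zA-Z0-9_]", "_", window_name)
    # The fixed "zeus_" prefix also guarantees the name never starts with a digit.
    return "zeus_" + sanitized


class InMemorySink:
    """Toy sink that records samples; a real one would wrap an exporter library."""
    def __init__(self) -> None:
        self.samples: List[Tuple[str, Dict[str, str], float]] = []

    def observe(self, name: str, labels: Dict[str, str], value: float) -> None:
        self.samples.append((name, dict(labels), value))


class MeasurementAdapter:
    """Turns one measurement into time and per-GPU energy samples."""
    def __init__(self, sink: MetricSink, job: str) -> None:
        self.sink = sink
        self.job = job

    def publish(self, m: NamedMeasurement) -> None:
        prefix = metric_prefix(m.window_name)
        self.sink.observe(prefix + "_time_seconds", {"job": self.job}, m.time_s)
        for gpu, joules in m.energy_j.items():
            labels = {"job": self.job, "gpu": str(gpu)}
            self.sink.observe(prefix + "_energy_joules", labels, joules)
```

With this shape, each job would construct a `MeasurementAdapter` once and call `publish` on whatever the end of a measurement window returns, and ZeusMonitor itself would stay unaware of the exporter.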