ml-energy / zeus

Deep Learning Energy Measurement and Optimization
https://ml.energy/zeus
Apache License 2.0

Transition to Python-based power monitor #15

Closed: jaywonchung closed this issue 10 months ago

jaywonchung commented 11 months ago

Currently, the standalone Zeus monitor binary is written in C++ (under zeus_monitor). However, because NVML's power and energy counters do not update very frequently, there's no real performance reason to use C++. Worse, using C++ forces us to build on the Docker CUDA devel variant images, which inflates the image size and sometimes even kills the GitHub Action that builds and pushes our Docker image.

We should just switch to a simple Python implementation merged into zeus/monitor.py that still outputs the same CSV file as the previous C++ implementation.
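A minimal sketch of what such a Python monitor could look like, assuming the `pynvml` bindings from the `nvidia-ml-py` package. The function names `monitor_power` and `nvml_power_reader` and the CSV header `time,power` are hypothetical: the issue does not spell out the column layout of the C++ monitor's output, so a real port would have to match that format. NVML's `nvmlDeviceGetPowerUsage` reports milliwatts, which the loop converts to watts.

```python
import csv
import time
from typing import Callable, Optional, TextIO


def monitor_power(
    read_power_mw: Callable[[], int],
    out: TextIO,
    interval_s: float = 0.1,
    num_samples: Optional[int] = None,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll a power reader and stream (timestamp, watts) rows to `out` as CSV.

    Runs forever when num_samples is None; returns the number of rows written.
    """
    writer = csv.writer(out)
    writer.writerow(["time", "power"])  # hypothetical header, not the C++ format
    written = 0
    while num_samples is None or written < num_samples:
        power_w = read_power_mw() / 1000.0  # NVML reports milliwatts
        writer.writerow([f"{clock():.6f}", f"{power_w:.3f}"])
        out.flush()  # keep the file current in case the monitor is killed
        written += 1
        if num_samples is None or written < num_samples:
            sleep(interval_s)
    return written


def nvml_power_reader(gpu_index: int = 0) -> Callable[[], int]:
    """Return a zero-argument callable that reads GPU power draw in milliwatts."""
    import pynvml  # provided by nvidia-ml-py; imported lazily so tests need no GPU

    pynvml.nvmlInit()  # the caller should call pynvml.nvmlShutdown() when done
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    return lambda: pynvml.nvmlDeviceGetPowerUsage(handle)
```

On a GPU machine this might be run as `monitor_power(nvml_power_reader(0), open("power.csv", "w", newline=""))`. Keeping the reader separate from the polling loop lets the loop be tested without a GPU and makes it easy to swap in a different NVML counter later.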

Initial plans

jaywonchung commented 11 months ago

It would also be nice if the Python-based monitor could detect the NVML counter update frequency and poll just slightly faster than that, so that whatever overhead it adds is minimal.
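One way to approach this, sketched under assumptions: probe a counter much faster than it could plausibly update, timestamp each observed change, and take the median gap between consecutive changes as the update period. The names `estimate_update_period` and `choose_poll_interval` and the 0.9 safety margin are hypothetical choices, not anything NVML prescribes. Feeding it the cumulative energy counter (`nvmlDeviceGetTotalEnergyConsumption`, supported on Volta and newer GPUs) is probably more reliable than instantaneous power, because two consecutive power updates can report the same value and make one gap look twice as long; the median dampens such outliers.

```python
import statistics
import time
from typing import Callable, List


def estimate_update_period(
    read_counter: Callable[[], int],
    probe_interval_s: float = 0.001,
    num_changes: int = 10,
    max_probes: int = 100_000,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Estimate how often a counter changes by probing it at a fast, fixed rate."""
    change_times: List[float] = []
    last = read_counter()
    for _ in range(max_probes):
        sleep(probe_interval_s)
        value = read_counter()
        if value != last:
            change_times.append(clock())  # time at which we first saw the new value
            last = value
            if len(change_times) > num_changes:  # enough changes for num_changes gaps
                break
    if len(change_times) < 2:
        raise RuntimeError("counter did not change often enough to estimate its period")
    gaps = [b - a for a, b in zip(change_times, change_times[1:])]
    return statistics.median(gaps)


def choose_poll_interval(update_period_s: float, margin: float = 0.9) -> float:
    """Poll slightly faster than the counter updates (hypothetical 10% margin)."""
    return update_period_s * margin
```

Measuring gaps between changes rather than the time to the first change removes the unknown phase offset between the probe loop and the counter's internal update schedule; each gap is still only accurate to about one probe interval.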

jaywonchung commented 10 months ago

Implemented in commit 2dbd5a651df4b615dc67e40c052e0496d3ab1735