jaywonchung closed this issue 10 months ago.
It would also be nice if the Python-based monitor could detect the NVML counter update frequency and poll just slightly faster than that, so that whatever overhead it adds stays minimal.
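A minimal sketch of that calibration step follows. It assumes the NVML counter can be wrapped in a zero-argument callable, for example a closure around pynvml's nvmlDeviceGetTotalEnergyConsumption (a cumulative counter, which is likely to change on every update under load, whereas a power reading may repeat the same value). The function names estimate_update_period and pick_poll_interval and the 0.9 margin are hypothetical and not part of Zeus.

```python
import statistics
import time
from typing import Callable, List


def estimate_update_period(
    read_counter: Callable[[], float],
    duration_s: float = 2.0,
    probe_interval_s: float = 0.001,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Estimate how often read_counter() returns a new value (hypothetical helper).

    Probes the counter every probe_interval_s seconds for duration_s seconds,
    records when the value changes, and returns the median gap between changes.
    """
    change_times: List[float] = []
    start = clock()
    last = read_counter()
    while clock() - start < duration_s:
        sleep(probe_interval_s)
        value = read_counter()
        if value != last:
            change_times.append(clock())
            last = value
    if len(change_times) < 2:
        raise RuntimeError("counter changed fewer than twice; increase duration_s")
    gaps = [later - earlier for earlier, later in zip(change_times, change_times[1:])]
    # The median is robust to the occasional late or skipped update.
    return statistics.median(gaps)


def pick_poll_interval(update_period_s: float, margin: float = 0.9) -> float:
    """Poll slightly faster than the counter updates (margin is an assumed default)."""
    if not 0.0 < margin <= 1.0:
        raise ValueError("margin must be in (0, 1]")
    return update_period_s * margin
```

The clock and sleep parameters are injectable only so the calibration can be exercised without a GPU; in a real monitor the defaults apply, and the fast 1 ms probing costs CPU only during the short calibration window.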
Implemented in commit 2dbd5a651df4b615dc67e40c052e0496d3ab1735.
Currently, the standalone Zeus monitor binary is written in C++ (under zeus_monitor). However, NVML's power/energy counters do not update very quickly anyway, so there is no performance reason to use C++. Worse, using C++ requires us to build on the CUDA devel variant of the Docker images, which inflates image size and sometimes even kills the GitHub Action that builds and pushes our Docker image. We should just switch to a simple Python implementation merged into zeus/monitor.py that still outputs the same CSV file as the previous C++ implementation.

Initial plans
- python -m zeus.monitor should work as a standalone power monitor binary. It would be nice for it to have the ASCII graph frontend that I used in the NSDI presentation live demo.
- zeus/monitor.py ./zeus_monitor
- GlobalPowerLimitOptimizer works well with the polling-based Python monitor.
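A polling-based Python monitor of the kind planned above could look roughly like the sketch below. It is not the Zeus implementation: poll_power_to_csv and energy_joules are hypothetical names, the CSV header ("time", "power_mW") is a placeholder rather than the actual layout of the old C++ zeus_monitor output, and the power reader is any callable returning milliwatts. The trapezoidal energy helper only illustrates one way power samples could be turned into the energy figures an optimizer such as GlobalPowerLimitOptimizer needs; how Zeus actually feeds it is not described here.

```python
import csv
import time
from typing import Callable, List, TextIO, Tuple

Sample = Tuple[float, int]  # (timestamp in seconds, power in milliwatts)


def poll_power_to_csv(
    read_power_mw: Callable[[], int],
    out: TextIO,
    interval_s: float,
    num_samples: int,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Sample]:
    """Poll read_power_mw() at a fixed interval and log each sample as a CSV row.

    The header is a placeholder; match it to the old C++ zeus_monitor output
    if downstream tools parse that file.
    """
    writer = csv.writer(out)
    writer.writerow(["time", "power_mW"])
    samples: List[Sample] = []
    next_tick = clock()
    for i in range(num_samples):
        t, p = clock(), read_power_mw()
        samples.append((t, p))
        writer.writerow([f"{t:.3f}", p])
        if i + 1 < num_samples:
            # Sleep until an absolute deadline so read latency does not accumulate as drift.
            next_tick += interval_s
            sleep(max(0.0, next_tick - clock()))
    return samples


def energy_joules(samples: List[Sample]) -> float:
    """Trapezoidal integration of power (mW) over time (s), returned in joules."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (p0 + p1) / 2 / 1000 * (t1 - t0)
    return total
```

On a real GPU, the reader could be a closure over pynvml, e.g. calling pynvml.nvmlInit(), taking handle = pynvml.nvmlDeviceGetHandleByIndex(0), and passing lambda: pynvml.nvmlDeviceGetPowerUsage(handle), which reports milliwatts. The interval would ideally come from a calibration against the counter's update period rather than a fixed constant.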