Closed FuryMartin closed 2 months ago
Hi @FuryMartin!
The Perseus optimizer controls the GPU primarily in terms of its graphics clock: https://github.com/ml-energy/zeus/blob/b0c65a4e8a2271e34f20c9a40949b01b676ceaa5/zeus/optimizer/perseus/frequency_controller.py#L75
The corresponding NVML API is documented here.
Memory clock can in general be ignored, at least for the GPUs we have dealt with. For instance, A100 supports only one memory clock frequency, and A40 supports two, but one is basically the default and the other is almost zero, putting memory into a pseudo-sleep state. In any case, we're just making sure memory is running at the highest frequency at all times, and computation time and energy are controlled by setting the graphics clock.
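For readers following along, here is a minimal sketch of what "controlling the graphics clock" can look like through pynvml. It is not the Perseus frequency controller itself: the helper name `lock_graphics_clock` and its `nvml` parameter are made up for illustration, and it assumes the pynvml bindings `nvmlDeviceGetSupportedMemoryClocks`, `nvmlDeviceGetSupportedGraphicsClocks`, and `nvmlDeviceSetGpuLockedClocks` are available on your driver.

```python
from typing import Any, Optional


def lock_graphics_clock(nvml: Any, gpu_index: int, freq_mhz: Optional[int] = None) -> int:
    """Pin the graphics clock of one GPU to a single frequency (hypothetical helper).

    `nvml` is any object exposing the pynvml function names used below,
    normally the `pynvml` module itself. Returns the frequency that was applied.
    """
    handle = nvml.nvmlDeviceGetHandleByIndex(gpu_index)

    # Memory is left alone in practice; we only use its highest supported
    # clock to look up which graphics clocks are valid alongside it.
    mem_clock = max(nvml.nvmlDeviceGetSupportedMemoryClocks(handle))
    supported = nvml.nvmlDeviceGetSupportedGraphicsClocks(handle, mem_clock)

    if freq_mhz is None:
        freq_mhz = max(supported)
    elif freq_mhz not in supported:
        raise ValueError(f"{freq_mhz} MHz is not a supported graphics clock")

    # Passing the same value as min and max locks the clock to one frequency.
    nvml.nvmlDeviceSetGpuLockedClocks(handle, freq_mhz, freq_mhz)
    return freq_mhz
```

On a real machine you would call `pynvml.nvmlInit()` first and then `lock_graphics_clock(pynvml, 0)`. Setting locked clocks typically requires elevated privileges, and supported frequencies differ per GPU model.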
Thanks for your thorough explanation, and sorry for my misunderstanding about the clock control.
I may have been misled by the parameter names minMemClockMHz and maxMemClockMHz in zeus/device/gpu.py, which seem to be setting the Memory Clock:
@abc.abstractmethod
def setGpuLockedClocks(
    self, index: int, minMemClockMHz: int, maxMemClockMHz: int
) -> None:
    """Lock the GPU clock to a specified range. Units: MHz."""
Oh, you're right. Those parameters have the wrong names. Thanks a lot for catching those! Would you mind sending a quick PR to fix the names so that they're aligned with the NVML API? If you don't have the bandwidth, I can quickly fix them.
It's an honor to do it. I will finish it quickly.😊
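As a rough sketch of the rename being discussed: NVML's `nvmlDeviceSetGpuLockedClocks` names its parameters `minGpuClockMHz` and `maxGpuClockMHz`, so an aligned signature might look like the following. The class names `GPU` and `RecordingGPU` are hypothetical stand-ins, not the actual contents of zeus/device/gpu.py, and the eventual PR may differ.

```python
import abc
from typing import Dict, Tuple


class GPU(abc.ABC):
    """Hypothetical stand-in for the abstract GPU interface."""

    @abc.abstractmethod
    def setGpuLockedClocks(
        self, index: int, minGpuClockMHz: int, maxGpuClockMHz: int
    ) -> None:
        """Lock the GPU (graphics) clock to a specified range. Units: MHz."""


class RecordingGPU(GPU):
    """Toy implementation that only records requests, for illustration."""

    def __init__(self) -> None:
        self.locked: Dict[int, Tuple[int, int]] = {}

    def setGpuLockedClocks(
        self, index: int, minGpuClockMHz: int, maxGpuClockMHz: int
    ) -> None:
        if minGpuClockMHz > maxGpuClockMHz:
            raise ValueError("minGpuClockMHz must not exceed maxGpuClockMHz")
        # A real backend would forward these values to NVML here.
        self.locked[index] = (minGpuClockMHz, maxGpuClockMHz)
```

With the new names, a keyword call such as `gpu.setGpuLockedClocks(index=0, minGpuClockMHz=1200, maxGpuClockMHz=1200)` makes it clear that the graphics clock, not the memory clock, is being locked.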
Hi, thanks for building this project, which is a wonderful tool to monitor and optimize GPU energy consumption.
I have a question about clock setting and would like to discuss it. I carefully checked the code and found that Zeus implements frequency control in zeus/optimizer/perseus/frequency_controller.py. However, it seems that there are only Memory Clock settings. I am wondering why the Graphics Clock setting is missing? (I have checked pynvml and found nothing about a Graphics Clock setting function, so I guess this might be a possible reason?)