ml-energy / zeus

Deep Learning Energy Measurement and Optimization
https://ml.energy/zeus
Apache License 2.0

Question about the missing Graphics Clock setting #48

Closed FuryMartin closed 2 months ago

FuryMartin commented 2 months ago

Hi, thanks for building this project, which is a wonderful tool to monitor and optimize GPU energy consumption.

I have a question about clock settings that I would like to discuss. I checked the code carefully and found that Zeus implements frequency control in zeus/optimizer/perseus/frequency_controller.py. However, it seems that there are only Memory Clock settings.

I am wondering why the Graphics Clock setting is missing.

jaywonchung commented 2 months ago

Hi @FuryMartin!

The Perseus optimizer controls the GPU primarily in terms of its graphics clock: https://github.com/ml-energy/zeus/blob/b0c65a4e8a2271e34f20c9a40949b01b676ceaa5/zeus/optimizer/perseus/frequency_controller.py#L75

The corresponding NVML API is documented here.

Memory clock can in general be ignored, at least for the GPUs we have dealt with. For instance, A100 supports only one memory clock frequency, and A40 supports two, but one is basically the default and the other is almost zero, putting memory into a pseudo-sleep state. In any case, we're just making sure memory is running at the highest frequency at all times, and computation time and energy are controlled by setting the graphics clock.
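
To make the split between the two clocks concrete, here is a minimal sketch of that policy: pin memory to its highest supported frequency, then lock the graphics clock to the chosen value. It is not Zeus's actual implementation. The three `nvml*` names mirror functions exposed by the `pynvml` bindings, while `FakeNVML`, `pin_memory_and_set_graphics_clock`, and the clock values are hypothetical and only there so the sketch runs without a GPU.

```python
from typing import List, Tuple


class FakeNVML:
    """Hypothetical stand-in for the pynvml module, for dry runs without a GPU."""

    def __init__(self, supported_memory_clocks: List[int]) -> None:
        self.supported_memory_clocks = supported_memory_clocks
        # Records every lock request as (clock kind, min MHz, max MHz).
        self.calls: List[Tuple[str, int, int]] = []

    def nvmlDeviceGetSupportedMemoryClocks(self, handle) -> List[int]:
        return list(self.supported_memory_clocks)

    def nvmlDeviceSetMemoryLockedClocks(self, handle, min_mhz: int, max_mhz: int) -> None:
        self.calls.append(("memory", min_mhz, max_mhz))

    def nvmlDeviceSetGpuLockedClocks(self, handle, min_mhz: int, max_mhz: int) -> None:
        self.calls.append(("graphics", min_mhz, max_mhz))


def pin_memory_and_set_graphics_clock(nvml, handle, graphics_mhz: int) -> int:
    """Keep memory at its highest supported clock and lock the graphics clock.

    `nvml` is either the real pynvml module or an object with the same
    three functions. Returns the memory clock (MHz) that was pinned.
    """
    max_mem = max(nvml.nvmlDeviceGetSupportedMemoryClocks(handle))
    # Memory: min == max, so it always runs at the highest frequency.
    nvml.nvmlDeviceSetMemoryLockedClocks(handle, max_mem, max_mem)
    # Graphics: this is the knob that trades computation time for energy.
    nvml.nvmlDeviceSetGpuLockedClocks(handle, graphics_mhz, graphics_mhz)
    return max_mem


# Made-up clock values for illustration, not taken from any real GPU.
fake = FakeNVML(supported_memory_clocks=[7251, 405])
pinned = pin_memory_and_set_graphics_clock(fake, handle=None, graphics_mhz=1200)
print(pinned, fake.calls)
```

On a real machine, the same function could presumably be called with the `pynvml` module itself and a handle from `pynvml.nvmlDeviceGetHandleByIndex` after `pynvml.nvmlInit()`. Locking clocks typically requires elevated privileges.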

FuryMartin commented 2 months ago

Thanks for the thorough explanation, and sorry for misunderstanding the clock control.

I may have been misled by the parameter names minMemClockMHz and maxMemClockMHz in zeus/device/gpu.py, which seem to refer to the Memory Clock:

    @abc.abstractmethod
    def setGpuLockedClocks(
        self, index: int, minMemClockMHz: int, maxMemClockMHz: int
    ) -> None:
        """Lock the GPU clock to a specified range. Units: MHz."""

jaywonchung commented 2 months ago

Oh, you're right. Those parameters have the wrong names. Thanks a lot for catching those! Would you mind sending a quick PR to fix the names so that they're aligned with the NVML API? If you don't have the bandwidth, I can quickly fix them.
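
As a rough sketch of what the rename could look like, the snippet below uses the parameter names from NVML's `nvmlDeviceSetGpuLockedClocks` (`minGpuClockMHz`, `maxGpuClockMHz`). The class names `GPUs` and `RecordingGPUs` are hypothetical and only make the example self-contained. The actual change in zeus/device/gpu.py may differ.

```python
import abc
import inspect
from typing import List, Tuple


class GPUs(abc.ABC):
    """Hypothetical abstract GPU manager, reduced to the one method in question."""

    @abc.abstractmethod
    def setGpuLockedClocks(
        self, index: int, minGpuClockMHz: int, maxGpuClockMHz: int
    ) -> None:
        """Lock the GPU (graphics) clock to a specified range. Units: MHz."""


class RecordingGPUs(GPUs):
    """Hypothetical concrete class that records requests instead of calling NVML."""

    def __init__(self) -> None:
        self.calls: List[Tuple[int, int, int]] = []

    def setGpuLockedClocks(
        self, index: int, minGpuClockMHz: int, maxGpuClockMHz: int
    ) -> None:
        if minGpuClockMHz > maxGpuClockMHz:
            raise ValueError("minGpuClockMHz must not exceed maxGpuClockMHz")
        self.calls.append((index, minGpuClockMHz, maxGpuClockMHz))


gpus = RecordingGPUs()
# Keyword arguments now say unambiguously which clock is being locked.
gpus.setGpuLockedClocks(0, minGpuClockMHz=1200, maxGpuClockMHz=1200)
param_names = list(inspect.signature(GPUs.setGpuLockedClocks).parameters)
print(param_names)
```

One caveat with a rename like this: callers that pass the old names as keyword arguments would break, so the call sites would need updating in the same PR.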

FuryMartin commented 2 months ago

It's an honor to do it. I will finish it quickly.😊