ml-energy / zeus

Deep Learning Energy Measurement and Optimization
https://ml.energy/zeus
Apache License 2.0

Support for AMD GPUs #22

Open jaywonchung opened 9 months ago

jaywonchung commented 9 months ago

AMD GPUs are seeing increased adoption. ROCm has nice compatibility layers with PyTorch, too. Plus, ROCm-SMI (apparently) has all the energy-related management APIs we need -- measuring power and energy, and setting the power limit and GPU frequency.

First, we should evaluate whether the ROCm-SMI APIs behave like their NVML counterparts, and whether AMD GPUs show time/energy behavior similar to that of NVIDIA GPUs.
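One way to run that evaluation is to integrate sampled power over a workload and compare the result against the device's own energy counter, on both vendors. Below is a minimal, backend-agnostic sketch of that check. The helper names (`integrate_power`, `sample_power`, `relative_gap`, `nvml_readers`) are hypothetical and not part of Zeus. The NVML reader assumes `pynvml` is installed and that the GPU supports `nvmlDeviceGetTotalEnergyConsumption` (Volta or newer); a ROCm-SMI reader with the same two-callable shape would plug in the same way.

```python
import time
from typing import Callable, List, Sequence, Tuple


def integrate_power(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Trapezoidal integration of power samples (W) over time (s); returns joules."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    energy_j = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        energy_j += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy_j


def sample_power(
    read_power_w: Callable[[], float],
    duration_s: float,
    interval_s: float = 0.1,
) -> Tuple[List[float], List[float]]:
    """Poll a power reader for roughly duration_s seconds."""
    timestamps, samples = [], []
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        timestamps.append(elapsed)
        samples.append(read_power_w())
        if elapsed >= duration_s:
            break
        time.sleep(interval_s)
    return timestamps, samples


def relative_gap(counter_j: float, integrated_j: float) -> float:
    """Relative difference between the energy counter and integrated power."""
    return abs(counter_j - integrated_j) / counter_j


def nvml_readers(index: int = 0):
    """Return (read_power_w, read_energy_j) for one NVIDIA GPU via pynvml."""
    import pynvml  # assumption: pynvml is installed and an NVIDIA GPU is present

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(index)
    # NVML reports power in milliwatts and total energy in millijoules.
    read_power_w = lambda: pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
    read_energy_j = lambda: pynvml.nvmlDeviceGetTotalEnergyConsumption(handle) / 1000.0
    return read_power_w, read_energy_j
```

To use it, read the energy counter before and after a fixed workload, call `sample_power` during the workload, and compare the counter delta with `integrate_power(...)` using `relative_gap`. If the gap differs substantially between NVML and ROCm-SMI, the two APIs are probably not interchangeable without calibration.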

Then, implementation-wise, ROCm-SMI lacks a Python package (something like the nvidia-ml-py package that NVIDIA officially provides). We should probably package one ourselves (much like pynvml, a community-supported Python binding for NVML) until AMD provides an official binding.

jaywonchung commented 2 months ago

Progress