AMD GPUs are seeing increased adoption. ROCm has nice compatibility layers with PyTorch, too. Plus, ROCm-SMI (apparently) has all the energy-related management APIs we need -- measuring power and energy, and setting the power limit and GPU frequency.
First, we should evaluate whether the ROCm-SMI APIs behave like their NVML counterparts, and whether AMD GPUs show time/energy behavior similar to that of NVIDIA GPUs.
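One way to run that comparison is to hide each backend's cumulative energy counter behind a plain callable and time the same workload under both. The sketch below is only an illustration: `measure`, `Measurement`, and the `read_energy_joules` callable are hypothetical names, not existing APIs. On the NVML side the callable could wrap `nvmlDeviceGetTotalEnergyConsumption` (which reports millijoules), and on the AMD side ROCm-SMI's energy counter.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Measurement:
    """Wall-clock time and energy consumed by one workload run."""

    seconds: float
    joules: float

    @property
    def average_watts(self) -> float:
        # Average power over the window; guard against a zero-length window.
        return self.joules / self.seconds if self.seconds > 0 else 0.0


def measure(
    read_energy_joules: Callable[[], float],
    workload: Callable[[], None],
    clock: Callable[[], float] = time.monotonic,
) -> Measurement:
    """Run `workload` once and report elapsed time and energy.

    `read_energy_joules` is assumed to return a monotonically increasing
    energy counter in joules, regardless of which vendor backs it.
    """
    start_energy = read_energy_joules()
    start_time = clock()
    workload()
    end_time = clock()
    end_energy = read_energy_joules()
    return Measurement(end_time - start_time, end_energy - start_energy)
```

Running the same workload through `measure` with an NVML-backed reader and a ROCm-SMI-backed reader would give directly comparable time and energy numbers, and sweeping the power limit between runs would show whether the trade-off curves look alike.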
Then, implementation-wise, ROCm-SMI lacks a Python package (something like nvidia-ml-py, which NVIDIA officially provides). We should probably package one ourselves (much like pynvml, a community-supported Python binding for NVML) until AMD provides an official binding.
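A self-packaged binding could start as a thin `ctypes` wrapper around the ROCm-SMI shared library. The sketch below is a rough starting point under a few assumptions: that the library is installed as `librocm_smi64.so`, that `rsmi_init` takes a 64-bit flags argument, and that `rsmi_dev_power_ave_get` takes a device index, a sensor index, and a pointer to a 64-bit value in microwatts. These should be checked against the `rocm_smi.h` header of the targeted ROCm release. `load_rocm_smi` and `average_power_watts` are hypothetical helper names.

```python
import ctypes

RSMI_STATUS_SUCCESS = 0  # assumed value of the success status code


def load_rocm_smi(path: str = "librocm_smi64.so"):
    """Load the ROCm-SMI shared library and initialize it.

    The library name and the argument types below are assumptions to be
    verified against rocm_smi.h for the installed ROCm version.
    """
    lib = ctypes.CDLL(path)
    lib.rsmi_init.argtypes = [ctypes.c_uint64]
    lib.rsmi_dev_power_ave_get.argtypes = [
        ctypes.c_uint32,
        ctypes.c_uint32,
        ctypes.POINTER(ctypes.c_uint64),
    ]
    status = lib.rsmi_init(0)
    if status != RSMI_STATUS_SUCCESS:
        raise RuntimeError(f"rsmi_init failed with status {status}")
    return lib


def average_power_watts(lib, device_index: int = 0, sensor_index: int = 0) -> float:
    """Return the average power draw of one device in watts."""
    power_microwatts = ctypes.c_uint64(0)
    status = lib.rsmi_dev_power_ave_get(
        ctypes.c_uint32(device_index),
        ctypes.c_uint32(sensor_index),
        ctypes.pointer(power_microwatts),
    )
    if status != RSMI_STATUS_SUCCESS:
        raise RuntimeError(f"rsmi_dev_power_ave_get failed with status {status}")
    # The library is assumed to report microwatts; convert to watts.
    return power_microwatts.value / 1e6
```

Keeping the library handle as an explicit argument makes the wrapper testable without an AMD GPU, since a stand-in object with the same method can be passed in. Energy counters, power limit, and frequency setters could follow the same pattern.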