We want to use the torch.cuda interface in ts.metrics.system_metrics.collect_gpu_metrics() for amdsmi-related calls, but a bug in torch.cuda currently prevents this.
A fix has been merged upstream but has not yet been released:
https://github.com/pytorch/pytorch/pull/140259
UPDATE: We may have to wait for the PyTorch 2.6.0 release, which is scheduled for 29 January 2025 (see "PyTorch Release 2.6.0 | Call for features").
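As a rough illustration of the intended direction, the sketch below queries GPU metrics through torch.cuda only when the installed PyTorch is at least 2.6. The names `collect_gpu_metrics_sketch`, `parse_major_minor` and `MIN_FIXED_VERSION` are hypothetical and not part of ts.metrics.system_metrics. The 2.6 threshold is an assumption based on the release schedule mentioned above, and the choice of metrics is illustrative only.

```python
from typing import Dict, List, Tuple

# Assumption: the upstream fix ships in PyTorch 2.6.0, as the release schedule suggests.
MIN_FIXED_VERSION = (2, 6)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.6.0+rocm6.2'."""
    core = version.split("+", 1)[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def collect_gpu_metrics_sketch() -> List[Dict[str, float]]:
    """Hypothetical stand-in for collect_gpu_metrics(); returns one dict per GPU."""
    try:
        import torch
    except ImportError:
        return []  # PyTorch is not installed
    if not torch.cuda.is_available():
        return []  # no visible GPU
    if parse_major_minor(torch.__version__) < MIN_FIXED_VERSION:
        return []  # skip torch.cuda queries on versions assumed to lack the fix

    metrics = []
    for idx in range(torch.cuda.device_count()):
        free_bytes, total_bytes = torch.cuda.mem_get_info(idx)
        try:
            # Backed by amdsmi on ROCm builds and by pynvml on CUDA builds.
            utilization = float(torch.cuda.utilization(idx))
        except Exception:
            utilization = float("nan")  # backend library missing or query failed
        metrics.append(
            {
                "device": float(idx),
                "utilization_pct": utilization,
                "memory_used_mb": (total_bytes - free_bytes) / 1024**2,
                "memory_total_mb": total_bytes / 1024**2,
            }
        )
    return metrics
```

Returning an empty list on older versions keeps the caller's code path unchanged until the fixed release is available; a real implementation would more likely fall back to the existing collection logic instead.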