GPU Nvidia H100 PCIe Not Supported

I've deployed Kepler on a Kubernetes cluster to monitor a GPU node with an NVIDIA H100 PCIe.

In the Kepler logs from this node, I see the error below. In parallel, I'm monitoring this GPU with a dcgm-exporter instance, and it collects GPU energy consumption metrics correctly.

I0125 07:04:48.972351 1 power.go:86] Failed to collect GPU metrics, trying to initizalize again: failed to get processes' utilization on device {0x7f639b40bdf8}: Not Supported
I0125 07:04:48.972407 1 gpu_nvml.go:62] found 1 gpu devices
I0125 07:04:48.972416 1 gpu_nvml.go:73] GPU 0 NVIDIA H100 PCIe

Do you have any idea what could be causing this?
