utkuozdemir / nvidia_gpu_exporter

Nvidia GPU exporter for prometheus using nvidia-smi binary
MIT License

The dashboard and the exporter use completely different metric names? #245

Open sorenwacker opened 3 weeks ago

sorenwacker commented 3 weeks ago

When I use the exporter like this:

BootStrap: docker
From: utkuozdemir/nvidia_gpu_exporter:1.1.0

%environment
    export NVIDIA_VISIBLE_DEVICES=all

%post
    # Ensure necessary permissions are set
    chmod +x /usr/bin/nvidia_gpu_exporter

%startscript
    exec /usr/bin/nvidia_gpu_exporter

together with the dashboard:

https://grafana.com/grafana/dashboards/19172-nvidia-gpu-metrics/

I get:

[screenshot of the dashboard]

It seems there is a mismatch between the exported metrics and the ones expected by the dashboard.

utkuozdemir commented 3 weeks ago

This looks more like there's an issue with the data source, as none of the metrics are populated. Can you check the service logs?

Also, please try opening the exporter's metrics endpoint in your browser (http://<host>:9835/metrics) and check two things:

You can post them here; they can be helpful.
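
If a browser is not convenient, the same check can be scripted. The following is only a minimal sketch, not part of the exporter: it assumes the endpoint serves the standard Prometheus text format, and the helper names `fetch_metrics` and `metric_names` are made up for illustration. It lists the metric names the exporter actually exposes, so they can be compared by eye with the names used in the dashboard queries.

```python
import urllib.request


def metric_names(exposition_text):
    """Return the sorted, de-duplicated metric names in Prometheus text output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is "name{labels} value" or "name value";
        # the name ends at the first "{" or at the first whitespace.
        name = line.split("{", 1)[0].split(None, 1)[0]
        names.add(name)
    return sorted(names)


def fetch_metrics(url, timeout=5):
    """Download the raw metrics page (hypothetical helper, standard library only)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage (replace <host> with the machine running the exporter):
#   for name in metric_names(fetch_metrics("http://<host>:9835/metrics")):
#       print(name)
```

An empty list, or a connection error from `fetch_metrics`, would point to the exporter or data source rather than to a naming mismatch.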

utkuozdemir commented 3 weeks ago

Another possibility is that the nvidia-smi binary is not working in the container for some reason. If that turns out to be the case (say, from the logs), you can open a shell in the container and run nvidia-smi manually to debug.
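
The manual check can also be wrapped in a small script. This is a rough sketch under two assumptions: Python 3 is available wherever it is run (the exporter image may not ship it, in which case running `nvidia-smi -L` directly in the shell does the same job), and `check_nvidia_smi` is a hypothetical helper name. It reports whether the binary is on PATH and whether it runs successfully.

```python
import shutil
import subprocess


def check_nvidia_smi(binary="nvidia-smi", args=("-L",), timeout=10):
    """Return (ok, message) describing whether the binary can be found and run."""
    path = shutil.which(binary)
    if path is None:
        return False, f"{binary} not found on PATH"
    try:
        # "-L" asks nvidia-smi to list the GPUs it can see.
        proc = subprocess.run(
            [path, *args], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"{binary} could not be run: {exc}"
    if proc.returncode != 0:
        return False, f"{binary} exited with {proc.returncode}: {proc.stderr.strip()}"
    return True, proc.stdout.strip()


# Usage:
#   ok, message = check_nvidia_smi()
#   print("OK" if ok else "FAILED", message)
```

A "not found" or non-zero exit result here would suggest the GPU tooling is not visible inside the container, which would explain an empty dashboard.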