metabsd closed this issue 5 years ago
Can you help me fix that, please? :)
I was on vacation :) so I only just saw this issue.
Anyway, I can't tell what's wrong just from looking at your screenshot.
Did you already fix it? If not, can you give me the local logs from the zabbix-agent? You can get them by editing the conf file to change the verbosity and the log file path.
Do you only have the issue with the fan speeds?
If the issue still exists, it might be worth looking at the raw output of nvidia-smi, e.g.:
nvidia-smi --query-gpu=fan.speed --format=csv,noheader,nounits -i 0
I suspect the "[Not Supported]" is the output from nvidia-smi and it's causing a parse error.
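If that turns out to be the cause, one way to guard against it is a small wrapper that only passes numeric values through. This is a minimal sketch, not part of the template: the helper name `fan_speed_or_empty` is hypothetical, and it assumes an empty result is acceptable for your item.

```shell
# Hypothetical helper: print the value only if it consists of digits,
# otherwise print nothing (e.g. for "[Not Supported]").
fan_speed_or_empty() {
    case "$1" in
        ''|*[!0-9]*) ;;          # empty or contains a non-digit: print nothing
        *) printf '%s' "$1" ;;   # digits only: pass through unchanged
    esac
}

# Usage, assuming nvidia-smi is installed on the host:
# fan_speed_or_empty "$(nvidia-smi --query-gpu=fan.speed --format=csv,noheader,nounits -i 0)"
```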
Hello, welcome back from vacation!
This is not a real problem but rather a misunderstanding on my part.
There is no FAN on this type of GPU :)
root@hostname:~# nvidia-smi --query-gpu=fan.speed --format=csv,noheader,nounits -i 0
[Not Supported]
Case closed.
On another subject: we added the following line to userparameter_nvidia-smi.conf to get a metric with the average utilization of all GPUs on each server.
UserParameter=gpu.avg,nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | /opt/local/bin/jq -s add/length | tr -d "\n"
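For hosts without jq, the same average could be computed with awk. The sketch below is an assumption-laden alternative rather than the configuration actually in use: the function name `gpu_avg` is hypothetical, and it skips any non-numeric line (such as "[Not Supported]") instead of failing on it.

```shell
# Hypothetical helper: average the numeric lines read from stdin,
# ignoring anything that is not a single plain number.
gpu_avg() {
    awk 'NF == 1 && $1 ~ /^[0-9]+([.][0-9]+)?$/ { sum += $1; n++ }
         END { if (n > 0) printf "%g", sum / n }'
}

# Usage, assuming nvidia-smi is installed on the host:
# nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | gpu_avg
```

Like the `tr -d "\n"` in the original line, `printf` here emits no trailing newline. If no numeric line is seen, nothing is printed at all.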
Have a nice day!
One quick note regarding the average utilisation of all GPUs per server: given how the underlying metric is defined, it might not be very useful, depending on your purpose. At the very least, it might not always behave as expected. nvidia-smi's definition of utilisation is:
unsigned int gpu - Percent of time over the past second during which one or more kernels was executing on the GPU.
In practice this means that a GPU doing some work will generally report either 0% or 100% usage (with occasional values in between during transitions), and that using one core of a GPU counts the same as using all of them.
You can find this definition in the manual here.
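Given that definition, a single reading can jump between 0 and 100, so one option is to average several readings taken a short time apart. The following is only a sketch: `sample_avg` and its arguments are hypothetical names, and it smooths the value over time without changing what the metric measures (it still says nothing about how many cores are busy).

```shell
# Hypothetical helper: run a sampler command COUNT times, INTERVAL seconds
# apart, and print the mean of all numeric lines it produced.
# Usage: sample_avg COUNT INTERVAL COMMAND [ARGS...]
sample_avg() {
    count=$1; interval=$2; shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@"                          # take one sample
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"         # wait between samples, not after the last
        fi
    done | awk 'NF == 1 && $1 ~ /^[0-9]+([.][0-9]+)?$/ { sum += $1; n++ }
                END { if (n > 0) printf "%g", sum / n }'
}

# Usage, assuming nvidia-smi is installed on the host:
# sample_avg 3 0.5 nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -i 0
```

If this is called from a UserParameter, the total sampling time has to stay below the agent's `Timeout` setting, otherwise the check will be cut off.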
Thx!