Open jannoke opened 9 months ago
I created a solution. Add these lines to your Zabbix agent config file:
UserParameter=gpu.utilization.dec.min[*],"nvidia-smi.exe" -q -d UTILIZATION -i $1
UserParameter=gpu.utilization.dec.max[*],"nvidia-smi.exe" -q -d UTILIZATION -i $1
UserParameter=gpu.utilization.enc.min[*],"nvidia-smi.exe" -q -d UTILIZATION -i $1
UserParameter=gpu.utilization.enc.max[*],"nvidia-smi.exe" -q -d UTILIZATION -i $1
Then, in your template, under the discovery rule's item prototypes, update each decode/encode min/max item with these preprocessing rules. Regular expression patterns:
DEC\s+Utilization\s+Samples[\s\S]*?Max\s*:\s*(\d+)
DEC\s+Utilization\s+Samples[\s\S]*?Min\s*:\s*(\d+)
ENC\s+Utilization\s+Samples[\s\S]*?Max\s*:\s*(\d+)
ENC\s+Utilization\s+Samples[\s\S]*?Min\s*:\s*(\d+)
Output for all of the above: \1
Then create a Replace step. Search string: \1
Make sure you leave the type of information as Numeric (unsigned).
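For anyone who wants to check the four patterns before pasting them into Zabbix, here is a minimal Python sketch. The report text is an invented excerpt in the layout `nvidia-smi -q -d UTILIZATION` usually prints (the numbers are made up), and the names `SAMPLE_REPORT`, `PATTERNS` and `extract` are just for illustration. Zabbix uses PCRE, so behaviour should match for patterns this simple, but treat it as a sanity check only.

```python
import re

# Hypothetical excerpt of `nvidia-smi -q -d UTILIZATION` output (values invented).
SAMPLE_REPORT = """\
    Utilization
        Gpu                               : 12 %
        Memory                            : 3 %
        Encoder                           : 7 %
        Decoder                           : 4 %
    GPU Utilization Samples
        Duration                          : 18.27 sec
        Number of Samples                 : 99
        Max                               : 20 %
        Min                               : 1 %
        Avg                               : 9 %
    ENC Utilization Samples
        Duration                          : 18.27 sec
        Number of Samples                 : 99
        Max                               : 15 %
        Min                               : 2 %
        Avg                               : 6 %
    DEC Utilization Samples
        Duration                          : 18.27 sec
        Number of Samples                 : 99
        Max                               : 8 %
        Min                               : 0 %
        Avg                               : 3 %
"""

# The four patterns from the comment above, keyed by a short label.
PATTERNS = {
    "dec.max": r"DEC\s+Utilization\s+Samples[\s\S]*?Max\s*:\s*(\d+)",
    "dec.min": r"DEC\s+Utilization\s+Samples[\s\S]*?Min\s*:\s*(\d+)",
    "enc.max": r"ENC\s+Utilization\s+Samples[\s\S]*?Max\s*:\s*(\d+)",
    "enc.min": r"ENC\s+Utilization\s+Samples[\s\S]*?Min\s*:\s*(\d+)",
}


def extract(report: str) -> dict:
    """Return the first capture group (\\1) of each pattern as an int, or None."""
    values = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, report)
        values[name] = int(match.group(1)) if match else None
    return values


if __name__ == "__main__":
    print(extract(SAMPLE_REPORT))
```

The lazy `[\s\S]*?` is what makes each pattern stop at the first Max/Min line after its own section header instead of running on into a later section.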
The template creator can probably clean this up a bit, since you only need one output to gather all four data points.
Thanks. It would be really easy - just make one more value: gpu.utilization.encdec[{#GPUINDEX}]
In the Zabbix agent config, use a single line instead of four:
UserParameter=gpu.utilization.encdec[*],"nvidia-smi.exe" -q -d UTILIZATION -i $1
Change all other items to dependent items and select this new item as their master item ("select prototype").
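To illustrate why the master/dependent layout is cheaper, here is a rough Python sketch of the idea, not of Zabbix itself. `run_report` is a hypothetical stand-in for the master item (a real agent would run nvidia-smi there) and returns made-up text; `dependent` plays the role of one dependent item with a regular expression preprocessing step.

```python
import re

command_runs = []  # one entry per simulated nvidia-smi invocation


def run_report(gpu_index: int) -> str:
    """Stand-in for the master item gpu.utilization.encdec[<index>]."""
    command_runs.append(gpu_index)
    # Invented report text; a real agent would return nvidia-smi output here.
    return (
        "ENC Utilization Samples\n    Max : 15 %\n    Min : 2 %\n"
        "DEC Utilization Samples\n    Max : 8 %\n    Min : 0 %\n"
    )


def dependent(report: str, section: str, stat: str) -> int:
    """Stand-in for a dependent item: regex preprocessing on the master value."""
    pattern = rf"{section}\s+Utilization\s+Samples[\s\S]*?{stat}\s*:\s*(\d+)"
    return int(re.search(pattern, report).group(1))


report = run_report(0)  # the command runs once per GPU per poll
values = {
    (section, stat): dependent(report, section, stat)
    for section in ("ENC", "DEC")
    for stat in ("Max", "Min")
}

if __name__ == "__main__":
    print(values, "command runs:", len(command_runs))
```

With four separate UserParameter lines the same command would run four times per GPU per poll; with one master item it runs once and the four values are cut out of the same text.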
Thanks for your help, I have updated my template.
I also created another set of items for GPU clock speeds, since I need to monitor those. Here are the details in case you want the regular expressions.
Add to your Zabbix config file:
UserParameter=gpu.clock.master[*],"nvidia-smi.exe" -q -d CLOCK -i $1
Create item prototypes the same way the others are set up; just change the units to Hz. Regular expressions:
Graphics\s*:\s*(\d+)\s*MHz
SM\s*:\s*(\d+)\s*MHz
Memory\s*:\s*(\d+)\s*MHz
Video\s*:\s*(\d+)\s*MHz
In preprocessing, add a custom multiplier of 1000000 to convert the reported MHz value to Hz (I had to do it this way so that values display correctly once they reach the GHz range).
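A small Python sketch of what these four patterns plus the multiplier do, for anyone who wants to try them outside Zabbix. The report text is an invented excerpt in the usual `nvidia-smi -q -d CLOCK` layout, and `CLOCK_PATTERNS` and `clocks_hz` are illustrative names. The example assumes the current "Clocks" section comes before "Max Clocks", so the first match of each pattern is the current clock.

```python
import re

# Hypothetical excerpt of `nvidia-smi -q -d CLOCK` output (values invented).
SAMPLE_CLOCK_REPORT = """\
    Clocks
        Graphics                          : 300 MHz
        SM                                : 300 MHz
        Memory                            : 405 MHz
        Video                             : 540 MHz
    Max Clocks
        Graphics                          : 2100 MHz
        SM                                : 2100 MHz
        Memory                            : 7001 MHz
        Video                             : 1950 MHz
"""

# The four patterns from the comment above.
CLOCK_PATTERNS = {
    "graphics": r"Graphics\s*:\s*(\d+)\s*MHz",
    "sm": r"SM\s*:\s*(\d+)\s*MHz",
    "memory": r"Memory\s*:\s*(\d+)\s*MHz",
    "video": r"Video\s*:\s*(\d+)\s*MHz",
}

MHZ_TO_HZ = 1_000_000  # custom multiplier: the item unit is Hz, the report is in MHz


def clocks_hz(report: str) -> dict:
    """First match of each pattern, converted from MHz to Hz."""
    result = {}
    for name, pattern in CLOCK_PATTERNS.items():
        match = re.search(pattern, report)
        if match:
            result[name] = int(match.group(1)) * MHZ_TO_HZ
    return result


if __name__ == "__main__":
    print(clocks_hz(SAMPLE_CLOCK_REPORT))
```

Storing plain Hz lets the frontend pick the prefix itself, so 2100 MHz can be shown as 2.1 GHz without a separate item.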
I also set up a graph for these four values, if you want it.
@jannoke, @Desimonde, feel free to create a pull request about this
Hello! I reworked the template, using 3 Windows PCs to test on.
I added this to zabbix_agentd.conf, in addition to the existing parameters:
UserParameter=gpu.utilization.report[*],"nvidia-smi.exe" -q -d UTILIZATION,CLOCK -i $1
The resulting zabbix_agentd.conf looks like this:
# https://github.com/plambe/zabbix-nvidia-smi-multi-gpu
UserParameter=gpu.number,"nvidia-smi.exe" -L | find /c /v ""
UserParameter=gpu.discovery,C:\scripts\get_gpus_info.bat
UserParameter=gpu.fanspeed[*],"nvidia-smi.exe" --query-gpu=fan.speed --format=csv,noheader,nounits -i $1
UserParameter=gpu.power[*],"nvidia-smi.exe" --query-gpu=power.draw --format=csv,noheader,nounits -i $1
UserParameter=gpu.temp[*],"nvidia-smi.exe" --query-gpu=temperature.gpu --format=csv,noheader,nounits -i $1
UserParameter=gpu.utilization[*],"nvidia-smi.exe" --query-gpu=utilization.gpu --format=csv,noheader,nounits -i $1
UserParameter=gpu.memfree[*],"nvidia-smi.exe" --query-gpu=memory.free --format=csv,noheader,nounits -i $1
UserParameter=gpu.memused[*],"nvidia-smi.exe" --query-gpu=memory.used --format=csv,noheader,nounits -i $1
UserParameter=gpu.memtotal[*],"nvidia-smi.exe" --query-gpu=memory.total --format=csv,noheader,nounits -i $1
UserParameter=gpu.utilization.report[*],"nvidia-smi.exe" -q -d UTILIZATION,CLOCK -i $1
Also, I changed the template itself quite heavily. I am on Zabbix 6.4, so I exported from there. Attachments:
Different_GPU_and_driver_versions_reports.zip
zbx_nvidia-smi-multi-gpu-by-report-regexps-1.zip
zbx_nvidia-smi-multi-gpu-by-report_v3_wo_uuid_active_xml.zip
zbx_nvidia-smi-multi-gpu-by-report_v3_wo_uuid_active_yaml.zip
The main idea here is that one report is used to populate many items. I also corrected the memory items: there was a multiplier of 1000000 with the unit "b", but nvidia-smi reports memory values in MiB, so the multiplier should be 1024*1024 = 1048576, which gives correct values. I assumed that clock graphs should be separate for each (current + max) pair, so there are 4 graphs for that. Values from samples are also grouped in triplets (max + avg + min). Values for "General" utilization are grouped in one graph. I have also added some descriptions to the items.
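The memory multiplier fix can be checked with a bit of arithmetic. This is only a sketch: the 8192 MiB reading is a made-up example value and `mib_to_bytes` is an illustrative helper, not something from the template.

```python
MIB = 1024 * 1024        # 1048576 bytes per MiB (the corrected multiplier)
OLD_MULTIPLIER = 1_000_000  # the multiplier the template used before


def mib_to_bytes(value_mib: int) -> int:
    """Convert a value reported in MiB to bytes."""
    return value_mib * MIB


reported_mib = 8192  # hypothetical memory.total reading, in MiB

correct_bytes = mib_to_bytes(reported_mib)
old_bytes = reported_mib * OLD_MULTIPLIER

# Fraction by which the old multiplier under-reports the real size.
shortfall = 1 - old_bytes / correct_bytes

if __name__ == "__main__":
    print(correct_bytes, old_bytes, f"{shortfall:.2%}")
```

With the old multiplier every memory value comes out roughly 4.6% too low, which matters most on the free/used items where a threshold trigger sits close to the limit.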
Please take a look and tag me if you have any import problems. :)
are missing for Windows hosts and will show up as "unsupported item" in Zabbix monitoring