plambe / zabbix-nvidia-smi-multi-gpu

A Zabbix template using nvidia-smi. Works with multiple GPUs on Windows and Linux.

Decoder/Encoder monitoring is missing from the Windows Zabbix config #26

Status: Open. jannoke opened this issue 9 months ago.

jannoke commented 9 months ago
UserParameter=gpu.utilization.dec.min[*].....
UserParameter=gpu.utilization.dec.max[*].....
UserParameter=gpu.utilization.enc.min[*].....
UserParameter=gpu.utilization.enc.max[*].....

are missing for Windows hosts, so these items show up as "unsupported item" in Zabbix monitoring.

Desimonde commented 9 months ago

I created a solution. Add these lines to your Zabbix config file:

UserParameter=gpu.utilization.dec.min[*],"nvidia-smi.exe" -q -d UTILIZATION -i $1
UserParameter=gpu.utilization.dec.max[*],"nvidia-smi.exe" -q -d UTILIZATION -i $1
UserParameter=gpu.utilization.enc.min[*],"nvidia-smi.exe" -q -d UTILIZATION -i $1
UserParameter=gpu.utilization.enc.max[*],"nvidia-smi.exe" -q -d UTILIZATION -i $1

Then, in your template under the discovery rule's item prototypes, update each decode/encode min/max item with this preprocessing step: Regular expression. Pattern:

DEC\s+Utilization\s+Samples[\s\S]*?Max\s*:\s*(\d+)
DEC\s+Utilization\s+Samples[\s\S]*?Min\s*:\s*(\d+)
ENC\s+Utilization\s+Samples[\s\S]*?Max\s*:\s*(\d+)
ENC\s+Utilization\s+Samples[\s\S]*?Min\s*:\s*(\d+)

Output (for all of the above): \1

Then add a Replace step with Search string: \1

Make sure you leave the type of information as Numeric (unsigned).
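To see what a Regular expression step with the \1 output does, here is a minimal Python sketch that emulates it. The sample below is an abridged, made-up excerpt of `nvidia-smi -q -d UTILIZATION` output (real layout varies by driver version), and `regex_step` is a hypothetical helper, not Zabbix code. It also shows why the lazy `[\s\S]*?` matters: a greedy `[\s\S]*` would run past the ENC section and capture the last `Max` in the report.

```python
import re

# Illustrative excerpt of `nvidia-smi -q -d UTILIZATION` output (made-up values;
# the exact layout varies between driver versions).
SAMPLE = """\
    ENC Utilization Samples
        Duration                          : 16.23 sec
        Number of Samples                 : 99
        Max                               : 12 %
        Min                               : 3 %
        Avg                               : 7 %
    DEC Utilization Samples
        Duration                          : 16.23 sec
        Number of Samples                 : 99
        Max                               : 40 %
        Min                               : 0 %
        Avg                               : 18 %
"""


def regex_step(text: str, pattern: str, output: str = r"\1") -> str:
    """Emulate a 'Regular expression' preprocessing step:
    take the first match and render the output template (e.g. \\1)."""
    match = re.search(pattern, text)
    if match is None:
        # Zabbix marks the item as unsupported when the pattern does not match.
        raise ValueError(f"pattern did not match: {pattern}")
    return match.expand(output)


# Lazy [\s\S]*? stops at the first "Max" after the ENC header...
lazy = r"ENC\s+Utilization\s+Samples[\s\S]*?Max\s*:\s*(\d+)"
# ...while a greedy [\s\S]* backtracks from the end to the LAST "Max" (DEC's).
greedy = r"ENC\s+Utilization\s+Samples[\s\S]*Max\s*:\s*(\d+)"

print(regex_step(SAMPLE, lazy))    # -> 12
print(regex_step(SAMPLE, greedy))  # -> 40
print(regex_step(SAMPLE, r"DEC\s+Utilization\s+Samples[\s\S]*?Min\s*:\s*(\d+)"))  # -> 0
```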

The template author could probably clean this up a bit, since a single command output is enough to gather all four data points.

jannoke commented 9 months ago

Thanks. It would be really easy: just add one more item, gpu.utilization.encdec[{#GPUINDEX}].

In the Zabbix agent config, use a single line instead of four:

UserParameter=gpu.utilization.encdec[*],"nvidia-smi.exe" -q -d UTILIZATION -i $1

Then turn all the other items into dependent items and set this new item as their master item ("Select prototype").
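The master/dependent arrangement can be sketched in Python: nvidia-smi runs once per GPU for the master item, and each dependent item only applies its own regex to that stored text. This is a minimal sketch, assuming the dependent item keys below and an abridged, made-up master value; `process_master` is a hypothetical helper, not part of Zabbix or the template.

```python
from __future__ import annotations

import re

# Hypothetical dependent item keys, each mapped to a regex from this thread.
DEPENDENT_ITEMS = {
    "gpu.utilization.enc.max": r"ENC\s+Utilization\s+Samples[\s\S]*?Max\s*:\s*(\d+)",
    "gpu.utilization.enc.min": r"ENC\s+Utilization\s+Samples[\s\S]*?Min\s*:\s*(\d+)",
    "gpu.utilization.dec.max": r"DEC\s+Utilization\s+Samples[\s\S]*?Max\s*:\s*(\d+)",
    "gpu.utilization.dec.min": r"DEC\s+Utilization\s+Samples[\s\S]*?Min\s*:\s*(\d+)",
}


def process_master(master_value: str, gpu_index: int) -> dict[str, int]:
    """Fan one master item value (one nvidia-smi run) out to every dependent item."""
    results: dict[str, int] = {}
    for key, pattern in DEPENDENT_ITEMS.items():
        match = re.search(pattern, master_value)
        if match is None:
            continue  # Zabbix would mark this dependent item as unsupported
        # Numeric (unsigned) items expect a non-negative integer.
        results[f"{key}[{gpu_index}]"] = int(match.group(1))
    return results


# Illustrative master value (abridged, made-up numbers).
MASTER = """\
    ENC Utilization Samples
        Max                               : 12 %
        Min                               : 3 %
    DEC Utilization Samples
        Max                               : 40 %
        Min                               : 0 %
"""

for key, value in sorted(process_master(MASTER, 0).items()):
    print(key, value)
```

The design point is that the agent executes nvidia-smi once per polling interval instead of four times, while each dependent item keeps its own history and triggers.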


Desimonde commented 9 months ago

Thanks for your help, I have updated my template.

I also created another set of items for GPU clock speeds, since I need to monitor those. Here are the regular expressions if you want them.

Add this to your Zabbix config file:

UserParameter=gpu.clock.master[*],"nvidia-smi.exe" -q -d CLOCK -i $1

Create the item prototypes the same way as the others, but set their units to Hz:

Graphics\s*:\s*(\d+)\s*MHz
SM\s*:\s*(\d+)\s*MHz
Memory\s*:\s*(\d+)\s*MHz
Video\s*:\s*(\d+)\s*MHz

In preprocessing, add a custom multiplier of 1000000 to convert the MHz value to Hz. (I had to do it this way so that values display correctly once they reach the gigahertz range.)
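A quick Python sketch of that conversion, assuming an abridged, made-up `Clocks` section from `nvidia-smi -q -d CLOCK`. The multiplier 1000000 turns MHz into Hz, after which an SI prefix can be applied for display; `format_si` is a hypothetical, simplified stand-in for that display logic, not Zabbix code.

```python
import re

MHZ_TO_HZ = 1_000_000  # custom multiplier from the preprocessing step

# Regexes from this thread; each captures the first (current) clock value in MHz.
CLOCK_PATTERNS = {
    "graphics": r"Graphics\s*:\s*(\d+)\s*MHz",
    "sm": r"SM\s*:\s*(\d+)\s*MHz",
    "memory": r"Memory\s*:\s*(\d+)\s*MHz",
    "video": r"Video\s*:\s*(\d+)\s*MHz",
}


def clocks_hz(report: str) -> dict:
    """Extract each clock (MHz) and apply the MHz -> Hz multiplier."""
    values = {}
    for name, pattern in CLOCK_PATTERNS.items():
        match = re.search(pattern, report)
        if match:
            values[name] = int(match.group(1)) * MHZ_TO_HZ
    return values


def format_si(value: float, unit: str) -> str:
    """Hypothetical, simplified SI-prefix formatter (base 1000)."""
    for factor, prefix in ((1e9, "G"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= factor:
            return f"{value / factor:.2f} {prefix}{unit}"
    return f"{value:.2f} {unit}"


# Illustrative, made-up report excerpt.
SAMPLE = """\
    Clocks
        Graphics                          : 1500 MHz
        SM                                : 1500 MHz
        Memory                            : 7000 MHz
        Video                             : 1400 MHz
"""

hz = clocks_hz(SAMPLE)
print(hz["graphics"])                   # -> 1500000000
print(format_si(hz["graphics"], "Hz"))  # -> 1.50 GHz
```

Storing the value in base units (Hz) lets the frontend pick MHz or GHz on its own, which is why the multiplier goes in preprocessing rather than in the unit label.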


If you want, I also set up a graph for these four values.


plambe commented 8 months ago

@jannoke, @Desimonde, feel free to create a pull request for this.

denixx commented 6 months ago

Hello! I reworked the template and tested it on three Windows PCs.

I added this to zabbix_agentd.conf in addition to the existing parameters:

UserParameter=gpu.utilization.report[*],"nvidia-smi.exe" -q -d UTILIZATION,CLOCK -i $1

The resulting zabbix_agentd.conf looks like this:

# https://github.com/plambe/zabbix-nvidia-smi-multi-gpu
UserParameter=gpu.number,"nvidia-smi.exe" -L | find /c /v ""
UserParameter=gpu.discovery,C:\scripts\get_gpus_info.bat
UserParameter=gpu.fanspeed[*],"nvidia-smi.exe" --query-gpu=fan.speed --format=csv,noheader,nounits -i $1
UserParameter=gpu.power[*],"nvidia-smi.exe" --query-gpu=power.draw --format=csv,noheader,nounits -i $1
UserParameter=gpu.temp[*],"nvidia-smi.exe" --query-gpu=temperature.gpu --format=csv,noheader,nounits -i $1
UserParameter=gpu.utilization[*],"nvidia-smi.exe" --query-gpu=utilization.gpu --format=csv,noheader,nounits -i $1
UserParameter=gpu.memfree[*],"nvidia-smi.exe" --query-gpu=memory.free --format=csv,noheader,nounits -i $1
UserParameter=gpu.memused[*],"nvidia-smi.exe" --query-gpu=memory.used --format=csv,noheader,nounits -i $1
UserParameter=gpu.memtotal[*],"nvidia-smi.exe" --query-gpu=memory.total --format=csv,noheader,nounits -i $1
UserParameter=gpu.utilization.report[*],"nvidia-smi.exe" -q -d UTILIZATION,CLOCK -i $1
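Every per-GPU item in this config relies on flexible UserParameters: a `[*]` key accepts parameters, and `$1` in the command is replaced by the first one, which is how a discovered `{#GPUINDEX}` ends up in `nvidia-smi -i`. A simplified Python sketch of that substitution follows; `expand` is a hypothetical helper, it ignores Zabbix's quoting rules and unsafe-character checks, and treating a missing parameter as an empty string is an assumption here.

```python
import re

# Two UserParameter definitions copied from the config (key name -> command).
USER_PARAMETERS = {
    "gpu.temp": '"nvidia-smi.exe" --query-gpu=temperature.gpu --format=csv,noheader,nounits -i $1',
    "gpu.utilization.report": '"nvidia-smi.exe" -q -d UTILIZATION,CLOCK -i $1',
}


def expand(item_key: str) -> str:
    """Resolve an item key such as gpu.temp[0] to the command the agent would run.

    Simplified: splits parameters on commas and ignores Zabbix quoting rules.
    """
    match = re.fullmatch(r"([^\[]+)\[(.*)\]", item_key)
    if match is None:
        raise ValueError(f"not a parameterized item key: {item_key}")
    name, raw_params = match.groups()
    params = raw_params.split(",") if raw_params else []

    def substitute(ref: re.Match) -> str:
        position = int(ref.group(1))  # $1 -> first parameter, and so on
        return params[position - 1] if position <= len(params) else ""

    return re.sub(r"\$([1-9])", substitute, USER_PARAMETERS[name])


# Discovery fills {#GPUINDEX}, so the prototype key gpu.temp[{#GPUINDEX}]
# becomes e.g. gpu.temp[1] on the host.
print(expand("gpu.temp[1]"))
```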

Also, I heavily changed the template itself. I am on Zabbix 6.4, so the exports come from that version. Attachments: Different_GPU_and_driver_versions_reports.zip, zbx_nvidia-smi-multi-gpu-by-report-regexps-1.zip, zbx_nvidia-smi-multi-gpu-by-report_v3_wo_uuid_active_xml.zip, zbx_nvidia-smi-multi-gpu-by-report_v3_wo_uuid_active_yaml.zip

The main idea is that one report is used to populate many items. I also corrected the memory items: they used a multiplier of 1000000 with the unit "b", but nvidia-smi reports memory values in MiB, so the multiplier should be 1024*1024 = 1048576, which gives correct values.

I also assumed the clock graphs should each show a (current + max) pair, so there are four graphs for those. Values from the utilization samples are likewise grouped in triplets (max + avg + min), and the "General" utilization values are grouped in one graph. I have also added descriptions to some of the items.
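The memory correction is easy to check with a worked example. A minimal Python sketch, assuming a made-up `memory.total` of 8192 MiB as returned by `--query-gpu=memory.total --format=csv,noheader,nounits`, compares the old and corrected multipliers:

```python
MIB = 1024 * 1024            # 1048576 bytes per MiB (corrected multiplier)
OLD_MULTIPLIER = 1_000_000   # previous multiplier, which treats the value as MB


def mib_to_bytes(mib: int) -> int:
    """Convert an nvidia-smi memory value (reported in MiB) to bytes."""
    return mib * MIB


reported = 8192  # illustrative memory.total value in MiB (an 8 GiB card)
correct = mib_to_bytes(reported)
old = reported * OLD_MULTIPLIER

print(correct)  # -> 8589934592
print(old)      # -> 8192000000
print(f"{(correct - old) / correct:.2%} under-reported")  # -> 4.63% under-reported
```

The gap is constant at about 4.6% regardless of card size, since it comes purely from 1000000 / 1048576.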

Please take a look and tag me if you have any import problems. :)
