justinsteven closed this issue 6 months ago.
@justinsteven thanks for your request and the provided data! Please test the binary in PR #15282, which will be available as soon as CI has finished the tests, and let me know if this fixes the issue for you!
@srebhan
% ./usr/bin/telegraf --config /etc/telegraf/telegraf.conf --test 2>&1 | grep -Fi nvidia
2024-05-03T10:54:24Z I! Loaded inputs: [... SNIP ...] nvidia_smi [... SNIP ...]
> nvidia_smi,arch=Ampere,compute_mode=Default,host=[REDACTED],index=0,name=NVIDIA\ GeForce\ RTX\ 3090,pstate=P8,uuid=GPU-[REDACTED] clocks_current_graphics=0i,clocks_current_memory=405i,clocks_current_sm=0i,clocks_current_video=555i,cuda_version="12.0",display_active="Disabled",display_mode="Disabled",driver_version="525.147.05",encoder_stats_average_fps=0i,encoder_stats_average_latency=0i,encoder_stats_session_count=0i,fan_speed=0i,fbc_stats_average_fps=0i,fbc_stats_average_latency=0i,fbc_stats_session_count=0i,memory_free=24258i,memory_reserved=316i,memory_total=24576i,memory_used=1i,pcie_link_gen_current=1i,pcie_link_width_current=16i,power_draw=27.48,power_limit=350,temperature_gpu=35i,utilization_decoder=0i,utilization_encoder=0i,utilization_gpu=0i,utilization_memory=0i,vbios_version="[REDACTED]" 1714733665000000000
% sudo nvidia-smi -pl 210
Power limit for GPU 00000000:00:10.0 was set to 210.00 W from 350.00 W.
All done.
% ./usr/bin/telegraf --config /etc/telegraf/telegraf.conf --test 2>&1 | grep -Fi nvidia
2024-05-03T10:54:47Z I! Loaded inputs: [... SNIP ...] nvidia_smi [... SNIP ...]
> nvidia_smi,arch=Ampere,compute_mode=Default,host=[REDACTED],index=0,name=NVIDIA\ GeForce\ RTX\ 3090,pstate=P8,uuid=GPU-[REDACTED] clocks_current_graphics=0i,clocks_current_memory=405i,clocks_current_sm=0i,clocks_current_video=555i,cuda_version="12.0",display_active="Disabled",display_mode="Disabled",driver_version="525.147.05",encoder_stats_average_fps=0i,encoder_stats_average_latency=0i,encoder_stats_session_count=0i,fan_speed=0i,fbc_stats_average_fps=0i,fbc_stats_average_latency=0i,fbc_stats_session_count=0i,memory_free=24258i,memory_reserved=316i,memory_total=24576i,memory_used=1i,pcie_link_gen_current=1i,pcie_link_width_current=16i,power_draw=27.42,power_limit=210,temperature_gpu=35i,utilization_decoder=0i,utilization_encoder=0i,utilization_gpu=0i,utilization_memory=0i,vbios_version="[REDACTED]" 1714733687000000000
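The check above comes down to comparing the power_limit field before (350) and after (210) running nvidia-smi -pl. A minimal shell sketch for pulling just that value out of Telegraf's line-protocol output is shown below. It assumes a grep that supports -o (GNU and BSD grep do; it is not POSIX), and the function name extract_power_limit is hypothetical, not part of Telegraf.

```shell
# Hypothetical helper (not part of Telegraf): print the value of the
# power_limit field from line-protocol output read on stdin.
extract_power_limit() {
  # Require a space or comma right before the key so that a similarly
  # named field such as "max_power_limit" would not be matched.
  grep -o '[ ,]power_limit=[0-9.]*' | cut -d= -f2
}

# Usage (same binary and config path as in the session above):
#   ./usr/bin/telegraf --config /etc/telegraf/telegraf.conf --test 2>&1 \
#     | grep -F nvidia_smi | extract_power_limit
```

Anchoring on the preceding separator keeps the match tied to the exact field name rather than any key that merely ends in power_limit.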
Perfect, thank you!
Use Case
Monitor the power limit attribute of an Nvidia GPU.
Expected behavior
Output for the nvidia_smi plugin should include power_limit data for schema v12, as it does for v11 (#15144).
Actual behavior
nvidia_smi lacks power_limit data.
Note that some of the following has been redacted.
Additional info
See also #15142
Example
nvidia-smi XML for v12 (note that some parts have been redacted, as I don't know what information they might leak):