influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

`inputs.nvidia-smi`: Add config option to test a single run of nvidia-smi on plugin startup #15915

Closed LandonTClipp closed 1 month ago

LandonTClipp commented 1 month ago

Use Case

There are cases where the nvidia-smi binary is found in PATH and is executable, but always returns a non-zero exit code when run. For various reasons, in the environment I work in, this might be expected. The plugin then logs the same error on every run, which pollutes system logs with an endless stream of messages. In this situation it is preferable to check once, on plugin startup, whether nvidia-smi returns a good result, and if it does not, let the error bubble up and be handled according to `startup_error_behavior`.

Expected behavior

If `test_on_startup = true` and nvidia-smi returns a non-zero exit code, an error should be returned from the `NvidiaSMI.Start` method.

Actual behavior

The config option does not exist, hence this proposal.

Additional info

No response