influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

nvidia-smi doesn't get encoder, decoder and other info #12996

Closed BrentonPoke closed 1 year ago

BrentonPoke commented 1 year ago

Relevant telegraf.conf

[[outputs.influxdb_v2]]
  ## Point to your influxdb container
  urls = ["http://192.168.1.74:8086"]
  token = "mytoken"
  organization = "ProjectOWL"
  bucket = "sensors"

[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false

[[inputs.mem]]

# Pulls statistics from nvidia GPUs attached to the host
[[inputs.nvidia_smi]]
  ## Optional: path to nvidia-smi binary, defaults to $PATH via exec.LookPath
  # bin_path = "$NVIDIASMI"
  tags = [
    "name",
    "compute_mode",
  ]
  ## Optional: timeout for GPU polling
  timeout = "5s"

[[inputs.exec]]
  commands = ["./CoreTempTelegraf"]
  timeout = "5s"
  data_format = "influx"

[[inputs.diskio]]

[[inputs.disk]]

[[inputs.smart]]

[[inputs.internet_speed]]

Logs from Telegraf

2023-03-31T04:25:29Z I! Available plugins: 210 inputs, 9 aggregators, 26 processors, 21 parsers, 57 outputs, 2 secret-stores
2023-03-31T04:25:29Z I! Loaded inputs: cpu disk diskio exec internet_speed mem nvidia_smi smart
2023-03-31T04:25:29Z I! Loaded aggregators:
2023-03-31T04:25:29Z I! Loaded processors:
2023-03-31T04:25:29Z I! Loaded secretstores:
2023-03-31T04:25:29Z I! Loaded outputs: influxdb_v2
2023-03-31T04:25:29Z I! Tags enabled: host=Chevalier
2023-03-31T04:25:29Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"Chevalier", Flush Interval:10s
2023-03-31T04:25:29Z D! [agent] Initializing plugins
2023-03-31T04:25:29Z W! [inputs.smart] nvme not found: verify that nvme is installed and it is in your PATH (or specified in config) to gather vendor specific attributes: provided path does not exist: []
2023-03-31T04:25:29Z D! [agent] Connecting outputs
2023-03-31T04:25:29Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2023-03-31T04:25:29Z D! [agent] Successfully connected to outputs.influxdb_v2
2023-03-31T04:25:29Z D! [agent] Starting service inputs
2023-03-31T04:25:40Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2023-03-31T04:25:40Z D! [inputs.internet_speed] Found server: [40266]     6.36km
Lansing, MI (United States) by Metronet
2023-03-31T04:25:40Z D! [inputs.internet_speed] Starting Speed Test
2023-03-31T04:25:40Z D! [inputs.internet_speed] Running Ping...
2023-03-31T04:25:40Z D! [inputs.internet_speed] Running Download...
2023-03-31T04:25:43Z D! [inputs.internet_speed] Running Upload...
2023-03-31T04:25:45Z D! [inputs.internet_speed] Test finished.
2023-03-31T04:25:50Z D! [outputs.influxdb_v2] Wrote batch of 71 metrics in 19.9571ms
2023-03-31T04:25:50Z D! [outputs.influxdb_v2] Buffer fullness: 17 / 10000 metrics
2023-03-31T04:25:50Z D! [inputs.internet_speed] Found server: [40266]     6.36km
Lansing, MI (United States) by Metronet
2023-03-31T04:25:50Z D! [inputs.internet_speed] Starting Speed Test
2023-03-31T04:25:50Z D! [inputs.internet_speed] Running Ping...
2023-03-31T04:25:50Z D! [inputs.internet_speed] Running Download...

System info

telegraf-1.25.2, Windows 11

Steps to reproduce

  1. Install NVIDIA drivers and any Game Ready driver updates.
  2. Run Telegraf.
  3. Do something such as watching a video on any site.

Expected behavior

All fields should be reported with their actual values.

Actual behavior

You won't get a result for any fps field. At least power draw works, though.
[Screenshot 2023-03-31 005704]

Additional info

powersj commented 1 year ago

Can you please provide the full output of nvidia-smi -q -x and specify the exact fields you are after?

BrentonPoke commented 1 year ago

out.txt (full output of nvidia-smi -q -x)

powersj commented 1 year ago

I see the following in your file:

<encoder_stats>
    <session_count>0</session_count>
    <average_fps>0</average_fps>
    <average_latency>0</average_latency>
</encoder_stats>
<fbc_stats>
    <session_count>0</session_count>
    <average_fps>0</average_fps>
    <average_latency>0</average_latency>
</fbc_stats>
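The two blocks above appear under each `<gpu>` element of the `nvidia-smi -q -x` report. A minimal Go sketch of how they can be decoded with the standard library's `encoding/xml`, assuming the element names shown above; the types and the `parseStats` function are hypothetical and are not Telegraf's actual code:

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// sessionStats mirrors the <encoder_stats> and <fbc_stats> elements.
// Values stay as strings because nvidia-smi may print text such as "N/A".
type sessionStats struct {
	SessionCount   string `xml:"session_count"`
	AverageFPS     string `xml:"average_fps"`
	AverageLatency string `xml:"average_latency"`
}

// gpuStats is a hypothetical, trimmed-down view of one <gpu> element.
type gpuStats struct {
	Encoder sessionStats `xml:"encoder_stats"`
	FBC     sessionStats `xml:"fbc_stats"`
}

// smiReport models the <nvidia_smi_log> root element.
type smiReport struct {
	GPUs []gpuStats `xml:"gpu"`
}

// parseStats decodes the report and returns one entry per GPU.
func parseStats(data []byte) ([]gpuStats, error) {
	var report smiReport
	if err := xml.Unmarshal(data, &report); err != nil {
		return nil, err
	}
	return report.GPUs, nil
}

func main() {
	// Sample trimmed to the elements quoted in this thread.
	sample := []byte(`<nvidia_smi_log><gpu>
  <encoder_stats><session_count>0</session_count><average_fps>0</average_fps><average_latency>0</average_latency></encoder_stats>
  <fbc_stats><session_count>0</session_count><average_fps>0</average_fps><average_latency>0</average_latency></fbc_stats>
</gpu></nvidia_smi_log>`)

	gpus, err := parseStats(sample)
	if err != nil {
		panic(err)
	}
	for i, g := range gpus {
		fmt.Printf("gpu %d: encoder_stats_average_fps=%s fbc_stats_average_fps=%s\n",
			i, g.Encoder.AverageFPS, g.FBC.AverageFPS)
	}
}
```

With the values quoted above, this prints `gpu 0: encoder_stats_average_fps=0 fbc_stats_average_fps=0`.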

Are you looking for both average FPS fields? Something like:

encoder_stats_average_fps
fbc_stats_average_fps

Looking at the code I see we try to gather those today:

setIfUsed("int", fields, "encoder_stats_average_fps", gpu.Encoder.AverageFPS)
setIfUsed("int", fields, "fbc_stats_average_fps", gpu.FBC.AverageFPS)
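For context, here is a simplified, hypothetical stand-in for a helper like `setIfUsed` (not Telegraf's actual source). It shows why a literal `0` from nvidia-smi would still become a field with value 0, while an empty or `N/A` value would be skipped. The skip rules are assumptions for illustration:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// setIntIfUsed is a hypothetical helper: it parses the first
// whitespace-separated token of v as an int and stores it under key k.
// Empty and "N/A" values are skipped; a literal "0" is kept.
func setIntIfUsed(fields map[string]interface{}, k, v string) {
	tokens := strings.Fields(v)
	if len(tokens) == 0 || tokens[0] == "N/A" {
		return
	}
	if i, err := strconv.Atoi(tokens[0]); err == nil {
		fields[k] = i
	}
}

func main() {
	fields := map[string]interface{}{}
	setIntIfUsed(fields, "encoder_stats_average_fps", "0")  // stored as 0
	setIntIfUsed(fields, "fbc_stats_average_fps", "N/A")    // skipped
	setIntIfUsed(fields, "encoder_stats_session_count", "") // skipped
	fmt.Println(fields)
}
```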

The paths look correct. In your example output, both values are zero. Are you sure the nvidia-smi command ever reports a non-zero value?
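One way to check is to sample `nvidia-smi -q -x` repeatedly while a workload is running and record the highest `average_fps` seen. A minimal Go sketch, assuming `nvidia-smi` is on the PATH; the sample count and one-second interval are arbitrary, and `maxAverageFPS` is a hypothetical helper that scans every `<average_fps>` element regardless of its parent:

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// maxAverageFPS walks the XML token stream and returns the largest numeric
// value found in any <average_fps> element (encoder_stats and fbc_stats alike).
func maxAverageFPS(data []byte) (int, error) {
	dec := xml.NewDecoder(bytes.NewReader(data))
	maxFPS := 0
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return maxFPS, nil
		}
		if err != nil {
			return 0, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Local != "average_fps" {
			continue
		}
		var text string
		if err := dec.DecodeElement(&text, &start); err != nil {
			return 0, err
		}
		// Non-numeric values such as "N/A" are ignored.
		if n, err := strconv.Atoi(strings.TrimSpace(text)); err == nil && n > maxFPS {
			maxFPS = n
		}
	}
}

func main() {
	const samples = 30
	best := 0
	for i := 0; i < samples; i++ {
		out, err := exec.Command("nvidia-smi", "-q", "-x").Output()
		if err != nil {
			fmt.Println("nvidia-smi failed:", err)
			return
		}
		fps, err := maxAverageFPS(out)
		if err != nil {
			fmt.Println("could not parse output:", err)
			return
		}
		if fps > best {
			best = fps
		}
		time.Sleep(time.Second)
	}
	fmt.Printf("highest average_fps over %d samples: %d\n", samples, best)
}
```

If this reports 0 throughout, the zeros come from nvidia-smi itself rather than from Telegraf's parsing.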

BrentonPoke commented 1 year ago

They're always zero now, but that wasn't always the case. The last time I got a reading was a month ago, on the 27th.

powersj commented 1 year ago

If the nvidia-smi command is returning zero, what can Telegraf do about it? This sounds like something changed with the nvidia-smi command, a driver, or something higher up. Telegraf can only read what it is given.

telegraf-tiger[bot] commented 1 year ago

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you!