influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Update the vSphere / vSAN extension with a variable for the polling interval (new feature in vSAN 8 U1: 30 sec) #13880

Closed Muy69 closed 1 year ago

Muy69 commented 1 year ago

Use Case

In a vSAN ESA cluster (vSAN 8 U1), a polling interval of 30 seconds helps greatly in peak situations, because the 5-minute interval is too coarse to get a good look inside the storage data.

https://core.vmware.com/blog/high-resolution-performance-monitoring-vsan-8-u1

Expected behavior

Some kind of variable or flag to indicate that vSAN 8 U1 or later is in use.
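A flag like this could, in principle, be derived from a version comparison. The sketch below is only an illustration: `supportsHighResVSAN` is a hypothetical helper, and it assumes the version is available as a dotted string such as `"8.0.1"` for vSAN 8 U1. How the plugin would obtain that string from vCenter is deliberately left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportsHighResVSAN is a hypothetical helper: it reports whether a dotted
// version string is at least 8.0.1 (assumed here to correspond to vSAN 8 U1).
func supportsHighResVSAN(version string) (bool, error) {
	want := []int{8, 0, 1}
	parts := strings.Split(version, ".")
	for i, w := range want {
		got := 0 // missing components count as 0, so "8.0" compares as "8.0.0"
		if i < len(parts) {
			n, err := strconv.Atoi(parts[i])
			if err != nil {
				return false, fmt.Errorf("invalid version %q: %w", version, err)
			}
			got = n
		}
		if got != w {
			// First differing component decides the comparison.
			return got > w, nil
		}
	}
	return true, nil // exactly 8.0.1 (extra components are ignored)
}

func main() {
	for _, v := range []string{"7.0.3", "8.0.0", "8.0.1", "8.0.2"} {
		ok, err := supportsHighResVSAN(v)
		fmt.Println(v, ok, err)
	}
}
```

Comparing component by component avoids the pitfall of plain string comparison, where `"8.0.10"` would sort before `"8.0.9"`.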

Actual behavior

Currently, a hard-coded 300-second interval is set (endpoint.go, line 248):

    "vsan": {
        name:             "vsan",
        vcName:           "ClusterComputeResource",
        pKey:             "clustername",
        parentTag:        "dcname",
        enabled:          anythingEnabled(parent.VSANMetricExclude),
        realTime:         false,
        sampling:         300,
        objects:          make(objectMap),
        filters:          newFilterOrPanic(parent.VSANMetricInclude, parent.VSANMetricExclude),
        paths:            parent.VSANClusterInclude,
        simple:           parent.VSANMetricSkipVerify,
        include:          parent.VSANMetricInclude,
        collectInstances: false,
        getObjects:       getClusters,
        parent:           "datacenter",
    },

Additional info

If you need testing, I would be glad to help.

vSAN 8 U1 (vSAN ESA)

srebhan commented 1 year ago

@Muy69 please test the binary from PR #13890 once CI has finished its tests successfully. Let me know if this solves your issue!

Muy69 commented 1 year ago

@srebhan I'm having problems starting Telegraf. Error output below:

# # Read metrics from VMware vCenter

[[inputs.vsphere]]
  ## List of vCenter URLs to be monitored. These three lines must be uncommented
  interval = "30s"
  vcenters = [ "" ]
  username = ""
  password = ""
  timeout = "29s"
  insecure_skip_verify = true

  # Exclude all historical metrics

  datastore_metric_exclude = ["*"]
  datacenter_metric_exclude = ["*"]
  cluster_metric_exclude = ["*"]
  resourcepool_metric_exclude = ["*"]

  vsan_metric_include = [ "summary.*" ]
  vsan_metric_exclude = [ ]
  vsan_cluster_include = ["/*/host/WLC-120"]
  #vsan_interval = "30s"

  collect_concurrency = 4
  discover_concurrency = 4

  ## The Historical Interval value must match EXACTLY the interval in the daily
  # "Interval Duration" found on the VCenter server under Configure > General > Statistics > Statistic intervals
  historical_interval = "60s"

[[inputs.vsphere]]

  interval = "60s"
  vcenters = [ "" ]
  username = ""
  password = ""
  timeout = "59s"
  insecure_skip_verify = true

  vm_metric_exclude = ["*"] # Exclude realtime metrics
  host_metric_exclude = ["*"] # Exclude realtime metrics

  vsan_metric_include = [ "performance.*" ]
  vsan_metric_exclude = [ ]
  vsan_cluster_include = ["/*/host/WLC-120" ]
  vsan_interval = "30s"

  #max_query_metrics = 256
  discover_concurrency = 4
  collect_concurrency = 4

  ## The Historical Interval value must match EXACTLY the interval in the daily
  # "Interval Duration" found on the VCenter server under Configure > General > Statistics > Statistic intervals
  historical_interval = "60s"
 Process: 2067 ExecStart=/usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d $TELEGRAF_OPTS (code=exited, status=203/EXEC)
   Main PID: 2067 (code=exited, status=203/EXEC)
        CPU: 2ms

Sep 10 18:36:20 cidtsttele01 systemd[1]: telegraf.service: Main process exited, code=exited, status=203/EXEC
Sep 10 18:36:20 cidtsttele01 systemd[1]: telegraf.service: Failed with result 'exit-code'.
Sep 10 18:36:20 cidtsttele01 systemd[1]: Failed to start Telegraf.
Sep 10 18:36:20 cidtsttele01 systemd[1]: telegraf.service: Scheduled restart job, restart counter is at 5.
Sep 10 18:36:20 cidtsttele01 systemd[1]: Stopped Telegraf.
Sep 10 18:36:20 cidtsttele01 systemd[1]: telegraf.service: Start request repeated too quickly.
Sep 10 18:36:20 cidtsttele01 systemd[1]: telegraf.service: Failed with result 'exit-code'.
Sep 10 18:36:20 cidtsttele01 systemd[1]: Failed to start Telegraf.
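The second `[[inputs.vsphere]]` block above combines `interval = "60s"` with `vsan_interval = "30s"`. A quick arithmetic check shows how many vSAN samples one collection cycle should cover. It assumes (not confirmed in this thread) that each collection asks for the samples produced since the previous one. `samplesPerCycle` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// samplesPerCycle is a hypothetical helper: it parses a Telegraf interval and
// a vsan_interval (Go duration syntax, e.g. "60s") and returns how many whole
// vSAN samples fit into one collection interval, plus whether the collection
// interval is an exact multiple of the vSAN interval.
func samplesPerCycle(interval, vsanInterval string) (int64, bool, error) {
	iv, err := time.ParseDuration(interval)
	if err != nil {
		return 0, false, err
	}
	vs, err := time.ParseDuration(vsanInterval)
	if err != nil {
		return 0, false, err
	}
	if vs <= 0 {
		return 0, false, fmt.Errorf("vsan_interval must be positive, got %s", vs)
	}
	return int64(iv / vs), iv%vs == 0, nil
}

func main() {
	for _, c := range [][2]string{{"60s", "30s"}, {"120s", "30s"}, {"60s", "300s"}} {
		n, exact, err := samplesPerCycle(c[0], c[1])
		fmt.Println(c[0], c[1], n, exact, err)
	}
}
```

Under that assumption, a 60 s collection with the old 300 s vSAN sampling would return no new sample in most cycles, while 30 s sampling would yield two per cycle.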
Muy69 commented 1 year ago

The config changes don't work as intended: the interval can be set, but the metrics still reflect only a 300-second time interval.

powersj commented 1 year ago

@Muy69 please provide the actual Telegraf logs. If you are running it as a service, you need to use something similar to the following:

journalctl --no-pager --unit telegraf
Muy69 commented 1 year ago

Telegraf version: 1.28.1-1

journalctl --no-pager --unit telegraf

Sep 15 12:33:14 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:14Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: cmmds-workload: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:33:15 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:15Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:15 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:15Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: host-pmem: ServerFaultCode: The operation is not supported on the object..
Sep 15 12:33:15 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:15Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: cluster-pmem: ServerFaultCode: The operation is not supported on the object..
Sep 15 12:33:18 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:18Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:19 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:19Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:20 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:20Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:20 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:20Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: clom-host: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:33:20 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:20Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:20 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:20Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:20 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:20Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:21 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:21Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: clom-disk: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:33:21 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:21Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:22 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:22Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: computeCluster-remotedomclient: ServerFaultCode: A specified parameter was not correct: computeCluster-remotedomclient:.
Sep 15 12:33:23 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:23Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:23 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:23Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:23 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:23Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: ddh-disk: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:33:23 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:23Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:23 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:23Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:23 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:23Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: slab-memory: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:34:08 cidtsttele02 telegraf[4857]: 2023-09-15T12:34:08Z I! [inputs.vsphere] Stopping plugin
Sep 15 12:34:08 cidtsttele02 systemd[1]: Stopping Telegraf...
Sep 15 12:34:08 cidtsttele02 telegraf[4857]: 2023-09-15T12:34:08Z I! [agent] Hang on, flushing any cached metrics before shutdown
Sep 15 12:34:08 cidtsttele02 telegraf[4857]: 2023-09-15T12:34:08Z I! [agent] Stopping running outputs
Sep 15 12:34:08 cidtsttele02 systemd[1]: telegraf.service: Deactivated successfully.
Sep 15 12:34:08 cidtsttele02 systemd[1]: Stopped Telegraf.
Sep 15 12:34:08 cidtsttele02 systemd[1]: telegraf.service: Consumed 13min 26.291s CPU time.
Sep 15 12:34:08 cidtsttele02 systemd[1]: Starting Telegraf...
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Loading config: /etc/telegraf/telegraf.conf
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z W! DeprecationWarning: Option "force_discover_on_init" of plugin "inputs.vsphere" deprecated since version 1.14.0 and will be removed in 2.0.0: option is ignored
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z E! Unable to open /var/log/telegraf/telegraf.log (open /var/log/telegraf/telegraf.log: permission denied), using stderr
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Starting Telegraf 1.28.1 brought to you by InfluxData the makers of InfluxDB
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Available plugins: 240 inputs, 9 aggregators, 29 processors, 24 parsers, 59 outputs, 5 secret-stores
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Loaded inputs: cpu vsphere
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Loaded aggregators:
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Loaded processors:
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Loaded secretstores:
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Loaded outputs: influxdb
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Tags enabled: host=cidtsttele02
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z W! Deprecated inputs: 0 and 1 options
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"cidtsttele02", Flush Interval:10s
Sep 15 12:34:08 cidtsttele02 systemd[1]: Started Telegraf.
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! [inputs.vsphere] Starting plugin
Sep 15 12:34:09 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:09Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: cluster-pmem: ServerFaultCode: The operation is not supported on the object..
Sep 15 12:34:09 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:09Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: ddh-disk: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:34:11 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:11Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:11 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:11Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:12 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:12Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:12 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:12Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: computeCluster-remotedomclient: ServerFaultCode: A specified parameter was not correct: computeCluster-remotedomclient:.
Sep 15 12:34:12 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:12Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:12 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:12Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:13 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:13Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:15 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:15Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:15 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:15Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: host-pmem: ServerFaultCode: The operation is not supported on the object..
Sep 15 12:34:15 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:15Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:16 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:16Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: cmmds-workload: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:34:16 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:16Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:16 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:16Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:17 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:17Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: clom-host: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:34:17 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:17Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: clom-disk: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:34:18 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:18Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:18 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:18Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: heap-memory: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:34:18 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:18Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:18 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:18Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:18 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:18Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:18 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:18Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: slab-memory: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:34:18 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:18Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping

# # Read metrics from VMware vCenter

[[inputs.vsphere]]
  interval = "120s"
  vcenters = [ "https…" ]
  username = "…"
  password = "…"
  timeout = "110s"

  insecure_skip_verify = true

  force_discover_on_init = true

  vm_metric_exclude = ["*"] # Exclude realtime metrics
  host_metric_exclude = ["*"] # Exclude realtime metrics
  datastore_metric_exclude = ["*"]
  datacenter_metric_exclude = ["*"]
  cluster_metric_exclude = ["*"]
  resource_pool_metric_exclude = ["*"]

  vsan_metric_include = [ "performance.*" ]
  vsan_metric_skip_verify = true
  vsan_metric_exclude = [ ]
  vsan_cluster_include = ["/*/host/WLC-120" ]
  vsan_interval = "30s"

  max_query_metrics = 256
  discover_concurrency = 5
  collect_concurrency = 5

  historical_interval = "60s"

Muy69 commented 1 year ago

Could it be that there is another entry with a hard-coded 300-second time range?

vsphere.go (line 184):

    func init() {
        inputs.Add("vsphere", func() telegraf.Input {
            return &VSphere{
                DatacenterInclude:           []string{"/*"},
                ClusterInclude:              []string{"/*/host/**"},
                HostInstances:               true,
                HostInclude:                 []string{"/*/host/**"},
                ResourcePoolInclude:         []string{"/*/host/**"},
                VMInstances:                 true,
                VMInclude:                   []string{"/*/vm/**"},
                DatastoreInclude:            []string{"/*/datastore/**"},
                VSANMetricExclude:           []string{"*"},
                VSANClusterInclude:          []string{"/*/host/**"},
                Separator:                   "_",
                CustomAttributeExclude:      []string{"*"},
                UseIntSamples:               true,
                MaxQueryObjects:             256,
                MaxQueryMetrics:             256,
                CollectConcurrency:          1,
                DiscoverConcurrency:         1,
                MetricLookback:              3,
                ForceDiscoverOnInit:         true,
                ObjectDiscoveryInterval:     config.Duration(time.Second * 300),
                Timeout:                     config.Duration(time.Second * 60),
                HistoricalInterval:          config.Duration(time.Second * 300),
                VSANInterval:                config.Duration(time.Second * 300),
                DisconnectedServersBehavior: "error",
                HTTPProxy:                   proxy.HTTPProxy{UseSystemProxy: true},
            }
        })
    }
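As far as I understand Telegraf's plugin loading, the function passed to `inputs.Add` only supplies defaults, and the values from the TOML table are decoded on top of that struct. If so, the 300 s in `init()` only applies when `vsan_interval` is absent from the config. The sketch demonstrates that default-then-override pattern. It uses `encoding/json` as a stdlib stand-in for the TOML decoder (an assumption, since Telegraf itself reads TOML), and `pluginConfig`, `newWithDefaults` and `load` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pluginConfig is a hypothetical, trimmed-down stand-in for the VSphere
// struct; durations are plain strings to keep the sketch stdlib-only.
type pluginConfig struct {
	HistoricalInterval string `json:"historical_interval"`
	VSANInterval       string `json:"vsan_interval"`
}

// newWithDefaults plays the role of the constructor registered in init().
func newWithDefaults() *pluginConfig {
	return &pluginConfig{HistoricalInterval: "300s", VSANInterval: "300s"}
}

// load applies user settings on top of the defaults. json.Unmarshal only
// overwrites fields present in raw; all other fields keep their defaults.
func load(raw []byte) (*pluginConfig, error) {
	cfg := newWithDefaults()
	if err := json.Unmarshal(raw, cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	cfg, err := load([]byte(`{"vsan_interval": "30s"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.VSANInterval, cfg.HistoricalInterval) // 30s 300s
}
```

Under this pattern, a remaining 300 s resolution would more likely come from a value that ignores the decoded setting, such as the literal `sampling: 300` in endpoint.go, than from the default in `init()`.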

Muy69 commented 1 year ago

Is that information sufficient, @powersj?

The current govmomi now supports vSphere up to 8.0 U1c: https://github.com/vmware/govmomi/issues/3193