jorgedlcruz / vmware-grafana

A simple way to retrieve vCenter information and send it to InfluxDB, to present it later with Grafana
MIT License

There are missing data in the Dashboard (Cluster & Datastore) #26

Closed oijkn closed 3 years ago

oijkn commented 4 years ago

Relevant telegraf.conf:

## Realtime instance
[[inputs.vsphere]]
## List of vCenter URLs to be monitored. These three lines must be uncommented
## and edited for the plugin to work.
  interval = "20s"
  vcenters = [ "https://10.1.x.x/sdk" ]
  username = "user@vsphere.local"
  password = "pass"

  vm_metric_include = []
  host_metric_include = []
  cluster_metric_include = []
  datastore_metric_exclude = ["*"]

  max_query_metrics = 256
  timeout = "120s"
  insecure_skip_verify = true

## Historical instance
[[inputs.vsphere]]
  interval = "300s"
  vcenters = [ "https://10.1.x.x/sdk" ]
  username = "user@vsphere.local"
  password = "pass"

  datastore_metric_include = [ "disk.capacity.latest", "disk.used.latest", "disk.provisioned.latest"]
  insecure_skip_verify = true
  force_discover_on_init = true
  host_metric_exclude = ["*"] # Exclude realtime metrics
  vm_metric_exclude = ["*"] # Exclude realtime metrics

  max_query_metrics = 256
  collect_concurrency = 3

System info:

[root@centos]: $ cat /etc/*-release | grep "VERSION="
VERSION="8 (Core)"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT_VERSION="8"

[root@centos]: $ telegraf --version
Telegraf 1.15.3 (git: HEAD fac81815)

[root@centos]: $ grafana-server -v
Version 7.2.0 (commit: efe4941ee3, branch: HEAD)

[root@centos]: $ influx --version
InfluxDB shell version: 1.8.3

Expected behavior:

Actual behavior:

Additional info:

Query for number of cluster: SELECT count(distinct("clustername")) AS "Cluster" FROM (SELECT "totalmhz_average", "clustername" FROM "vsphere_cluster_cpu" WHERE $timeFilter)

Query for ReadAverage & WriteAverage: SELECT mean("numberReadAveraged_average") FROM "vsphere_datastore_datastore" WHERE ("source" =~ /^$datastore$/) AND $timeFilter GROUP BY time(5m), "source" fill(none)

Connected to http://localhost:8086 version 1.8.3
InfluxDB shell version: 1.8.3
> use telegraf
Using database telegraf
> show measurements
name: measurements
name
----
vsphere_cluster_clusterServices
vsphere_cluster_mem
vsphere_cluster_vmop
vsphere_datacenter_vmop
vsphere_datastore_disk
vsphere_host_cpu
vsphere_host_datastore
vsphere_host_disk
vsphere_host_hbr
vsphere_host_mem
vsphere_host_net
vsphere_host_power
vsphere_host_rescpu
vsphere_host_storageAdapter
vsphere_host_storagePath
vsphere_host_sys
vsphere_host_vflashModule
vsphere_vm_cpu
vsphere_vm_datastore
vsphere_vm_disk
vsphere_vm_mem
vsphere_vm_net
vsphere_vm_power
vsphere_vm_rescpu
vsphere_vm_sys
vsphere_vm_virtualDisk

As you can see, no measurement exists with the name vsphere_cluster_cpu or vsphere_datastore_datastore.
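The same comparison can be done mechanically. The following is a minimal Python sketch, assuming the measurement list is pasted in by hand from the `SHOW MEASUREMENTS` output above. The helper names `referenced_measurements` and `missing_measurements` are hypothetical, and the regular expression only handles the simple `FROM "name"` form used in the two queries quoted in this post.

```python
import re

# The two dashboard queries quoted in this issue.
QUERIES = [
    'SELECT count(distinct("clustername")) AS "Cluster" FROM '
    '(SELECT "totalmhz_average", "clustername" FROM "vsphere_cluster_cpu" '
    'WHERE $timeFilter)',
    'SELECT mean("numberReadAveraged_average") FROM "vsphere_datastore_datastore" '
    'WHERE ("source" =~ /^$datastore$/) AND $timeFilter '
    'GROUP BY time(5m), "source" fill(none)',
]

# Measurements reported by SHOW MEASUREMENTS in this issue.
AVAILABLE = set("""
vsphere_cluster_clusterServices vsphere_cluster_mem vsphere_cluster_vmop
vsphere_datacenter_vmop vsphere_datastore_disk vsphere_host_cpu
vsphere_host_datastore vsphere_host_disk vsphere_host_hbr vsphere_host_mem
vsphere_host_net vsphere_host_power vsphere_host_rescpu
vsphere_host_storageAdapter vsphere_host_storagePath vsphere_host_sys
vsphere_host_vflashModule vsphere_vm_cpu vsphere_vm_datastore vsphere_vm_disk
vsphere_vm_mem vsphere_vm_net vsphere_vm_power vsphere_vm_rescpu
vsphere_vm_sys vsphere_vm_virtualDisk
""".split())


def referenced_measurements(query):
    """Return the double-quoted measurement names that follow FROM in a query."""
    return set(re.findall(r'FROM\s+"([^"]+)"', query))


def missing_measurements(queries, available):
    """Return, sorted, the measurements the queries need but the database lacks."""
    missing = set()
    for query in queries:
        missing |= referenced_measurements(query) - set(available)
    return sorted(missing)


print(missing_measurements(QUERIES, AVAILABLE))
```

Any name it prints is a measurement a dashboard panel expects but Telegraf has not written to the database.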

Can you look into this on your side and fix it, please? Thank you for your help :)

BR.

jorgedlcruz commented 4 years ago

Hello, I am not sure which dashboards are affected, but everything is working fine on my side. Can you share some screenshots of the affected dashboards? Do you have the latest versions of the dashboards from grafana.com?

oijkn commented 4 years ago

Hello,

Yes, I'm using the latest version of Grafana, as I indicated in my previous post:

[root@centos]: $ grafana-server -v
Version 7.2.0 (commit: efe4941ee3, branch: HEAD)

image

jorgedlcruz commented 4 years ago

Hello, it seems you are not running the latest versions of the dashboards. Please download the new ones from here: https://grafana.com/grafana/dashboards/8159 https://grafana.com/grafana/dashboards/8162

Let me know later

oijkn commented 4 years ago

OK, I'm going to download them, but aren't the JSON files on GitHub the latest versions?

Thank you for your help :)

Edit: OK, the datastore view now seems consistent, although the screenshot on your GitHub page doesn't reflect reality, which misled me. On the other hand, the number of clusters still doesn't appear in the overview, even after importing dashboard ID 8159.

jorgedlcruz commented 4 years ago

They are up to date now, let me know if everything works as expected.

oijkn commented 4 years ago

Thanks for the update. However, the VMware vSphere - Overview.json file is identical to the previous version, and the problem is still present: no value for the number of clusters (purple square).

jorgedlcruz commented 4 years ago

That is odd. I am using the following query to obtain the number of clusters:

SELECT count(distinct("clustername")) AS "Cluster" FROM (SELECT "totalmhz_average", "clustername" FROM "vsphere_cluster_cpu" WHERE $timeFilter)

I can clearly see that you do not have vsphere_cluster_cpu. Why is that? Have you given it a few minutes to collect everything? Strange.

oijkn commented 4 years ago

Could you display your list of measurements? As you can see, I don't have the measurement "vsphere_cluster_cpu".

If the measurement is present on your side, could you please check its last entry?

[root@centos]:~ $ influx
Connected to http://localhost:8086 version 1.8.3
InfluxDB shell version: 1.8.3
> use telegraf
Using database telegraf
> SHOW MEASUREMENTS
name: measurements
name
----
vsphere_cluster_clusterServices
vsphere_cluster_mem
vsphere_cluster_vmop
vsphere_datacenter_vmop
vsphere_datastore_disk
vsphere_host_cpu
vsphere_host_datastore
vsphere_host_disk
vsphere_host_hbr
vsphere_host_mem
vsphere_host_net
vsphere_host_power
vsphere_host_rescpu
vsphere_host_storageAdapter
vsphere_host_storagePath
vsphere_host_sys
vsphere_host_vflashModule
vsphere_vm_cpu
vsphere_vm_datastore
vsphere_vm_disk
vsphere_vm_mem
vsphere_vm_net
vsphere_vm_power
vsphere_vm_rescpu
vsphere_vm_sys
vsphere_vm_virtualDisk

jorgedlcruz commented 4 years ago

Hello, yes, I can see vsphere_cluster_cpu without any problem:

image

My recommendation in your case is to edit the query to use vsphere_cluster_mem as the measurement and overhead_average as the value. Try it and let me know.
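For reference, applying that suggestion to the cluster-count query quoted earlier in this thread would give something like the sketch below. The helper `cluster_count_query` is hypothetical and only does string substitution. The measurement and field names come from the thread, and `$timeFilter` is left for Grafana to expand.

```python
def cluster_count_query(measurement, field):
    """Build the cluster-count InfluxQL query for a given measurement and field."""
    return (
        'SELECT count(distinct("clustername")) AS "Cluster" FROM '
        f'(SELECT "{field}", "clustername" FROM "{measurement}" WHERE $timeFilter)'
    )


# Original dashboard query (needs vsphere_cluster_cpu).
print(cluster_count_query("vsphere_cluster_cpu", "totalmhz_average"))

# Suggested replacement (uses vsphere_cluster_mem, which is present here).
print(cluster_count_query("vsphere_cluster_mem", "overhead_average"))
```

The inner field is only there so the subquery returns rows. The count itself is taken over the distinct "clustername" values.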

oijkn commented 4 years ago

With the new query, the "Cluster Summary" value is now displayed on the dashboard, thank you.

image

On the other hand, I don't understand why it works for you and not for me. In the vSphere plugin documentation on the Telegraf GitHub, the listed metrics don't include "vsphere_cluster_cpu" at all, so I don't understand how this is possible :\

https://github.com/influxdata/telegraf/blob/master/plugins/inputs/vsphere/METRICS.md#cluster-metrics

jorgedlcruz commented 4 years ago

That example might not be complete; the README says they gather CPU consumption as well: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/vsphere. And as you can see, I have data. It is not a problem for me to change the dashboard to look at the mem measurement instead; in fact, vCenter looks at that too, so I am changing it now.

jorgedlcruz commented 4 years ago

Done, all changed on grafana.com and here. Close the issue if you feel this is resolved.

oijkn commented 4 years ago

Yes, all is good for me now. Thanks, mate :)

dstewen commented 4 years ago

Similar issue here. vCenter 7 with 2 x Raspberry Pi in an HA cluster and 1 x NUC added to the datacenter as an ESXi host that is not part of the HA cluster. The vSphere Overview panel returns no data.

grafana_overview

Influxdb, Grafana and Telegraf running in a docker stack.

oijkn commented 4 years ago

@dstewen I think this dashboard is not compatible with vsphere v7...

jorgedlcruz commented 4 years ago

@dstewen Which Telegraf version are you running? You need to be running 1.15.3 or above; I am running 1.16, for example.
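A quick way to check that requirement is to parse the output of `telegraf --version`. This is a minimal Python sketch, assuming the output has the same shape as the line shown earlier in this issue (`Telegraf 1.15.3 (git: HEAD fac81815)`). The names `parse_telegraf_version` and `meets_minimum` are hypothetical.

```python
import re

# Minimum Telegraf version mentioned in this thread.
MINIMUM = (1, 15, 3)


def parse_telegraf_version(output):
    """Extract (major, minor, patch) from a 'Telegraf X.Y[.Z] ...' line."""
    match = re.search(r"Telegraf\s+(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"no Telegraf version found in: {output!r}")
    major, minor, patch = match.groups()
    # A missing patch component (e.g. "1.16") is treated as 0.
    return (int(major), int(minor), int(patch or 0))


def meets_minimum(output, minimum=MINIMUM):
    """Return True if the reported version is at least the minimum."""
    return parse_telegraf_version(output) >= minimum


print(meets_minimum("Telegraf 1.15.3 (git: HEAD fac81815)"))
```

Comparing the versions as integer tuples avoids the string-comparison trap where "1.9" would sort after "1.15".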

jorgedlcruz commented 4 years ago

@oijkn Not compatible with vSphere 7? The dashboard is compatible with every VMware version that Telegraf can ingest, which is all of them as long as the SDK is exposed; in terms of what VMware supports today, that means 6.0 and above. Both Telegraf and the dashboard support all of these. In fact, I am running the latest 7.0 Update 1:

image

So, again, if the data is not there, either you are not using the latest Telegraf version, or Telegraf is not collecting all the data (due to buffer size, etc.). The Telegraf logs should help. Let me know.

jorgedlcruz commented 3 years ago

Can we close this issue? I think it was an old version of telegraf.