influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Vsphere plugin not finding all the VMs (not finding Tanzu VMs) #13000

Closed rafapiltrafa closed 1 year ago

rafapiltrafa commented 1 year ago

Relevant telegraf.conf

[[inputs.vsphere]]
## List of vCenter URLs to be monitored. These three lines must be uncommented
## and edited for the plugin to work.
interval = "60s"
#  vcenters = [ "https://172.*.*.*/sdk"]
  username = "*****"
  password = "*****"

vm_metric_include = []
host_metric_include = []
cluster_metric_exclude = ["*"]
datastore_metric_exclude = ["*"]
discover_concurrency = 3
max_query_metrics = 256
max_query_objects = 512
timeout = "30s"
insecure_skip_verify = true

Logs from Telegraf

2023-03-31T15:30:38+02:00 D! [inputs.vsphere] Discover new objects for 172.29.*.*
2023-03-31T15:30:38+02:00 D! [inputs.vsphere] Discovering resources for datacenter
2023-03-31T15:30:38+02:00 D! [inputs.vsphere] Find(Datacenter, /*) returned 1 objects
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Found 20 metrics for MSM2 (Maqueta)
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Discovering resources for cluster
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Find(ClusterComputeResource, /*/host/**) returned 4 objects
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Discovering resources for resourcepool
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Find(ResourcePool, /*/host/**) returned 4 objects
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Found 4 metrics for Resources
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Found 4 metrics for Resources
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Found 4 metrics for Resources
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Found 4 metrics for Resources
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Discovering resources for host
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Find(HostSystem, /*/host/**) returned 11 objects
.....
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Discovering resources for vm
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Find(VirtualMachine, /*/vm/**) returned 207 objects  <<<<< THIS IS the problem. There are 280 VMS  <<<<<<<<<
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Find(ResourcePool, /*/host/**) returned 4 objects
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Found 149 metrics for vm_prueba
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Found 169 metrics for mmafmsm2-02 - ASFE 2 
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Found 160 metrics for mmesmsm2-01 (Elastic Search, kibana)
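
For anyone comparing counts across runs, the discovery lines above can be tallied with a short script. This is only a sketch in Python: it assumes the debug lines keep the `Find(<type>, <path>) returned <N> objects` shape shown in this log, and `discovery_counts` is a hypothetical helper name, not part of Telegraf.

```python
import re

# Matches debug lines such as:
#   "... Find(VirtualMachine, /*/vm/**) returned 207 objects"
# The line shape is assumed from the log excerpt in this issue.
FIND_RE = re.compile(r"Find\((\w+), (\S+)\) returned (\d+) objects")


def discovery_counts(log_text):
    """Return {(resource_type, path): count}; the last matching line wins."""
    counts = {}
    for line in log_text.splitlines():
        m = FIND_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2))] = int(m.group(3))
    return counts


sample = """\
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Find(HostSystem, /*/host/**) returned 11 objects
2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Find(VirtualMachine, /*/vm/**) returned 207 objects
"""

counts = discovery_counts(sample)
expected_vms = 280  # count reported by the vCenter REST API in this issue
found_vms = counts[("VirtualMachine", "/*/vm/**")]
print(f"discovered {found_vms} of {expected_vms} VMs, {expected_vms - found_vms} missing")
```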

System info

Telegraf 1.25.3 (git: HEAD@3835522c) on CentOS Linux release 7.9.2009

Docker

No response

Steps to reproduce

1. 2. 3. ...

Expected behavior

I would expect the plugin to discover all the VMs in the vCenter. If we check the UI or the REST API, we see all the VMs:

curl -k -X GET -H "vmware-api-session-id: $ID" https://172.*.*.*/rest/vcenter/vm | jq | grep name | wc -l
280

But the plugin does not see or query the VMs that belong to the Tanzu clusters.
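
To see which VMs are missing rather than only how many, the REST inventory can be diffed against the names in the debug log. A rough Python sketch follows. It assumes the `/rest/vcenter/vm` response was saved as JSON with a top-level `value` list whose entries carry a `name` field, and that the plugin logs one `Found N metrics for <name>` line per discovered object. The helper names and the `tanzu-node-1` VM name are hypothetical.

```python
import json
import re

# "Found N metrics for <name>" is logged for every discovered object
# (VMs, hosts, datacenters, ...), so this set is a superset of the VM names.
FOUND_RE = re.compile(r"Found \d+ metrics for (.+?)\s*$")


def rest_vm_names(rest_json):
    """VM names from a saved REST response (assumed shape: {"value": [{"name": ...}]})."""
    return {vm["name"] for vm in json.loads(rest_json)["value"]}


def logged_names(log_text):
    """Object names that appear in 'Found N metrics for ...' debug lines."""
    names = set()
    for line in log_text.splitlines():
        m = FOUND_RE.search(line)
        if m:
            names.add(m.group(1))
    return names


def missing_vms(rest_json, log_text):
    """VMs listed by the REST API that never show up in the plugin's debug log."""
    return sorted(rest_vm_names(rest_json) - logged_names(log_text))


# Hypothetical sample data: one VM seen by both sides, one seen only by REST.
rest_sample = '{"value": [{"vm": "vm-1", "name": "vm_prueba"}, {"vm": "vm-2", "name": "tanzu-node-1"}]}'
log_sample = "2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Found 149 metrics for vm_prueba\n"

print(missing_vms(rest_sample, log_sample))
```

If every missing name turns out to live under the Namespaces folder, that points at something specific to those VMs rather than at the inventory path.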

Actual behavior

Only the initially discovered VMs (207) are queried by the plugin; we cannot obtain metrics of the Tanzu-VMs

Thank you very much for your help ! Best Regards.

Additional info

No response

powersj commented 1 year ago

2023-03-31T15:30:39+02:00 D! [inputs.vsphere] Find(VirtualMachine, /*/vm/**) returned 207 objects <<<<< THIS IS the problem. There are 280 VMS <<<<<<<<<

Are they all in the same location? or in different places?

"we cannot obtain metrics of the Tanzu-VMs"
"does not see nor query the VMS that belong to the Tanzu clusters."

Are you saying only VMs related to Tanzu are missing?

What is your vSphere version?

rafapiltrafa commented 1 year ago

Hi Powersj,

They are all in the same location. And yes, the missing VMs are the ones inside the Namespaces (the Tanzu ones). The vSphere version is 7.0.3. Thank you very much!

rafapiltrafa commented 1 year ago

The issue is that Find(VirtualMachine, /*/vm/**) returns 207 objects, but querying the vCenter API correctly lists all the VMs:

curl -k -X GET -H "vmware-api-session-id: $ID" https://172.*.*.*/rest/vcenter/vm | jq | grep name | wc -l
280

Perhaps we need a special vm_include path so that the plugin can discover the Tanzu VMs. I have run some tests, but without good results.

Thanks !! Regards

powersj commented 1 year ago

@prydin are you aware of the differences when trying to monitor Tanzu VMs?

rafapiltrafa commented 1 year ago

Hi! I have run some tests on another system. The problem seems to happen with the VMs that are part of a vSphere with Tanzu cluster.

With TKG (Tanzu Kubernetes Grid) clusters, all VMs are correctly discovered by the Telegraf vsphere plugin.

Best Regards, rafa

prydin commented 1 year ago

@rafapiltrafa What power states are the missing VMs in? I seem to recall that there's some kind of zombie state you can put VMs in which may be used by Tanzu.

prydin commented 1 year ago

Also, are the Tanzu VMs in a non-standard folder? In that case, change vm_include to /**/vm/**

rafapiltrafa commented 1 year ago

@prydin: Hi Pontus, sorry about my late response. I've been out for some days.

The power state seems to be powered on:

image

The Tanzu VMs are inside the Namespaces folder and then inside different Resource Pools.

I have run some tests with the path, but with the same results:

// Default vm_include
telegraf.log:2023-04-18T23:32:55+02:00 D! [inputs.vsphere] Find(VirtualMachine, /*/vm/**) returned 210 objects

// With double * as suggested
telegraf.log:2023-04-18T23:51:39+02:00 D! [inputs.vsphere] Find(VirtualMachine, /**/vm/**) returned 210 objects

The only difference I see is the guest OS, which is the same on all the missing VMs (VMware Photon OS (64-bit)). Perhaps in vSphere with Tanzu deployments the plugin cannot query this kind of VM.

Thank you very much for your help ! Best Regards, rafa

prydin commented 1 year ago

I know Tanzu creates some kind of special flavor of VMs, but I always thought they would show up in the normal API calls. I guess I'll have to test this. Unfortunately, it may take me some time to get my hands on a Tanzu lab environment. Let me start working on that.

prydin commented 1 year ago

@rafapiltrafa Which specific Tanzu product are you having issues with? Version?

rafapiltrafa commented 1 year ago

@prydin Hi Pontus, Thank you very much. The problem is with the "VSphere with Tanzu" clusters. With TKG clusters the VMs are correctly seen.

Our version of VMWare is 7.0.3.01200

OS VMware Photon OS/Linux v1.22.9+vmware.1

Thank you very much ! Best Regards, rafa

rafapiltrafa commented 1 year ago

@prydin Hi Pontus! Have you been able to run any tests on this?

Thank you very much ! Best Regards, rafa

jaymzmac commented 1 year ago

This is a vCenter permissions issue and can be solved by adding the user to the "ServiceProviderUsers" SSO group. You will then be able to see all the TKGs VMs. https://williamlam.com/2021/07/quick-tip-vsphere-permission-to-view-vsphere-with-tanzu-namespaces.html

powersj commented 1 year ago

@rafapiltrafa can you try the above or confirm if you have those permissions?

rafapiltrafa commented 1 year ago

@powersj @jaymzmac : It works that way. Thank you very much for your help !!!

rafapiltrafa commented 1 year ago

PROBLEM: vSphere Telegraf plugin not obtaining metrics from Tanzu VMs
SOLUTION: Adding the user to the "ServiceProviderUsers" SSO group

Thanks a lot to @jaymzmac for the solution to this issue.