exbane commented 8 years ago

Feature Request

Create Telegraf Collector for vSphere Object Metric Collection

Proposal:

There are already Telegraf collectors for Windows, Linux, and Unix systems that collect from a multitude of apps and systems, but Telegraf is currently limited in its ability to collect from the vSphere API and ship the data to InfluxDB for proper graphing.

Current behavior:

Currently there are a few projects out there that can do this but not very well in my opinion.

The StatsFeeder fling is good at collecting all host and VM 20-second performance metrics, but it's limited in how you can parse that data and currently has no ability to collect data against VMFS datastores for performance graphing. The project also has not been updated since 2013.

vSphere2Metrics is good for collecting 5-minute intervals for your infrastructure and was by far the best of the bunch, but the time it takes to collect against an extremely large infrastructure with thousands of VMs is undesirable. It also has not been updated for quite some time.

SYNAXON/GraphiteReceiver - This works in conjunction with StatsFeeder to send the performance metrics to a Graphite instance. It does work for sending the metrics to the Graphite receiver on InfluxDB, but for some reason the metric isn't getting parsed properly: it shows up as the entire line, with name, metric, and timestamp all in one. There isn't a good way of collecting against datastores, and the project hasn't been updated in quite a while.

SexiGraf - This was a promising project and seems to work out well for the most part. There are some holes in the product that make it difficult for a larger shop to adopt.
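For reference on the parsing problem noted for GraphiteReceiver: the Graphite plaintext protocol carries one metric per line as `<path> <value> <timestamp>`, separated by spaces. The following is a minimal sketch that splits such a line and rejects anything that does not have exactly three fields. The function name and the metric name are made up for illustration, and this is not GraphiteReceiver's or InfluxDB's actual code.

```sh
# parse_graphite_line LINE
# Splits one Graphite plaintext line ("<path> <value> <timestamp>") into fields.
# Prints "path=... value=... timestamp=..." on success and returns 1 when the
# line does not consist of exactly three space-separated fields.
parse_graphite_line() {
    printf '%s\n' "$1" | awk '
        NF == 3 { printf "path=%s value=%s timestamp=%s\n", $1, $2, $3; ok = 1 }
        END     { if (!ok) exit 1 }'
}

# Example with a hypothetical metric name:
parse_graphite_line "vsphere.esx01.cpu.usage 42.5 1467331200"
```

A line that arrives with its fields fused by commas rather than spaces fails this check, which may be a quick way to see whether the sender or the receiver is at fault.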

Desired behavior:

A Telegraf collector developed to gather whatever 20-second metrics you want from a vSphere infrastructure, including hosts, datastores, VMs, resource pools, clusters, and datacenters. Having the ability to parse the information for clusters is extremely desirable for creating cluster-specific graphs in Grafana or Chronograf.

Use case:

A lot of companies that put in the investment to purchase vSphere don't always have the budget to purchase the expensive monitoring tools from VMware, such as vRealize Operations Manager. Being able to collect metrics and graph them yourself through an open source system, with accurate data, would be a huge advantage for VM admins. Just food for thought. Let me know if you have any questions.

-Adam

zp-markusp commented 8 years ago

+1 would be awesome to have transparency in here too...

brandonweeks commented 8 years ago

I've been working on extracting metrics via the vSphere Performance API with govmomi in a form that would be compatible with Telegraf. Is everyone more interested in the "raw" data that is available from the ESXi hosts directly (like StatsFeeder), or in more comprehensively collecting both the real-time data and the various aggregate data vSphere collects (like vRealize)?

exbane commented 8 years ago

I think something like StatsFeeder would be good to start with, but overall a comprehensive solution like vRealize would be great. Something along those lines would be a great addition for companies that can't afford vROps.

R-Sommer commented 8 years ago

I'd really appreciate having a collector for Telegraf, as I still haven't found a satisfying solution for vSphere. Are there any plans, or even a timeline?

sparrc commented 8 years ago

Nope, there are currently no plans or timeline. This plugin will likely need to come from the community for it to get done.

steverweber commented 8 years ago

I did post a script for this a few months ago.... need a place to put a collection of exec scripts...

https://github.com/uwaterloo-s8weber/influxdb-metrics-vmware You can run that and pass no URI arg, and the exec script might fire the data off to InfluxDB.

note: telegraf might have issues with '\n\r' on windows so might need a patch for that first.
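If stray carriage returns do turn out to be the problem, one possible workaround is to strip them in a wrapper before Telegraf parses the output. This is only a sketch: `strip_cr` is a hypothetical helper name, the sample point is invented, and whether Telegraf actually needs this has not been verified here.

```sh
# strip_cr: remove every carriage return from stdin so that each line
# ends in a bare "\n".
strip_cr() {
    tr -d '\r'
}

# Example: a line-protocol point with a Windows-style line ending.
printf 'esxi,host=esx01 cputil=12.5\r\n' | strip_cr
```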

steverweber commented 8 years ago

and yap that script is slow... perhaps it could be threaded to give it a boost.

awilson77584 commented 7 years ago

Xorux has lpar2rrd (GNU GPL), which also does VMware monitoring. I'm trying to get the same type of information for LPARs and the AIX frame, and to get the data into Telegraf for tagging and then into InfluxDB. I'll post back if I have success.

Integrative commented 7 years ago

Would definitely be appreciated. We're now using SNMP to poll data from individual hosts, but that is just too time-consuming and fragile to present reliable data. I was looking at pyVmomi to build something like this, but just haven't gotten to it yet.

astolle commented 7 years ago

Hi,

I had the same problem and solved it with Telegraf and the exec input plugin. The plugin executes a small shell script, which uses govmomi to gather metrics from the vCenter. Works great within an Ubuntu container.

telegraf-input.conf

```
[[inputs.exec]]
commands = ["/usr/local/bin/cpu-metrics.sh /$PATH/host/$CLUSTER/*"]
timeout = "15s"
data_format = "influx"
```

cpu-metrics.sh

```
#!/bin/sh

# Inventory path to sample; use "govc ls" to find your path.
# Stored in INVENTORY_PATH rather than PATH so the shell's command
# search path is not overwritten.
INVENTORY_PATH="$1"

# metric.sample
# - govmomi usage is documented here: https://github.com/vmware/govmomi/blob/master/govc/USAGE.md
# - instance: -i=* will output avg util of all cores per core AND "-" as average for those
GOVC="/usr/local/bin/govc metric.sample -json=false -n=1 -instance=* -t=false $INVENTORY_PATH cpu.utilization.average"

# output format:
# - output in InfluxDB line protocol: https://docs.influxdata.com/influxdb/v0.9/write_protocols/line/
$GOVC | /usr/bin/awk -F".example.net" '{print $1 " " $2}' | /usr/bin/awk '$2 ~ /-/ {print "esxi,host="$1" cputil="$4}'
```

* env vars needed for govc to work:

```
ENV GOVC_URL https://myvcenter.example.com
ENV GOVC_USERNAME myUser
ENV GOVC_PASSWORD mySecretPass
ENV GOVC_INSECURE true
```
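The last awk stage of the script prints points of the form `esxi,host=<name> cputil=<value>`. In InfluxDB line protocol, commas, equals signs, and spaces inside a tag value have to be escaped with a backslash, which the awk one-liner does not do. A minimal sketch of that step, assuming host names could contain such characters; `escape_tag` and `emit_point` are hypothetical helper names, not part of the script above:

```sh
# escape_tag VALUE
# Backslash-escape the characters that are special inside a line-protocol
# tag value: comma, equals sign, and space.
escape_tag() {
    printf '%s' "$1" | sed -e 's/[,= ]/\\&/g'
}

# emit_point HOST VALUE
# Print one point using the same measurement, tag, and field names as the
# awk stage: esxi,host=<HOST> cputil=<VALUE>
emit_point() {
    printf 'esxi,host=%s cputil=%s\n' "$(escape_tag "$1")" "$2"
}

# Example with a made-up host name containing a space:
emit_point "esx 01" 12.5
```

With plain FQDN host names the escaping is a no-op, so the output matches what the awk stage already prints.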



Not perfect, but homebrewed. Maybe this helps someone.

Best, Alex

britcey commented 7 years ago

I ran across https://github.com/Oxalide/vsphere-influxdb-go, which looks to do what we want, albeit outside of Telegraf itself (I haven't tested yet) - might be worthwhile to convert to a Telegraf input plugin. Still a Go newbie, so that's currently beyond me.

sachinrase commented 7 years ago

This is a really useful feature, as other monitoring software has this as part of the base install. IMHO Telegraf could be a universal agent for all clouds with the addition of vCenter support.

Zabbix: https://www.zabbix.com/documentation/3.4/manual/vm_monitoring

Sensu: https://github.com/sensu-plugins/sensu-plugins-feature-requests/issues/13 https://github.com/vmware/rbvmomi

Nagios: https://exchange.nagios.org/directory/Plugins/Operating-Systems/*-Virtual-Environments/VMWare/box293_check_vmware/details

Prometheus: https://github.com/sapcc/vcenter-exporter/blob/master/vcenter-exporter.py

/cc @timhallinflux

mkuzmin commented 7 years ago

I found there is a PR for a native vSphere plugin by @mlabouardy, open since April, at #2682.

But the plugin is not complete yet. At the moment I keep my contributions at https://github.com/mkuzmin/telegraf/commits/vsphere

If anyone wants to try, here are binaries https://github.com/mkuzmin/telegraf/releases/

mkuzmin commented 7 years ago

I suppose my fork is pretty complete now. Telegraf can now collect metrics from hosts, filter objects by mask, and handle errors. Fields are renamed to match the titles in the user interface.

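I'd like to have early feedback, especially about the new field naming scheme. Please check out the README (https://github.com/mkuzmin/telegraf/blob/vsphere/plugins/inputs/vsphere/README.md) and binaries (https://github.com/mkuzmin/telegraf/releases).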

exbane commented 7 years ago

I will check it out when I'm back from vacation.

MicKBfr commented 7 years ago

Hi,

That's a good start

Don't forget to collect IOPS and latency for disks and datastores, and to tag each VM by ESX host and datastore.

At the moment I use https://github.com/Oxalide/vsphere-influxdb-go to collect IOPS per datastore and find which VM is responsible for high IOPS...

Thanks,

mjseid commented 7 years ago

@mkuzmin it would be nice to include a tag for the cluster a resource is in. I believe it would be useful for most folks, since it's more common to monitor, for example, the memory usage of a cluster rather than of individual hosts to know when to add cluster capacity.