Handling event data from sources like vSphere

jplouis commented 4 years ago

Feature Request

Hello, has there been any discussion on handling event data in Telegraf? For example, being able to subscribe to vSphere events and send them via an output. The metric type could be used today, but a single event tends to carry multiple high-cardinality values such as messages and/or event IDs. The kinds of processors and aggregators that make sense could also be different for event data.

Proposal:

Extending the metric type to encompass event data or a new event type. Either would require consideration for how processors and aggregators work with event data if at all.

A TimeSeries interface could extract the common methods of metrics and events. The Metric interface would embed the TimeSeries interface and a new Event interface would also embed the TimeSeries interface. New methods could be added that signify if the processor, aggregator and output plugins can handle generic TimeSeries or Event data. TimeSeries data would then flow through the same channels and accumulators and it is up to the plugins to opt into handling the different type of time series data.
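To make the interface idea above concrete, here is a minimal sketch. Everything in it is hypothetical: Telegraf has no `TimeSeries`, `Event`, `MetricHandler` or `EventHandler` types, and the `Metric` interface shown is a cut-down local stand-in, not `telegraf.Metric`. It only illustrates embedding a shared interface and letting plugins opt in per data type.

```go
package main

import (
	"fmt"
	"time"
)

// TimeSeries holds the methods metrics and events would share (hypothetical).
type TimeSeries interface {
	Name() string
	Tags() map[string]string
	Time() time.Time
}

// Metric embeds TimeSeries; a simplified stand-in for telegraf.Metric.
type Metric interface {
	TimeSeries
	Fields() map[string]interface{}
}

// Event embeds TimeSeries and is consumed as a whole (hypothetical).
type Event interface {
	TimeSeries
	Message() string
}

// Plugins opt in by implementing one or both handler interfaces.
type MetricHandler interface{ HandleMetric(Metric) }
type EventHandler interface{ HandleEvent(Event) }

// series provides the shared TimeSeries methods for both concrete types.
type series struct {
	name string
	tags map[string]string
	ts   time.Time
}

func (s series) Name() string            { return s.name }
func (s series) Tags() map[string]string { return s.tags }
func (s series) Time() time.Time         { return s.ts }

type simpleMetric struct {
	series
	fields map[string]interface{}
}

func (m simpleMetric) Fields() map[string]interface{} { return m.fields }

type simpleEvent struct {
	series
	message string
}

func (e simpleEvent) Message() string { return e.message }

// Dispatch delivers ts only if the plugin opted into that kind of data.
// It reports whether the plugin accepted the value.
func Dispatch(plugin interface{}, ts TimeSeries) bool {
	switch v := ts.(type) {
	case Event:
		if h, ok := plugin.(EventHandler); ok {
			h.HandleEvent(v)
			return true
		}
	case Metric:
		if h, ok := plugin.(MetricHandler); ok {
			h.HandleMetric(v)
			return true
		}
	}
	return false
}

// metricsOnlyOutput opts into metrics but not events.
type metricsOnlyOutput struct{ seen int }

func (o *metricsOnlyOutput) HandleMetric(Metric) { o.seen++ }

func main() {
	out := &metricsOnlyOutput{}
	now := time.Now()
	m := simpleMetric{series: series{name: "cpu", ts: now}, fields: map[string]interface{}{"usage": 0.5}}
	e := simpleEvent{series: series{name: "vsphere_event", ts: now}, message: "VM powered on"}
	fmt.Println(Dispatch(out, m), Dispatch(out, e)) // prints "true false"
}
```

With this shape, existing plugins that only implement the metric handler would silently skip events, which is one way to read "it is up to the plugins to opt into handling the different type of time series data".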

Additionally add something similar to internal stats that allows plugins to report event data related to monitored sources such as credential and host resolution problems.

Current behavior:

Desired behavior:

Use case:

Being able to collect event data from monitored systems.
As mentioned vSphere monitoring is a good fit for this.
SNMP traps as events
Could be used for parsing logs, structured or otherwise, to be sent as events.
Plugin or agent issues could be reported, such as expired credentials or overloaded/underperforming plugins that do not finish within the interval.

danielnelson commented 4 years ago

I don't think there is any reason this could not be done. I see this as being a new input plugin specifically for events.

fyi @prydin @puckpuck

For SNMP traps check out the snmp_trap input plugin.

puckpuck commented 4 years ago

This would warrant a larger conversation, since output plugins would all need to have support to consume event data as well. I'm not clear on what open source monitoring platforms support both metrics + events. I know many commercial vendors support both metrics + events.

danielnelson commented 4 years ago

For Telegraf, we would want an input plugin to convert the events into telegraf.Metric. Processors, Aggregators, Outputs wouldn't be event aware themselves.

jplouis commented 4 years ago

I agree, the conversation isn't about vSphere events specifically; it is about a distinction between metric and event time series data. vSphere was an example since it has a rich event system. I have seen how the snmp_trap input works with the fields. In general, the fields of a metric can represent distinct time series, whereas for an event all fields can be part of the same time series event. The problem comes on the output side in determining whether a metric represents an event or a metric time series; checking whether a field is a string type may not be enough.

I have noticed the Riemann plugin looks at string types and optionally marks them as events. Not sure if other plugins do the same. I also believe Graphite can store events.

danielnelson commented 4 years ago

Telegraf doesn't draw a distinction between events and metrics. Perhaps it would help if you described a concrete example of where the distinction would be helpful. In the case of Riemann, the current method seems to work fine.

jplouis commented 4 years ago

Many monitoring backends treat irregular time series (events) separately and differently from metric time series, and as mentioned, many don't deal with events at all. Graphite, Wavefront, and Zenoss (which I work on) handle events separately from metrics.

So what I was looking for was some way to distinguish the event-like metrics so that the data can be sent to the appropriate API. I'm also looking to see if writing Telegraf inputs for things like vSphere events is a viable approach to collecting that data. An agent like Telegraf is also a great way to generate events about the systems being monitored, as it can tell you when systems are no longer reachable, for example due to network or credential problems.

In Telegraf, the syslog and snmp_trap inputs generate what can be considered event data; events, or irregular time series, carry a lot of text or identifier values. The syslog and trap inputs put the values into the metric fields, and the metric is untyped. The difficulty in dealing with this kind of data comes in the outputs. The Influx output looks like it sends the Telegraf metric as is. Most other outputs I've looked at send the fields from the metric as individual time series. Sending each field as a time series is reasonable for metrics, since the numeric data can make sense on its own regardless of the other field values. For metrics that are more event-like, e.g. traps and syslog, the individual fields don't work very well on their own, since they tend to be text or identifiers that are meant to be consumed as a whole. Having inputs identify these event-type metrics would be helpful on the output side.
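A small sketch of the problem described above, not taken from any actual Telegraf output: `point` and `splitNumericFields` are hypothetical names, and dropping non-numeric fields is an assumption about how a field-per-series output might behave. The field names are only illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// point is a hypothetical single-value sample for a numeric-only backend.
type point struct {
	Series string
	Value  float64
}

// splitNumericFields turns each numeric field into its own series and
// drops everything else (assumed behaviour, for illustration only).
func splitNumericFields(name string, fields map[string]interface{}) []point {
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order for the example

	var pts []point
	for _, k := range keys {
		switch v := fields[k].(type) {
		case float64:
			pts = append(pts, point{Series: name + "." + k, Value: v})
		case int64:
			pts = append(pts, point{Series: name + "." + k, Value: float64(v)})
		}
	}
	return pts
}

func main() {
	// A metric-like measurement: every field stands on its own.
	cpu := splitNumericFields("cpu", map[string]interface{}{
		"usage_user":   12.5,
		"usage_system": 3.0,
	})

	// An event-like measurement: the text that gives it meaning is lost.
	syslog := splitNumericFields("syslog", map[string]interface{}{
		"message":       "Failed password for root",
		"appname":       "sshd",
		"severity_code": int64(3),
	})

	fmt.Println(len(cpu), len(syslog)) // prints "2 1"
}
```

The syslog record survives only as a lone `syslog.severity_code` value, which is the sense in which event fields "don't work very well on their own".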

After looking at the Telegraf code some more I think what I suggested with different event and metric interfaces is too intrusive. Maybe a simpler thing like a new Metric ValueType would suffice.
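As a rough sketch of the simpler idea: the block below declares a local `ValueType` that mirrors Telegraf's existing counter, gauge, untyped, summary and histogram types and adds an `Event` value. The `Event` constant, the `metric` struct and the `route` function are hypothetical and do not exist in Telegraf; the returned strings stand in for whatever separate APIs a backend exposes.

```go
package main

import "fmt"

// ValueType is a local stand-in for Telegraf's metric value type.
type ValueType int

const (
	_ ValueType = iota
	Counter
	Gauge
	Untyped
	Summary
	Histogram
	Event // hypothetical addition proposed in this issue
)

// metric is a simplified stand-in carrying only what routing needs.
type metric struct {
	Name   string
	Type   ValueType
	Fields map[string]interface{}
}

// route picks the backend API an output would use for a metric.
// An input marks event-like data once; outputs do not have to guess
// from field types.
func route(m metric) string {
	if m.Type == Event {
		return "events"
	}
	return "metrics"
}

func main() {
	cpu := metric{Name: "cpu", Type: Gauge, Fields: map[string]interface{}{"usage": 0.5}}
	trap := metric{Name: "snmp_trap", Type: Event, Fields: map[string]interface{}{"oid": ".1.3.6.1.6.3.1.1.5.1"}}
	fmt.Println(route(cpu), route(trap)) // prints "metrics events"
}
```

Outputs that know nothing about the new type would keep treating such metrics like any other, so this approach is less intrusive than separate interfaces.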

reimda commented 2 years ago

It looks like the go module we're using does support receiving events. There's an example in the source tree here: https://github.com/vmware/govmomi/blob/v0.29.0/examples/events/main.go

Is anyone interested in implementing this in telegraf?

telegraf-tiger[bot] commented 5 months ago

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem. If not, please try posting this question in our Community Slack or Community Forums, or provide additional details in this issue and request that it be re-opened. Thank you!

influxdata / telegraf