logstash-plugins / logstash-output-datadog_metrics

Apache License 2.0

Plugin does not aggregate events if there is more than one per second #6

Open Ricky-Thomas opened 8 years ago

Ricky-Thomas commented 8 years ago

Hey guys, engineer for Datadog here.

I've been working on a case with a customer who's using this plugin, and they've hit an interesting issue.

In some cases, the plugin can send more than one point per timestamp, which isn't supported in Datadog (it needs client-side aggregation).

This is where this can occur: https://github.com/logstash-plugins/logstash-output-datadog_metrics/blob/master/lib/logstash/outputs/datadog_metrics.rb#L75

For example, if 3 events occur at timestamp 5, the plugin sends us a list of points like [(5,1), (5,1), (5,1)].

On our side, however, we don't know which of these points to keep, and this is causing issues.

The solution here would be to aggregate all similar events (same tags) on the client side (e.g., by summing them) to produce at most one point per second. For example, if 3 events happen at ts=5, you could send a single point (5,3) with the correct tags.
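A minimal sketch of that client-side aggregation, written in Ruby since that is the plugin's language. It assumes each point carries a metric name, a tag list, an epoch timestamp in seconds, and a numeric value; the `Point` struct and `aggregate_points` method are hypothetical and not taken from the plugin's code. Points are keyed on metric name, sorted tags, and whole-second timestamp, then summed:

```ruby
# Hypothetical point shape: metric name, tags, epoch timestamp (seconds, may be fractional), value.
Point = Struct.new(:metric, :tags, :timestamp, :value)

# Sum all points that share a metric name, tag set, and whole-second timestamp,
# so at most one point per second is emitted for each series.
def aggregate_points(points)
  sums = Hash.new(0)
  points.each do |pt|
    # Sorting tags makes ["a", "b"] and ["b", "a"] land in the same series.
    key = [pt.metric, pt.tags.sort, pt.timestamp.to_i]
    sums[key] += pt.value
  end
  sums.map do |(metric, tags, ts), total|
    Point.new(metric, tags, ts, total)
  end
end

events = [
  Point.new("logstash.events", ["env:prod"], 5, 1),
  Point.new("logstash.events", ["env:prod"], 5, 1),
  Point.new("logstash.events", ["env:prod"], 5, 1),
  Point.new("logstash.events", ["env:dev"], 5, 1)
]

aggregate_points(events).each do |pt|
  puts "#{pt.metric} #{pt.tags.inspect} (#{pt.timestamp},#{pt.value})"
end
```

The three `env:prod` events collapse into one (5,3) point while the `env:dev` event stays a separate (5,1) series. Summing fits counters like the event counts in this issue; a gauge would more naturally keep the last value or an average per second.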

Ricky-Thomas commented 8 years ago

Any updates regarding this?

talevy commented 8 years ago

I am not familiar with Datadog, but what if the plugin sends you two separate requests, each with the same timestamp?

So, one request with [(5,1)] and another with [(5,2)]. Does Datadog support this?

kushmansingh commented 7 years ago

Bump. This issue leads to incredibly inaccurate counts on the Datadog side.

mrjcleaver commented 7 years ago

Bump - yes, problematic.

mrjcleaver commented 7 years ago

Note this commit in the fasterize fork: https://github.com/fasterize/logstash-output-datadog_metrics/commit/8c4bd2ad0ca4fab91e8f8fd3048c8a6e9e35c7cd

jordansissel commented 7 years ago

> needs client side aggregation

Logstash is often run in a decentralized manner, so there is no way for Logstash instances to collaborate on a shared "aggregate value" for any given stream. Depending on what you are counting, it may not even matter whether a single Logstash instance counts correctly.

There's also the problem that the real-time clock is often not the source of an event's timestamp in Logstash. Events can arrive out of order or be delayed significantly.
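One way to see why delayed events matter: aggregating by event timestamp means a one-second bucket can only be flushed after waiting some grace period for stragglers, and any event arriving later than that still produces a second point for the same second. The sketch below assumes a hypothetical `WindowedCounter` class with a `grace` parameter in seconds; grace-period windowing is an illustration chosen here, not something proposed in this thread:

```ruby
# Hypothetical windowed aggregator: counts events per whole second of *event* time
# and only flushes a bucket once the clock has moved `grace` seconds past it.
class WindowedCounter
  def initialize(grace)
    @grace = grace
    @buckets = Hash.new(0)
  end

  # Record one event at its own timestamp (not the wall clock).
  def add(event_ts)
    @buckets[event_ts.to_i] += 1
  end

  # Emit [timestamp, count] for every bucket at or before now - grace, then drop it.
  def flush(now)
    cutoff = now.to_i - @grace
    ready = @buckets.keys.select { |ts| ts <= cutoff }.sort
    ready.map { |ts| [ts, @buckets.delete(ts)] }
  end
end

counter = WindowedCounter.new(2)
counter.add(5)
counter.add(5.4)
counter.add(6)
first = counter.flush(7)    # cutoff is 5, so only the ts=5 bucket is final
counter.add(5.8)            # a delayed event for second 5 arrives after its bucket was flushed
second = counter.flush(9)   # cutoff is 7: a new ts=5 bucket plus ts=6
p first
p second
```

The late event ends up as a second point for timestamp 5, which is exactly the duplicate this issue is about. A longer grace period narrows that window but delays every metric by the same amount, and it still only covers a single Logstash instance.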

Maybe the best way to do this is to put the events in Elasticsearch, run a periodic Elasticsearch aggregation query to do the counting, and then pump that result into Datadog?
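A rough sketch of that approach, assuming Elasticsearch's `date_histogram` aggregation (whose bucket keys are epoch milliseconds) and the series/points payload shape of Datadog's v1 metrics API. The `@timestamp` field, the `per_second` aggregation name, the metric name, and the `to_datadog_series` helper are illustrative assumptions, and the HTTP calls to both services are left out:

```ruby
require "json"

# Hypothetical Elasticsearch request body: count documents per second over the last 5 minutes.
# (`fixed_interval` is the ES 7.2+ spelling; older versions use `interval`.)
QUERY = {
  size: 0,
  query: { range: { "@timestamp" => { gte: "now-5m", lt: "now" } } },
  aggs: { per_second: { date_histogram: { field: "@timestamp", fixed_interval: "1s" } } }
}.freeze

# Turn date_histogram buckets (keys in epoch milliseconds) into one Datadog-style
# series with exactly one [epoch_seconds, count] point per second.
def to_datadog_series(response, metric, tags)
  buckets = response.dig("aggregations", "per_second", "buckets") || []
  points = buckets.map { |b| [b["key"] / 1000, b["doc_count"]] }
  { "series" => [{ "metric" => metric, "type" => "count", "points" => points, "tags" => tags }] }
end

# A made-up response fragment in the date_histogram shape, for illustration only.
sample_response = JSON.parse(<<~BODY)
  {"aggregations": {"per_second": {"buckets": [
    {"key": 5000, "doc_count": 3},
    {"key": 6000, "doc_count": 1}
  ]}}}
BODY

puts JSON.generate(QUERY)
puts JSON.generate(to_datadog_series(sample_response, "logstash.events", ["env:prod"]))
```

Because Elasticsearch sees documents from every Logstash instance, a single periodic job like this yields one count per second no matter how many instances shipped the events.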