Closed povilasv closed 1 year ago
The solution to this problem could be adding a special tag to the metrics: for example, when `delete_counters=true` we could add `temporality=delta`, and when false, `temporality=cumulative`, similar to `metric_type=counter`. Then add some autodetection logic in the library.
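A minimal sketch of the proposed mapping, assuming a hypothetical `temporalityTag` helper (the tag values are the ones named above; the helper name is illustrative, not Telegraf API):

```go
package main

import "fmt"

// temporalityTag maps the statsd plugin's delete_counters setting to the
// proposed temporality tag value: resetting counters each interval yields
// delta values, while accumulating yields cumulative values.
// Hypothetical helper for illustration only.
func temporalityTag(deleteCounters bool) string {
	if deleteCounters {
		return "delta"
	}
	return "cumulative"
}

func main() {
	fmt.Println(temporalityTag(true))  // delta
	fmt.Println(temporalityTag(false)) // cumulative
}
```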
WDYT @jacobmarble ?
I think this is a good idea. As I understand (please correct me):
The OpenTelemetry metrics data model for Sum, Histogram, and ExponentialHistogram requires an "aggregation temporality" attribute.
The Telegraf statsd input plugin more-or-less coerces the statsd data model to the Prometheus data model. Both statsd and Prometheus don't signal "aggregation temporality", but statsd is compatible with "delta" and Prometheus is compatible with "cumulative".
The Telegraf statsd input plugin has the option `delete_counters` to signal whether Telegraf should accumulate sum and histogram values (making them compatible with Prometheus) or pass them through.
The `influx2otel` metrics conversion doesn't have a way to detect aggregation temporality; in fact, it is hard-coded to signal cumulative, because it assumes the Prometheus data model.
Certain line protocol inputs, such as statsd-via-Telegraf, are capable of generating "delta aggregation temporality", so we should add support somehow, without breaking too many things.
How to fix?

- Add a tag `otel.aggregation_temporality`, with values `delta` and `cumulative`.
- When the statsd plugin is accumulating (`delete_counters = false`, correct?), we should add the start time to the line protocol data model as field `otel.start_time_unix_nano`. In a related issue, I indicated that a metric start time should correlate with the time the accumulation began, which is possible in this case, because the statsd plugin is accumulating.

For context, `otel2influx` drops delta metrics in the Prometheus output schemas. That could be similarly adjusted, separately.
@povilasv please share your thoughts on the above.
Hey @jacobmarble, sorry for the late response. Everything you wrote makes sense and is correct.
I have a proof of concept that works if you build telegraf with these two PRs:
Could you take a look at them?
Currently I add `start_time` as a tag, but maybe instead we should change the core Metric type?
Roughly, I am thinking Accumulators could allow you to pass multiple `time.Time` values, for example:
```go
func (ac *accumulator) AddGauge(
	measurement string,
	fields map[string]interface{},
	tags map[string]string,
	t ...time.Time,
)
```
So we can do `AddGauge(..., metricTime, startTime)` and then change metric types and interfaces to support `SetStartTime()`, etc. Somewhere here: https://github.com/influxdata/telegraf/blob/master/metric.go#L107
@powersj / @jacobmarble / @srebhan, what do you all think?
I think updating core models might be hard, as there are lots of downstream dependencies, databases storing data, etc. So let's align on the best approach here: what should we do?
> I think updating core models might be hard as lots of downstream dependencies, databases storing stuff, etc.

agreed
> Currently I add start_time as the tag, but maybe instead we should change the core Metric type?

Maybe I missed a point, but why would you not use the tag?
Because it might cause issues in downstream systems. For example, if you use the Prometheus output with the statsd input, we will start generating these kinds of metrics:
```
counter{metric_type="counter",namespace="$NAMESPACE",nodename="$NODENAME",start_time="2023-04-20T11:27:00.000797804+03:00",temporality="delta",type="app"} 10
counter{metric_type="counter",namespace="$NAMESPACE",nodename="$NODENAME",start_time="2023-04-20T11:27:10.000731998+03:00",temporality="delta",type="app"} 10
```
And this creates a huge cardinality issue in Prometheus, which will eventually make Prometheus OOM. More information: https://stackoverflow.com/questions/46373442/how-dangerous-are-high-cardinality-labels-in-prometheus
So what we basically want is to somehow pass the `start_time`, but in a way that's invisible to the output plugins that don't need it.
Another note: the Prometheus output is currently wrong when used with the statsd input, as it doesn't convert delta to cumulative. So maybe folks don't really use this combination in practice?
Also note that you can filter out the tags you don't want in an output's config.
> And this creates a huge cardinality issue in Prometheus which will eventually make Prometheus OOM.

I did not realize this was an issue for Prometheus, similar to InfluxDB.
> So what we basically want is somehow pass the start_time, but make it in a way that's invisible to the output plugins that don't need that.

Instead of at the output plugins, is this something that could be handled at the parser level? If the parser sees the tag that Jacob suggested, it goes looking for the corresponding start time field.
> Instead of at output plugins, is this something that could be handled at the parser level? If the parser see's the tag that Jacob suggested it goes looking for the corresponding start date field.

I don't think I follow what you mean. Could you elaborate, or point to the relevant code?
My POC currently computes the start time from the interval between Gather calls - https://github.com/influxdata/telegraf/pull/13087/files#diff-bacab159d6514066d37326f61ab09f62bbe376e1421b9e37b8daacd6df14807aR160
Step 1: get the data - as @jacobmarble mentioned, let's add a config option to the statsd plugin that will set a field (`otel.start_time_unix_nano`) and a tag (`otel.aggregation_temporality`). This is a modification to your PR in #13087.
This gives us an opt-in method of adding this data to avoid breaking existing users, and provides us a way to collect it and know whether we have it.
Step 2: now that you have the data, update the opentelemetry output and/or the Prometheus serializer with a config option that, when set, looks at a specific tag to set the start time.
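A minimal sketch of what the output side of Step 2 could look like: recover the start time from the proposed `otel.start_time_unix_nano` field when it is present. The field name is the one proposed above; the helper itself is an assumption, not an existing Telegraf function:

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// startTimeFromFields shows how an output plugin could recover the start
// time from the proposed otel.start_time_unix_nano field when present.
// Hypothetical sketch; accepts int64 or string-encoded nanoseconds.
func startTimeFromFields(fields map[string]interface{}) (time.Time, bool) {
	v, ok := fields["otel.start_time_unix_nano"]
	if !ok {
		return time.Time{}, false
	}
	switch n := v.(type) {
	case int64:
		return time.Unix(0, n), true
	case string:
		ns, err := strconv.ParseInt(n, 10, 64)
		if err != nil {
			return time.Time{}, false
		}
		return time.Unix(0, ns), true
	}
	return time.Time{}, false
}

func main() {
	fields := map[string]interface{}{"otel.start_time_unix_nano": int64(1_000_000_000)}
	ts, ok := startTimeFromFields(fields)
	fmt.Println(ok, ts.UnixNano()) // true 1000000000
}
```

Outputs that don't know about the field (or users who filter it out) would simply fall through the `!ok` branch and behave as before, which keeps the change opt-in.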
I think this is what @jacobmarble was suggesting in his previous comment.
Thanks for explaining, @powersj :bow: . I've opened a PR for the first part, https://github.com/influxdata/telegraf/pull/13087, tested together with https://github.com/influxdata/influxdb-observability/pull/160, and it seems to work :)
Status update: two PRs got merged, https://github.com/influxdata/telegraf/pull/13087 and https://github.com/influxdata/influxdb-observability/pull/160.
Now we need:
OK, at the moment we cannot update Telegraf to use the new libraries, because the OTel Go SDK v1.15.1 doesn't build for BSD systems. See https://app.circleci.com/pipelines/github/influxdata/telegraf/16298/workflows/01fa2768-354f-407f-8222-8c79914fef32/jobs/254646
The upstream bug was fixed by https://github.com/open-telemetry/opentelemetry-go/pull/4077.
We need to wait until a new version of the OTel Go SDK is released, then update the dependency in the otel2influx library, and then update Telegraf.
@povilasv it looks like the BSD fix was released a couple days ago in v1.16.0/v0.39.0. What needs to happen next?
Thanks for the ping.
Working on the PR to the otel2influx library -> https://github.com/influxdata/influxdb-observability/pull/244
Relevant telegraf.conf
System info
1.25.0
Docker
No response
Steps to reproduce
Expected behavior
When `delete_counters = true`, the OpenTelemetry Counter should be sent with Delta temporality, as Telegraf accumulates data for each interval and then resets the counters. For example, if we send 200 every interval, we see that Telegraf outputs the following:
T0: 200 T1: 200 T2: 200
which is Delta temporality; more info -> https://opentelemetry.io/docs/reference/specification/metrics/supplementary-guidelines/#aggregation-temporality
When `delete_counters = false`, the OpenTelemetry Counter should be sent with Cumulative temporality, as Telegraf accumulates the data and reports a cumulative counter. For example:
T0: 200 T1: 400 T2: 600
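The relationship between the two series above can be sketched as a running sum; this is illustrative only (the `cumulativeFromDeltas` helper is not Telegraf code):

```go
package main

import "fmt"

// cumulativeFromDeltas converts a delta series into its cumulative
// equivalent by running-summing the values, mirroring the T0/T1/T2
// examples above. Illustrative helper only.
func cumulativeFromDeltas(deltas []float64) []float64 {
	out := make([]float64, len(deltas))
	var sum float64
	for i, d := range deltas {
		sum += d
		out[i] = sum
	}
	return out
}

func main() {
	// delete_counters = true reports deltas: 200, 200, 200.
	// delete_counters = false would report the cumulative series instead.
	fmt.Println(cumulativeFromDeltas([]float64{200, 200, 200})) // [200 400 600]
}
```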
Actual behavior
We always send Temporality as Cumulative in this code:
https://github.com/influxdata/influxdb-observability/blob/28e34bc69b46ddaab673aa784e67e467f08902e8/influx2otel/metrics.go#L158-L161
Additional info
No response