Closed cep21 closed 4 years ago
@cep21 thanks for opening this issue. There are two distinct requirements here.
The first requirement is strictly about how the SDK implements it, while the second would require changes to the metrics proto to accommodate min/max.
- Fixed (but configurable) Reporting Interval.
It's a bit more complicated than that. A fixed reporting interval is something that will kind of work, but not strictly work. To strictly support CloudWatch with as much accuracy as possible, I would need to ensure a datapoint received at 59.999 goes to window 0 and a datapoint received at 60.001 goes to window 1. This is possible with APIs similar to segment's, but use of open telemetry would sacrifice strict correctness, which is a difficult ask for people who are looking for a metrics abstraction: comparing "good enough" metrics with "strictly correct" metrics.
We need to aggregate by when the metrics come in, not when the code happens to report them.
> two distinct requirements
There's another part about preventing pre-aggregation with types like `metricdata.Summary`. I cannot provide a good user experience for this. With an API like segment's I can ignore `metricdata.Summary` by aggregating on ingestion.
Thanks for clarifying. It is 'Fixed Aggregation Interval' rather than 'Fixed Reporting Interval'.
Regarding Summary - can you clarify what you mean by 'ignore metricdata.Summary by aggregating on ingestion'?
> can you clarify what you mean by 'ignore metricdata.Summary by aggregating on ingestion'

`metricdata.Summary` is a lossy aggregation for CloudWatch. A CloudWatch user would never want to aggregate like that. They would prefer aggregating into a distribution so real percentiles can be calculated on the fly. One way to do this is to simply never let data get into a `metricdata.Summary` type, and instead aggregate it on ingestion into a distribution.
I wrote in Gitter:
The point you raise in issue 83 is very relevant to the SDK side of the OTel project, which, like OC, is tailored to the Prometheus model. Other metrics exporters, such as statsd and, as you point out, CloudWatch, cannot easily or faithfully use the currently specified default exporter for metrics. We discussed this in the meeting: some vendors would also like to interpret metrics API calls as events, to be recorded as raw data. We believe the API we're specifying won't limit these implementations, but they'll require different SDK behavior, i.e., they cannot be done as exporters. An open question is whether the first release of open telemetry libraries should aim to allow exporting metrics as events (like statsd) and along strict time boundaries (like CloudWatch), in addition to the current exporter specification (along variable time boundaries, like Prometheus).
> open question is whether the first release of open telemetry libraries should aim to allow
I feel strongly that the first version should be similar to segment's API, but using terms from open telemetry: https://github.com/segmentio/stats/blob/v4.4.1/handler.go#L7. I'll paste it below:

```go
type Handler interface {
	HandleMeasures(time time.Time, measures ...Measure)
}
```
People then implement exporters that satisfy that interface. That works for all implementations.
Is it possible to get a conversation around why that is a bad proposal?
stale issue.
Hi,
I tried writing a CloudWatch exporter for OpenCensus, and the API there is not rich enough to do CloudWatch metrics well. I document some of the reasons in the README.md at https://github.com/cep21/cwopencensusexporter
I see open telemetry is based upon open census. Will it have the same problems, or will it work well with CloudWatch?
I've looked over a few metrics APIs, and it seems like segment.io's handler is expressive enough for CloudWatch metrics because it lets me handle measures higher up the API stack: https://github.com/segmentio/stats/blob/master/handler.go#L15. Will open telemetry allow the same?
Thanks!