open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.87k stars 2.24k forks source link

Processor: Splitting summary metrics into timeseries. #7134

Open dashpole opened 2 years ago

dashpole commented 2 years ago

Is your feature request related to a problem? Please describe.

In the googlecloud exporter, we currently split a summary metric into multiple time series before sending it, since Google Cloud Monitoring does not support summaries:

An unfortunate consequence of this is that one summary data point can become a large number of timeseries. If an incoming batch of metrics has 200 data points, but contains a summary, it will exceed the 200 timeseries threshold for a batch we can send to cloud monitoring. As a result, we need to perform batching again after splitting summaries in the exporter to ensure we do not exceed 200 timeseries in a request to cloud monitoring.

Describe the solution you'd like

The base conclusion from our issue above is that we need to split summary metrics before we perform batching. One way to accomplish this would be to use a processor to split summary metrics, and insert it before the batching processor.

As it turns out, splitting summary metrics is very common in exporters other than googlecloud:

Most implementations split metrics by creating:

The implementations may have drifted too far apart, but it would be also be nice to bring the naming conventions for split metrics together.

We should consider adding a summaryadapter (feel free to suggest better names) that splits summary metrics into multiple parts, which can replace this functionality in exporters, and enable splitting metrics prior to batching them. It could have configuration for how to name the resulting metrics, to allow for backwards-compatibility with existing exporters. Existing exporters could then simply not support summary types in their exporter, and point users to the processor which does this.

Alternatives considered

While it does bring naming conventions for split summary metrics closer together, reduce the complexity of exporters, and solve google cloud's batching problem, it does make the configuration required for users somewhat more complex. If we think the extra complexity is not worth the benefits, then the google cloud exporter will just have to implement batching in the exporter.

cc @aabmass @jsuereth

buptubuntu commented 2 years ago

It seems the root cause is that the underlying timeseries database do not support summary and quatile metric friendly,so it is actually the exporter that should be responsible for splitting the metric, or maybe in the future the underlying timeseries storage support summary metric friendly , the exporter will just send the summary metric in one batch naturally.

For now,maybe we should first investigate the splitting rule in exporter, if the splitting rule is the same, we can create a processor with the same rule, otherwise it is quite confused for the user

aabmass commented 2 years ago

I think this proposal would also solve the batching issue https://github.com/open-telemetry/opentelemetry-collector/issues/4646. See point 4:

Allow customization of the "size" function. There were requests to support batching by the serialized size, we can offer this in the exporter helper much nicer since the custom sizer can actually work using the exporter wire format. So batching can happen after data serialization.

CatherineF-dev commented 2 years ago

Agree on summaryadapter proposal.

A CUJ is to rename Summary metrics:

It's hard to add a metrics_rename processor after googlecloud exporter.


Currently, googlecloud exporter shoulders two responsibilities:

  1. send data to googlecloud backend
  2. process Summary metrics (split it into three time series)

It will have more flexibilities on Summary metrics if #2 is splitted from googlecloud exporter into a processor.

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions[bot] commented 1 year ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

mx-psi commented 7 months ago

We have some support for splitting a summary into timeseries on the transform processor (see summary here https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/30156#issuecomment-1881502313 ). Would adding a function for quantiles to the transform processor solve this issue?