open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io

[processor/transform] Aggregate metrics grouping by label #16224

Open gabrielgiussi opened 1 year ago

gabrielgiussi commented 1 year ago

Component(s)

processor/transform

Is your feature request related to a problem? Please describe.

The metricstransform processor already provides this, but there is no way to do it in the transform processor. For example:

- include: process.runtime.jvm.memory.usage
  action: insert
  new_name: jvm_memory_bytes_used
  operations:
    - action: aggregate_labels
      label_set: [ type ]
      aggregation_type: sum

With this we can go from

process_runtime_jvm_memory_usage{job="unknown_service:java",pool="CodeHeap 'non-nmethods'",type="non_heap"} 1.382272e+06
process_runtime_jvm_memory_usage{job="unknown_service:java",pool="CodeHeap 'non-profiled nmethods'",type="non_heap"} 9.991936e+06
process_runtime_jvm_memory_usage{job="unknown_service:java",pool="CodeHeap 'profiled nmethods'",type="non_heap"} 2.968128e+07
process_runtime_jvm_memory_usage{job="unknown_service:java",pool="Compressed Class Space",type="non_heap"} 2.2830672e+07
process_runtime_jvm_memory_usage{job="unknown_service:java",pool="Eden Space",type="heap"} 1.5339216e+07
process_runtime_jvm_memory_usage{job="unknown_service:java",pool="Metaspace",type="non_heap"} 1.40415936e+08
process_runtime_jvm_memory_usage{job="unknown_service:java",pool="Survivor Space",type="heap"} 330760
process_runtime_jvm_memory_usage{job="unknown_service:java",pool="Tenured Gen",type="heap"} 6.3913088e+07

to this

jvm_memory_bytes_used{type="heap",job="unknown_service:java"} 7.9583064e+07
jvm_memory_bytes_used{type="non_heap",job="unknown_service:java"} 2.04302096e+08

Describe the solution you'd like

A function to aggregate metric values, grouping by a set of attributes:

- aggregate([attributes["type"]],sum) where metric.name == "process_runtime_jvm_memory_usage"

In the example I'm also renaming the metric, but that could be done with a different function in a subsequent step using set(metric.name, "jvm_memory_bytes_used").

(I'm providing this just as an example; I don't know what the best syntax for this would be.)
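
To illustrate where this would sit, a transform processor configuration using the proposed function might look roughly like the sketch below. This is hypothetical: the aggregate function does not exist yet, and the statement syntax simply mirrors the proposal above; only set is an existing OTTL function.

processors:
  transform:
    metric_statements:
      - context: metric
        statements:
          # hypothetical aggregate() call: sum data points grouped by the "type" attribute
          - aggregate([attributes["type"]], sum) where metric.name == "process.runtime.jvm.memory.usage"
          # rename afterwards in a separate statement, as described above
          - set(metric.name, "jvm_memory_bytes_used") where metric.name == "process.runtime.jvm.memory.usage"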

Describe alternatives you've considered

No response

Additional context

No response

github-actions[bot] commented 1 year ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

seankhliao commented 1 year ago

If this were implemented, I think it should just be aggregate("min/max/sum/mean") where name == "metric_name", run on the metric context. Common uses of aggregation include both an explicit label list (sum by (labels) (metric)) and a label exclusion list (sum without (labels) (metric)). It would be simpler to leave attribute/label removal to other statements and have aggregate just merge matching data points together.
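
A minimal sketch of that split, assuming the simple aggregate signature proposed here (keep_keys is an existing OTTL function; the aggregate call is hypothetical):

processors:
  transform:
    metric_statements:
      # first drop every data point attribute except "type" in its own statement
      - context: datapoint
        statements:
          - keep_keys(attributes, ["type"]) where metric.name == "process.runtime.jvm.memory.usage"
      # then merge the now-overlapping data points (hypothetical aggregate function)
      - context: metric
        statements:
          - aggregate("sum") where name == "process.runtime.jvm.memory.usage"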

odubajDT commented 3 months ago

If possible, I would like to look at this ticket. cc @evan-bradley

odubajDT commented 3 months ago

Just to summarize: the aggregate() function will have a single input parameter defining the aggregation function (one of min/max/mean/sum/count), without any additional options (these would be handled in separate statements). The aggregate function will be a metric-only function (since it does not make sense for traces and logs). Is there anything else that I am missing or that somebody would like to add?

Thank you!

evan-bradley commented 3 months ago

@odubajDT That all sounds good to me.

I think it's fair to suggest that users run keep_keys or delete_matching_keys to remove undesired attributes in data points before aggregating them since attributes on datapoints are considered extra dimensions and we're effectively reducing dimensions here.

odubajDT commented 2 months ago

@odubajDT That all sounds good to me.

I think it's fair to suggest that users run keep_keys or delete_matching_keys to remove undesired attributes in data points before aggregating them since attributes on datapoints are considered extra dimensions and we're effectively reducing dimensions here.

Thanks for the response! Do I understand correctly that these functions should be run in statements before the actual aggregate() statement? Basically, it should not be part of the aggregate() implementation and is just a suggestion.

Please correct me if my assumption is wrong.

odubajDT commented 2 months ago

Some further thoughts for clarification:

The current solution in the metricstransform processor provides a way to aggregate away all labels except a certain label:

# aggregate away all labels except `state` using summation
include: system.cpu.usage
action: update
operations:
  - action: aggregate_labels
    label_set: [ state ]
    aggregation_type: sum

and additionally provides a way to aggregate data points with certain values of a label into a new label value, like the following:

# aggregate data points with state label value slab_reclaimable & slab_unreclaimable using summation into slab
include: system.memory.usage
action: update
operations:
  - action: aggregate_label_values
    label: state
    aggregated_values: [ slab_reclaimable, slab_unreclaimable ]
    new_value: slab 
    aggregation_type: sum

Let's now think about a solution which covers these two cases. The first use case is quite clear and might (from the user's perspective) look like the following:

aggregate(sum, [attributes["state"]]) where metric.name == "xyz"

Where the first parameter represents the aggregation type (min/max/mean/sum) and the second represents the label key(s) to group by.

The second use case might look like the following:

aggregate(sum, [attributes["state"]], ["slab_reclaimable", "slab_unreclaimable"]) where metric.name == "xyz"

Where the first parameter represents the aggregation type (min/max/mean/sum), the second represents the label key for aggregation, and the third the values of that label which will be aggregated together.
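
Wrapped in a transform processor configuration, the two proposed forms might then look roughly like this (hypothetical; both the function and its exact parameters are still under discussion, and the statement syntax only mirrors the proposals above):

processors:
  transform:
    metric_statements:
      - context: metric
        statements:
          # use case 1 (hypothetical): aggregate away all labels except "state" using summation
          - aggregate(sum, [attributes["state"]]) where metric.name == "system.cpu.usage"
          # use case 2 (hypothetical): aggregate the slab_reclaimable and slab_unreclaimable
          # values of the "state" label together
          - aggregate(sum, [attributes["state"]], ["slab_reclaimable", "slab_unreclaimable"]) where metric.name == "system.memory.usage"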

The open questions are the following:

  1. Are my assumptions about the functionality and the format of the aggregate() function correct?

  2. Is it valid to have an aggregation without specifying any labels? For example

aggregate(sum) where name == "xyz"

Will parameters 2 and 3 therefore be optional?

  3. Will all data types supported by the current solution in the metricstransform processor also be supported in the transform processor? That means: gauge, int, double, histogram, exponential histogram?

Thank you very much for the answers!

cc @evan-bradley

evan-bradley commented 2 months ago

Do I understand correctly that these functions should be run in statements before the actual aggregate() statement? Basically, it should not be part of the aggregate() implementation and is just a suggestion.

That's right. If we update the attributes before running the aggregate function, we would run those updates in separate statements, then run the aggregate function once the data points have been prepared.

Based on what I've seen from the equivalent functionality in the metrics transform processor, I see two paths for implementation:

  1. We implement the functionality the metrics transform processor has inside the aggregate function, with optional parameters.
  2. We leave that functionality out, as we've discussed so far.

Unless approach 1 significantly improves usability, I think it's okay to rely on approach 2.

Is it valid to have an aggregation without specifying any labels? For example

I would assume that an aggregation that doesn't specify labels indicates that all data points should be aggregated and only matching attributes are retained. I would fall back on the behavior the metrics transform processor exhibits here as a default and we can deviate from that if we feel it doesn't make sense.
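
As an illustration of that assumed default (hypothetical sample values): summing a metric with no label list would collapse all of its data points into one, keeping only the attributes whose values agree across every input.

# hypothetical input data points
system_cpu_usage{cpu="0",state="idle"} 100
system_cpu_usage{cpu="1",state="idle"} 120

# hypothetical result of aggregate(sum) with no labels specified:
# the "cpu" values differ, so that attribute is dropped; "state" matches and is retained
system_cpu_usage{state="idle"} 220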

Will all data types supported by the current solution in the metricstransform processor also be supported in the transform processor? That means: gauge, int, double, histogram, exponential histogram?

I think this should work, yes. If these types are all in the metrics transform processor functionality, I think they would work here.