gabrielgiussi opened this issue 2 years ago
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
If this were implemented, I think it should just be `aggregate("min/max/sum/mean") where name == "metric_name"`.
(This would be run on the metric context.)
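A minimal sketch of how such a statement might sit in a transform processor config, assuming the proposed `aggregate` function existed (it does not today) and using the processor's `metric_statements`/`context` layout; the metric name is only illustrative:

```yaml
processors:
  transform:
    metric_statements:
      - context: metric
        statements:
          # hypothetical function: merge all data points of the matched metric using summation
          - aggregate("sum") where name == "system.cpu.usage"
```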
Common uses of aggregation would include both aggregation with an explicit label list (`sum by (labels) (metric)`) and with a label exclusion list (`sum without (labels) (metric)`).
It would be simpler to leave attribute/label removal to other statements and have `aggregate` just merge together matching data points. This would be run on the metric context.
If possible, I would like to look at this ticket. cc @evan-bradley
Just for a final summarization: the `aggregate()` function will have a single input parameter defining the aggregation function (one of `min`/`max`/`mean`/`sum`/`count`) without any additional options (these would be handled in separate statements). The `aggregate` function will be a metric-only function (since it does not make sense for traces and logs). Anything else that I am missing or somebody would like to add?
Thank you!
@odubajDT That all sounds good to me.
I think it's fair to suggest that users run `keep_keys` or `delete_matching_keys` to remove undesired attributes in data points before aggregating them, since attributes on data points are considered extra dimensions and we're effectively reducing dimensions here.
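A hedged sketch of that ordering, assuming the proposed `aggregate` function existed; `keep_keys` is an existing OTTL function, and the metric and attribute names here are made up for illustration:

```yaml
metric_statements:
  - context: datapoint
    statements:
      # keep only the dimension we want to group by
      - keep_keys(attributes, ["state"]) where metric.name == "system.cpu.usage"
  - context: metric
    statements:
      # proposed function: merge the now lower-dimensional data points using summation
      - aggregate("sum") where name == "system.cpu.usage"
```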
Thanks for the response! Do I understand correctly that these functions should be run in statements before the actual `aggregate()` statement? Basically, it should not be part of the `aggregate()` implementation and is just a suggestion to users.
Please correct me if my assumption is wrong.
A few other thoughts for clarification:
The current solution in the metricstransform processor provides a way to aggregate away all labels except a certain label:
```yaml
# aggregate away all labels except `state` using summation
include: system.cpu.usage
action: update
operations:
  - action: aggregate_labels
    label_set: [ state ]
    aggregation_type: sum
```
It additionally provides a way to aggregate data points with certain label values into a new label value, like the following:
```yaml
# aggregate data points with state label value slab_reclaimable & slab_unreclaimable using summation into slab
include: system.memory.usage
action: update
operations:
  - action: aggregate_label_values
    label: state
    aggregated_values: [ slab_reclaimable, slab_unreclaimable ]
    new_value: slab
    aggregation_type: sum
```
Let's now think about a solution that covers these two cases. The first use case is quite clear and may (from the user's perspective) look like the following:
`aggregate(sum, [attributes["state"]]) where metric.name == "xyz"`
where the first parameter represents the aggregation type (min/max/mean/sum) and the second represents the label key(s) to aggregate by.
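To make the intended effect concrete, here is a hedged sketch of what that statement would do to one metric's data points (attribute names and values are made up for illustration):

```yaml
# before: data points carry both "state" and "cpu" attributes
- { attributes: { state: idle,   cpu: cpu0 }, value: 10 }
- { attributes: { state: idle,   cpu: cpu1 }, value: 12 }
- { attributes: { state: system, cpu: cpu0 }, value: 3 }
# after aggregate(sum, [attributes["state"]]): one data point per "state" value
- { attributes: { state: idle },   value: 22 }
- { attributes: { state: system }, value: 3 }
```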
The second use case might look like the following:
`aggregate(sum, [attributes["state"]], ["slab_reclaimable", "slab_unreclaimable"]) where metric.name == "xyz"`
where the first parameter represents the aggregation type (min/max/mean/sum), the second represents the label key for aggregation, and the third the values of that label which will be aggregated together.
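And a hedged sketch of the intended effect of the second statement (values are made up; note that the proposal above does not yet say what the merged label value should be called, whereas the metricstransform equivalent uses `new_value: slab`):

```yaml
# before: three data points distinguished by the "state" value
- { attributes: { state: slab_reclaimable },   value: 100 }
- { attributes: { state: slab_unreclaimable }, value: 40 }
- { attributes: { state: used },               value: 500 }
# after: the two listed values are summed into one data point, other values are untouched
- { attributes: { state: slab },               value: 140 }
- { attributes: { state: used },               value: 500 }
```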
The open questions are the following:
- Are my assumptions about the functionality and the format of the `aggregate()` function correct?
- Is it valid input to have an aggregation without using labels, for example `aggregate(sum) where name == "xyz"`? Will parameters 2 and 3 therefore be optional?
- Will all data types supported by the current solution in the metricstransform processor also be supported in the transform processor? That means: gauge, int, double, histogram, exponential histogram?

Thank you very much for the answers!
cc @evan-bradley
> Do I understand correctly that these functions should be run in statements before the actual `aggregate()` statement? Basically, it should not be part of the `aggregate()` implementation and is just a suggestion.
That's right. If we update the attributes before running the `aggregate` function, we would run those updates in separate statements, then run the `aggregate` function once the data points have been prepared.
Based on what I've seen from the equivalent functionality in the metrics transform processor, I see two paths for implementation, the second being a single `aggregate` function with optional parameters. Unless approach 1 significantly improves usability, I think it's okay to rely on approach 2.
> Is it valid input to have an aggregation without using labels? For example `aggregate(sum) where name == "xyz"`.
I would assume that an aggregation that doesn't specify labels indicates that all data points should be aggregated and only matching attributes are retained. I would fall back on the behavior the metrics transform processor exhibits here as a default and we can deviate from that if we feel it doesn't make sense.
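A hedged sketch of that assumed default (attribute names and values are made up; whether this is the behavior we want is exactly the open question):

```yaml
# before: "host" differs between data points, "unit" is identical on both
- { attributes: { host: a, unit: bytes }, value: 5 }
- { attributes: { host: b, unit: bytes }, value: 7 }
# after a hypothetical aggregate(sum) with no labels specified:
# everything is merged and only the attribute whose value matched everywhere survives
- { attributes: { unit: bytes }, value: 12 }
```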
> Will all data types supported by the current solution in the metricstransform processor also be supported in the transform processor? That means: gauge, int, double, histogram, exponential histogram?
I think this should work, yes. If these types are all in the metrics transform processor functionality, I think they would work here.
Component(s)
processor/transform
Is your feature request related to a problem? Please describe.
The metricstransform processor already provides this, but there is no way to do it in the transform processor, e.g.
With this we can go from
to this
Describe the solution you'd like
A function to aggregate metric values, grouping by a set of attributes.
In the example I'm also renaming the metric, but that could be done with a different function in a next step using `set(metric.name, "jvm_memory_bytes_used")`.
(Providing this just as an example; I don't know what the best syntax for this would be.)
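As a hedged sketch of how the aggregation and the rename might be chained, assuming the proposed `aggregate` function existed; `keep_keys` and `set` are existing OTTL functions, while the source metric name and the "area" attribute are only illustrative guesses:

```yaml
processors:
  transform:
    metric_statements:
      - context: datapoint
        statements:
          # existing OTTL function: keep only the dimension to group by
          - keep_keys(attributes, ["area"]) where metric.name == "jvm_memory_pool_bytes_used"
      - context: metric
        statements:
          # proposed function (does not exist yet): merge the remaining data points by summation
          - aggregate("sum") where name == "jvm_memory_pool_bytes_used"
          # existing OTTL function: rename the metric in a follow-up statement
          - set(name, "jvm_memory_bytes_used") where name == "jvm_memory_pool_bytes_used"
```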
Describe alternatives you've considered
No response
Additional context
No response