Open djluck opened 9 months ago
OK I'm game. I'll sponsor this.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
What needs to happen next with this? Does @djluck have a prototype that can be used as a starting point?
Hey, I'm unfortunately very short on time over the next month. Hopefully I'll be able to clean up and contribute the prototype in September.
The purpose and use-cases of the new component
A known restriction in the metricstransformprocessor is that aggregation of labels happens within a batch of metrics submitted from a single instance of a service. This means that it's impossible to aggregate over the service.instance.id label!
Why is this useful? Aggregating away this label allows us to avoid cardinality explosions. In large-scale deployments of services that can run thousands of instances, it's expensive to store the per-instance metrics. This is especially true if a metric has additional labels with a large set of values.
While keeping per-instance metrics is often useful (e.g. CPU, memory, disk), there are times when the per-instance breakdown is not helpful (e.g. user adoption metrics, business KPI metrics), so storing this information is redundant. This component will allow users to aggregate out the instance label and control the cost of expensive metrics.
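To make the idea concrete, here is a minimal Python sketch of aggregating across batches from different instances. It is not the prototype and not the collector's API: the function name, the tuple layout, the metric name and the values are all made up for illustration. It also assumes every value is a delta that can simply be summed; cumulative counters would need extra state per instance.

```python
from collections import defaultdict

# Label to aggregate away; the name is taken from the issue text.
DROP_LABEL = "service.instance.id"


def aggregate_across_instances(batches, drop_label=DROP_LABEL):
    """Sum data points that become identical once drop_label is removed.

    Hypothetical helper: each batch is a list of
    (metric_name, labels_dict, value) tuples, and every value is
    assumed to be a delta that is safe to add.
    """
    totals = defaultdict(float)
    for batch in batches:
        for name, labels, value in batch:
            # Build a hashable key from the labels that remain.
            remaining = tuple(
                sorted((k, v) for k, v in labels.items() if k != drop_label)
            )
            totals[(name, remaining)] += value
    return dict(totals)


# Two batches from two different instances (illustrative data only).
batch_a = [
    ("signups", {"service.instance.id": "a", "plan": "free"}, 3.0),
    ("signups", {"service.instance.id": "a", "plan": "pro"}, 1.0),
]
batch_b = [
    ("signups", {"service.instance.id": "b", "plan": "free"}, 4.0),
]

result = aggregate_across_instances([batch_a, batch_b])
```

With the instance label removed, the two "free" data points collapse into one series, so three input series become two. The key point is that the state in totals has to outlive a single batch, which is what aggregation within one batch cannot provide.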
Example configuration for the component
And given these two separate batches of metrics:
It would produce the following values:
Telemetry data types supported
Metrics
Is this a vendor-specific component?
Code Owner(s)
No response
Sponsor (optional)
No response
Additional context
I have prototyped a solution that has shown promise; this issue is to understand if the OpenTelemetry project would be keen to adopt it. The code is rough and so needs work, but I would be happy to drive the development of this feature.