open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.92k stars 2.28k forks source link

Regex support for renaming label value in metrictransformprocessor #7562

Closed andyzhangdialpad closed 1 year ago

andyzhangdialpad commented 2 years ago

Is your feature request related to a problem? Please describe. We are using otel-collector in our metric pipeline to Prometheus. We are trying to reduce country code label cardinality. Two operations we would like to do: 1. Turn all country codes to lower case. 2. Set some unpopular country codes to "other". It makes sense to use the metricstransformprocessor in this case. While regex is supported on renaming label, it is not supported on renaming label value.

Describe the solution you'd like We have regex support for renaming label.

# rename all system.cpu metrics to system.processor.*.stat
# instead of regular $ use double dollar $$. Because $ is treated as a special character.
# wrap the group name/number with braces
include: ^system\.cpu\.(.*)$$
match_type: regexp
action: update
new_name: system.processor.$${1}.stat

It would be great to support regex for renaming label value.

# for system.cpu.usage_time, rename the label state to cpu_state
include: xxx_metric_name
action: update
operations:
  - action: update_label
    label: country_code
    value_actions:
      - value: ^(ca|us)$$.             # Regex support on new_value
        match_type: regexp     
        new_value: $${1}

Describe alternatives you've considered We can do it in our client metric lib but it requires code change to all client lib (web, mobile and others). And I feel like we are "processing" the metric so adding support at metric transform processor may make sense.

Additional context Thank you!!!!

jpkrohling commented 2 years ago

cc @punya

Aneurysm9 commented 2 years ago

One complication I can foresee with this is that changing a metric value may require re-aggregation. When many country codes are reduced to other then the values for those metrics would need to be combined. Since you mention Prometheus I expect that these metrics have cumulative temporality, in which case counters and histograms would need to be added together to avoid having inconsistent behavior. I'm not sure what the appropriate resolution would be for gauges as either adding or averaging the values might be sensible.

punya commented 2 years ago

@Aneurysm9 you mean a metric attribute value, right?

andyzhangdialpad commented 2 years ago

Thank you all for commenting! Really appreciate it.

@Aneurysm9 My understanding of otel collector internal is still limited so I will leave others to discuss the re-aggregation. However, I saw that rename label value is supported. https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/metricstransformprocessor#rename-label-values

# rename the label value slab_reclaimable to sreclaimable, slab_unreclaimable to sunreclaimable
include: system.memory.usage
action: update
operations:
  - action: update_label
    label: state
    value_actions:
      - value: slab_reclaimable
        new_value: sreclaimable
      - value: slab_unreclaimable
        new_value: sunreclaimable

Let's say I have a configuration like the following. This should require re-aggregation too right? If so, I guess reaggregation you are talking about is probably supported already?

# rename the label value slab_reclaimable to sreclaimable, slab_unreclaimable to sunreclaimable
include: system.memory.usage
action: update
operations:
  - action: update_label
    label: state
    value_actions:
      - value: slab_reclaimable
        new_value: same_value             # Assign the same value.
      - value: slab_unreclaimable
        new_value: same_value             # Assign the same value.
Aneurysm9 commented 2 years ago

@Aneurysm9 you mean a metric attribute value, right?

Yes, an attribute value.

Let's say I have a configuration like the following. This should require re-aggregation too right? If so, I guess reaggregation you are talking about is probably supported already?

I'm not sure that is a valid configuration or that its results are semantically valid even if it does not fail the processor's config validation. It's certainly possible that I'm missing something that this processor does to ensure dimension reduction and attribute value rewriting is handled correctly, but I'd want to see a design doc detailing how it is handled. For some attributes it may not be a significant issue as non-identifying attributes may not require reaggregation with data outside of the current request, but even that's not necessarily a safe assumption when the batch processor is involved that may split sets of data points into two batches.

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions[bot] commented 1 year ago

Pinging code owners: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

dmitryax commented 1 year ago

@andyzhangdialpad, if it's still relevant, please take a look at a new transform processor. It's more flexible and is expected to replace metricstransform processor going forward. I believe it should fit your needs. Let us know.

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 year ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.