open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.07k stars 2.37k forks source link

[processor/k8sattributes] Deprecate and remove FieldExtractConfig.Regex config option #25128

Open dmitryax opened 1 year ago

dmitryax commented 1 year ago

The configuration interface of the k8sattributes receiver is over-complicated. This is an attempt to simplify it a bit.

FieldExtractConfig have KeyRegex and Regex options, which is pretty confusing. The Regex config option doesn't seem to be very important. I can be easily replaced with a transform processor with replace_pattern and replace_match functions. Therefore, I suggest:

  1. Deprecate FieldExtractConfig.Regex option with instructions on how to replace it with the transform processor
  2. Delete the option after 5 releases.
bryan-aguilar commented 1 year ago

Doc for config options.

This seems to place a requirement for a collector build to include the transform processor to match this behavior. Is introducing the requirement of a separate processor the best choice for all users? Or should the k8sattr processor be able to do this in isolation?

I think this could turn into a more meta question for the collector community. How do we decide that a dependency on another component is the correct choice?

For example for the tail sampling processor, there's a soft (hard?) dependency on the group by trace processor. I don't even know if the group by tail sampling processor example is correct. That relationship may just be a "best practice" per say. The tail sampling processor has it's own decision_wait configuration.

dmitryax commented 1 year ago

This one is a pretty straightforward separation of concerns. Parsing of an attribute value has nothing to do with k8s. So it should not be part of this processor. It was added a long time ago as extra functionality before we had any option to parse attribute values. Now we have the transform processor, which includes this particular feature. So we are not putting any dependency on another component, we are just removing functionality that doesn't belong to k8s attributes processor. Please let me know if it makes sense.

dmitryax commented 1 year ago

It may become less performant after this change, but I believe it's more important to keep a particular purpose for each component and avoid bloating and overcomplicating them with features that don't fit them. Happy to hear other opinions from @open-telemetry/collector-contrib-approvers

shivanshuraj1333 commented 1 year ago

This one is a pretty straightforward separation of concerns. Parsing of an attribute value has nothing to do with k8s. So it should not be part of this processor. It was added a long time ago as extra functionality before we had any option to parse attribute values.

I agree. wrt to performance concerns I'm not sure how huge the impact would be, in production users must be using transform processor and it does make sense to deprecate FieldExtractConfig.KeyRegex in lieu of transform processor.

shivanshuraj1333 commented 1 year ago

if the proposal gets accepted, can pick up the work

TylerHelmuth commented 1 year ago

It may become less performant after this change, but I believe it's more important to keep a particular purpose for each component and avoid bloating and overcomplicating them with features that don't fit them.

I agree. Holding to separation of concerns is a good idea. As for performance, we can benchmark it, but the transformprocessor is quite performant.

We should not remove this feature from the processor, though, until the equivalent feature exists in the transformprocessor. It cannot currently extract new attributes using regex, we need to add the functionality via a new OTTL function.

dmitryax commented 1 year ago

@TylerHelmuth replace_pattern should work already, I believe. Let me know if I miss something

k8sattributes:
  extract:
    annotations:
      - tag_name: git.sha
        key: kubernetes.io/change-cause
        regex: GIT_SHA=(?P<value>\w+)

should be replaceable with

k8sattributes:
  extract:
    annotations:
      - tag_name: git.sha
        key: kubernetes.io/change-cause
transform:
  metric_statements:
    - context: resource
      statements: 
        - delete(attributes["git.sha"]) where attributes["git.sha"] != nil and not IsMatch(attributes["git.sha"], "GIT_SHA=\w+")
        - replace_pattern(attributes["git.sha"], "GIT_SHA=(\w+)", "$1") where attributes["git.sha"] != nil

But if we had extract func, it'd be cleaner:

k8sattributes:
  extract:
    annotations:
      - tag_name: k8s.change_cause
        key: kubernetes.io/change-cause
transform:
  metric_statements:
    - context: resource
      statements: 
        - extract(attributes, attributes["k8s.change_cause"], "GIT_SHA=(?P<git.sha>\w+)") where attributes["k8s.change_cause"] != nil
        - delete(attributes["k8s.change_cause"]) where attributes["k8s.change_cause"] != nil

Is this what you're thinking of?

TylerHelmuth commented 1 year ago

@dmitryax I forgot replace_pattern could use $1. You can remove a couple extra nil checks:

k8sattributes:
  extract:
    annotations:
      - tag_name: git.sha
        key: kubernetes.io/change-cause
transform:
  metric_statements:
    - context: resource
      statements:
        - delete(attributes["git.sha"]) where not IsMatch(attributes["git.sha"], "GIT_SHA=\w+")
        - replace_pattern(attributes["git.sha"], "GIT_SHA=(\w+)", "$1")

I see why it is so convenient to do the extraction within the k8sattributesprocessor - if you leave it for the transformprocessor then you have to explicitly check whether or not the attribute should be deleted.

If we add an extract function we could add an optional param to help with this:

k8sattributes:
  extract:
    annotations:
      - tag_name: k8s.change_cause
        key: kubernetes.io/change-cause
transform:
  metric_statements:
    - context: resource
      statements: 
        - extract(attributes, attributes["k8s.change_cause"], "GIT_SHA=(?P<git.sha>\w+)", deleteOriginal=true)
dmitryax commented 1 year ago

@TylerHelmuth LGTM. We should introduce that function before working on this issue to have simpler migration guidelines. Please see https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/25834 and edit it if necessary

TylerHelmuth commented 1 year ago

@dmitryax after working on https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/24599 for a bit I realized the existing extraction logic allows grabbing all the annotations/labels, which is really useful for users who want to grab everything. I believe the options we've discussed so far will remove that capability. I don't want to lose that feature, so if we move forward with removing key_regex then I think we should implement an "everything" configuration option like the one proposed in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/24599.

dmitryax commented 1 year ago

We don't lose that feature with this proposal. I'm not asking to remove key_regex, I'm asking to remove regex which is applied on labels/annotation values not keys

dmitryax commented 1 year ago

I just found and fixed a typo in the issue description which likely brought the confusion

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 11 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

odubajDT commented 5 months ago

Hi, I would like to work on this issue cc @evan-bradley

TylerHelmuth commented 5 months ago

What is the equivalent Transform processor config for extracting the following:

processors:
  k8sattributes:
    extract:
      labels:
        - tag_name: $$1
          key_regex: (.*)
          from: pod
      annotations:
        - tag_name: $$1
          key_regex: (.*)
          from: pod
TylerHelmuth commented 3 months ago

The above comment was wrong as the scenario wasn't affected by the removal of the regex field.