Open dmitryax opened 1 year ago
This seems to place a requirement for a collector build to include the transform processor to match this behavior. Is introducing the requirement of a separate processor the best choice for all users? Or should the k8sattr processor be able to do this in isolation?
I think this could turn into a more meta question for the collector community. How do we decide that a dependency on another component is the correct choice?
For example, for the tail sampling processor, there's a soft (hard?) dependency on the group by trace processor. I don't even know if the tail sampling / group by trace example is correct. That relationship may just be a "best practice" per se. The tail sampling processor has its own decision_wait configuration.
This one is a pretty straightforward separation of concerns. Parsing of an attribute value has nothing to do with k8s. So it should not be part of this processor. It was added a long time ago as extra functionality before we had any option to parse attribute values. Now we have the transform processor, which includes this particular feature. So we are not putting any dependency on another component, we are just removing functionality that doesn't belong to k8s attributes processor. Please let me know if it makes sense.
It may become less performant after this change, but I believe it's more important to keep a particular purpose for each component and avoid bloating and overcomplicating them with features that don't fit them. Happy to hear other opinions from @open-telemetry/collector-contrib-approvers
This one is a pretty straightforward separation of concerns. Parsing of an attribute value has nothing to do with k8s. So it should not be part of this processor. It was added a long time ago as extra functionality before we had any option to parse attribute values.
I agree.
Regarding the performance concerns, I'm not sure how large the impact would be; production users are most likely already using the transform processor, and it does make sense to deprecate FieldExtractConfig.KeyRegex in favor of the transform processor.
If the proposal gets accepted, I can pick up the work.
It may become less performant after this change, but I believe it's more important to keep a particular purpose for each component and avoid bloating and overcomplicating them with features that don't fit them.
I agree. Holding to separation of concerns is a good idea. As for performance, we can benchmark it, but the transformprocessor is quite performant.
We should not remove this feature from the processor, though, until the equivalent feature exists in the transformprocessor. It cannot currently extract new attributes using regex; we need to add that functionality via a new OTTL function.
@TylerHelmuth replace_pattern should work already, I believe. Let me know if I'm missing something.
k8sattributes:
  extract:
    annotations:
      - tag_name: git.sha
        key: kubernetes.io/change-cause
        regex: GIT_SHA=(?P<value>\w+)
should be replaceable with
k8sattributes:
  extract:
    annotations:
      - tag_name: git.sha
        key: kubernetes.io/change-cause
transform:
  metric_statements:
    - context: resource
      statements:
        - delete(attributes["git.sha"]) where attributes["git.sha"] != nil and not IsMatch(attributes["git.sha"], "GIT_SHA=\w+")
        - replace_pattern(attributes["git.sha"], "GIT_SHA=(\w+)", "$1") where attributes["git.sha"] != nil
But if we had an extract func, it'd be cleaner:
k8sattributes:
  extract:
    annotations:
      - tag_name: k8s.change_cause
        key: kubernetes.io/change-cause
transform:
  metric_statements:
    - context: resource
      statements:
        - extract(attributes, attributes["k8s.change_cause"], "GIT_SHA=(?P<git.sha>\w+)") where attributes["k8s.change_cause"] != nil
        - delete(attributes["k8s.change_cause"]) where attributes["k8s.change_cause"] != nil
Is this what you're thinking of?
@dmitryax I forgot replace_pattern could use $1. You can remove a couple of extra nil checks:
k8sattributes:
  extract:
    annotations:
      - tag_name: git.sha
        key: kubernetes.io/change-cause
transform:
  metric_statements:
    - context: resource
      statements:
        - delete(attributes["git.sha"]) where not IsMatch(attributes["git.sha"], "GIT_SHA=\w+")
        - replace_pattern(attributes["git.sha"], "GIT_SHA=(\w+)", "$1")
I see why it is so convenient to do the extraction within the k8sattributesprocessor - if you leave it for the transformprocessor then you have to explicitly check whether or not the attribute should be deleted.
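To make that trade-off concrete, here is a minimal Python sketch (not the collector's Go code) that models both approaches on a plain dict. The helper names extract_with_regex and copy_then_transform are hypothetical, and Python's re module stands in for the collector's regex engine, so treat it as an illustration of the behavior discussed in this thread rather than the actual implementation.

```python
import re


def extract_with_regex(annotations, key, tag_name, pattern):
    """Model of the `regex` option: set the attribute only when the pattern
    matches, using the `value` named group as the attribute value."""
    attrs = {}
    raw = annotations.get(key)
    if raw is None:
        return attrs
    match = re.search(pattern, raw)
    if match:
        attrs[tag_name] = match.group("value")
    return attrs


def copy_then_transform(annotations, key, tag_name, pattern):
    """Model of the two-statement replacement: copy the raw annotation,
    delete it when it does not match, then rewrite the match with group 1."""
    attrs = {}
    if key in annotations:
        attrs[tag_name] = annotations[key]
    # Statement 1: drop the attribute unless it matches the pattern.
    if tag_name in attrs and not re.search(pattern, attrs[tag_name]):
        del attrs[tag_name]
    # Statement 2: replace the matched portion with the first capture group.
    if tag_name in attrs:
        attrs[tag_name] = re.sub(pattern, r"\1", attrs[tag_name])
    return attrs


key = "kubernetes.io/change-cause"
exact = {key: "GIT_SHA=abc123"}
unrelated = {key: "kubectl apply"}
wrapped = {key: "deploy GIT_SHA=abc123 now"}

for annotations in (exact, unrelated, wrapped):
    print(extract_with_regex(annotations, key, "git.sha", r"GIT_SHA=(?P<value>\w+)"))
    print(copy_then_transform(annotations, key, "git.sha", r"GIT_SHA=(\w+)"))
```

In this model the two approaches agree when the annotation is exactly GIT_SHA=<sha> or does not match at all. They differ when the match is surrounded by other text: extraction returns only the captured group, while the pattern replacement keeps the surrounding text, so the migration pattern may need to cover the whole value.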
If we add an extract function we could add an optional param to help with this:
k8sattributes:
  extract:
    annotations:
      - tag_name: k8s.change_cause
        key: kubernetes.io/change-cause
transform:
  metric_statements:
    - context: resource
      statements:
        - extract(attributes, attributes["k8s.change_cause"], "GIT_SHA=(?P<git.sha>\w+)", deleteOriginal=true)
@TylerHelmuth LGTM. We should introduce that function before working on this issue to have simpler migration guidelines. Please see https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/25834 and edit it if necessary
@dmitryax after working on https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/24599 for a bit, I realized the existing extraction logic allows grabbing all the annotations/labels, which is really useful for users who want to grab everything. I believe the options we've discussed so far will remove that capability. I don't want to lose that feature, so if we move forward with removing key_regex, then I think we should implement an "everything" configuration option like the one proposed in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/24599.
We don't lose that feature with this proposal. I'm not asking to remove key_regex; I'm asking to remove regex, which is applied to label/annotation values, not keys.
I just found and fixed a typo in the issue description, which likely caused the confusion.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Hi, I would like to work on this issue cc @evan-bradley
What is the equivalent Transform processor config for extracting the following:
processors:
  k8sattributes:
    extract:
      labels:
        - tag_name: $$1
          key_regex: (.*)
          from: pod
      annotations:
        - tag_name: $$1
          key_regex: (.*)
          from: pod
The above comment was wrong, as that scenario isn't affected by the removal of the regex field.
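As a rough illustration of why the key_regex scenario is independent of the value regex, here is a minimal Python sketch of key-based extraction. The function extract_by_key_regex is hypothetical, it assumes the pattern must match the whole key, and it uses Python's \1 template syntax in place of the $$1 shown in the config above; it is a model of the behavior, not the processor's code.

```python
import re


def extract_by_key_regex(metadata, key_regex, tag_name_template):
    """Copy every entry whose key fully matches key_regex (an assumption of
    this sketch), naming the attribute from the template and leaving the
    value untouched."""
    pattern = re.compile(key_regex)
    attrs = {}
    for key, value in metadata.items():
        match = pattern.fullmatch(key)
        if match:
            # Expand capture-group references (\1) in the tag name template.
            attrs[match.expand(tag_name_template)] = value
    return attrs


pod_labels = {"app": "checkout", "team": "payments"}

# key_regex (.*) with a tag name of \1 copies every label under its own key.
all_labels = extract_by_key_regex(pod_labels, r"(.*)", r"\1")
print(all_labels)
```

Nothing here inspects or rewrites the label values, which is the part the regex field handles, so a "grab everything" configuration of this shape does not depend on that field.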
The configuration interface of the k8sattributes processor is over-complicated. This is an attempt to simplify it a bit.
FieldExtractConfig has both KeyRegex and Regex options, which is pretty confusing. The Regex config option doesn't seem to be very important. It can easily be replaced by the transform processor with the replace_pattern and replace_match functions. Therefore, I suggest deprecating the FieldExtractConfig.Regex option, with instructions on how to replace it with the transform processor.