open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.76k stars 2.19k forks source link

[pkg/ottl] Add conflict resolution strategies for `set` #31808

Closed michalpristas closed 2 weeks ago

michalpristas commented 4 months ago

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

Related to a rename work here it would be useful to have the option to specify conflict resolution strategy for set as well.

Describe the solution you'd like

scenarios i have in mind at the moment are

we would end up with following signature: set(target, value, [Optional] conflict_resolution_strategy)

Describe alternatives you've considered

No response

Additional context

No response

github-actions[bot] commented 4 months ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

TylerHelmuth commented 4 months ago

I like this idea, although we'll need to me sure we clearly define what missing means as it would be different when setting an attribute versus a otlp field. missing for an attribute could mean "the key is not present" (attributes["missing"] returns nil), but for an otlp field like span name it will always be present on the span telemetry, but it may be empty string.

I'd like to align the names with the attribute processor's: insert, update, and upsert. Any additional feature, such as fail, can be named independently.

michalpristas commented 4 months ago

with the PoC i have for this i was playing with idea of checking nil as you said.

for otlp fields i have a question though. empty string check is fairly easy and i don't think setting field to empty string is something done on purpose. what about other types numbers, bools. What's the behavior there? Can it happen that these are present and set to default values (0, false)? or would these also default to ""? If default value is the case, than I'm not sure we can tell (especially with boolean) if it's false on purpose, or false because it's uninitialized.

TylerHelmuth commented 4 months ago

If default value is the case, than I'm not sure we can tell (especially with boolean) if it's false on purpose, or false because it's uninitialized.

Yes this will definitely be a problem. There are fields like dropped_attributes_count where 0 is a meaningful value. The metric's is_monotonic is another place.

michalpristas commented 4 months ago

i will try to play with these to figure out what the proper solution should be. but i can see that with insert mode and default value user can have mixed experience easily for these fields.

michalpristas commented 4 months ago

Playing with different fields today and yes, for some number/boolean fields it's most likely impossible to tell whether value is valid or default.

i see empty string values are usually uninitialized (missing)

so i was thinking about handling default value for number/boolean as not missing. handling nil value as missing and handling "" value as missing as well and documenting it very well.

user when working with transformations, specifying a field to be set should be field aware and can make best decision. also considering upsert default behavior (so we don't change anything by default), presence of the field does not really matter

i'd love to hear your thoughts. maybe I'm trying to make it unnecessary complicated and not user friendly

TylerHelmuth commented 4 months ago

As you say, we'd stick with upsert as the default. With that in mind, for update or insert, I think treating any default value as "missing" probably makes the most sense as it is consistent.

TylerHelmuth commented 3 months ago

One thing to consider: both update and insert logic can already be handled by where clauses:

@evan-bradley what are your thoughts.

TylerHelmuth commented 3 months ago

@michalpristas since the Where clause can technically handle update and insert, I think we only want to add this feature if:

  1. The ergonomics of set(attribute["test"], "pass", "update") is better than set(attribute["test"], "pass") where attributes["test"] != nil
  2. There are other conflict resolution strategies, like fail, that a Where clause can't handle.

Can you expand some more on the fail strategy? Is this a strategy for which you have an active user case?

michalpristas commented 3 months ago

for fail i don't have a use case in my pipelines. i just wanted to cover one last use case when it comes to target exists and missing/present matrix.

otherwise i'm fine with suggestions

kernelpanic77 commented 3 months ago

Can I take this up @TylerHelmuth @michalpristas ?

TylerHelmuth commented 3 months ago

@kernelpanic77 we're still discussing if we need this feature or if the Where clause is sufficient. We need to answer the questions from https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31808#issuecomment-2050140042.

For 1., I don't personally view set(attribute["test"], "pass", "update") as much better than set(attribute["test"], "pass") where attributes["test"] != nil because set(attribute["test"], "pass", "update") requires me to understand what update implies, but also I've been working with OTTL for a long time so the Where clause and != nil syntax is already known to me. If new users this is maybe not the case. What is your opinion?

For 2., I see the power in allowing future use cases, but I'd like to have some concrete use cases before basing the design on the possibility.

Normally adding an optional parameter would be no problem bc it is optional, but the missing conflict with this one makes it trickier. I dont want to add the feature unless we are sure we need it and we are sure about how we'll handle missing, since any changes to that logic would be a breaking change.

michalpristas commented 3 months ago

i agree, for me when i looked into ottl functions i did not even know that there is a possibility to do where statements. so for first time users having explicit arguments may play out well. some of our users are not that technical (this is also why you see me creating issues about things that can be done using regex or other way). I'd like them to have similar experience when they migrate to otel. If their config becomes longer and less readable it would be a tough sell. (sorry for me ranting about not that related stuff)

we were talking in case of missing, attributes are easy, it is nil check. the same as with where statements we were discussing default values to be considered missing. i think this is the trickiest part. also in case of where statements it requires users to know a bit about go as a language now to be able to assess default values.

I'm just trying to have this user in mind.

TylerHelmuth commented 3 months ago

i agree, for me when i looked into ottl functions i did not even know that there is a possibility to do where statements. so for first time users having explicit arguments may play out well.

This is not the first time I've heard this 😞 OTTL docs and discoverability need some love.

also in case of where statements it requires users to know a bit about go as a language now to be able to assess default values.

Very true, thats a really good argument for adding this feature.

evan-bradley commented 3 months ago

@evan-bradley what are your thoughts.

Sorry for the long delay. In general, I'm a fan of the "one correct way to do things" approach where we keep things as consistent as possible and try to build on existing patterns to help users solve problems. I feel like it makes the language more predictable.

I think this feature is good to consider, but my gut feeling is that we should continue to rely on where clauses. I have a few points supporting this:

  1. @TylerHelmuth has already mentioned that a user would need to determine which each option means, which I think adds mental overhead when reading the statements.
  2. This may not be a real issue, but this would open a footgun for users to introduce potentially non-obvious no-op statements:
set(attribute["test"], "pass", "update") where attributes["test"] == nil # Will never do anything
  1. This would essentially introduce an embedded where clause, which would make statements less readable when one is still used:
# These statements are equivalent:
- set(attribute["test"], "pass", "update") where attributes["other"] == nil
- set(attribute["test"], "pass") where attribute["test"] != nil and attributes["other"] == nil

That said, there are a few points in favor of this feature:

  1. Not all our users have programming experience, so the less syntax is involved, the better.
  2. It's possible users really would just prefer to specify a named strategy instead of relying on nil checks in a where clause.

I'm not going to suggest closing this issue, but before we implement it, I would like to see feedback from others that this is something they would like to see. Thanks @michalpristas for the detailed issue and input on this proposal.

github-actions[bot] commented 1 month ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

andrzej-stencel commented 1 month ago

As an end user, the strategy parameter would be confusing to me. This is obviously a subjective, personal opinion. What would be easy to understand in my opinion is to have separate functions implementing different strategies:

evan-bradley commented 3 weeks ago

@michalpristas Is this issue still relevant even if https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31901 is closed?

evan-bradley commented 3 weeks ago

As an end user, the strategy parameter would be confusing to me. This is obviously a subjective, personal opinion. What would be easy to understand in my opinion is to have separate functions implementing different strategies

@andrzej-stencel Thanks for the input. How do you think separate functions compare to using OTTL's where clauses to achieve equivalent functionality?

andrzej-stencel commented 3 weeks ago

As an end user, the strategy parameter would be confusing to me. This is obviously a subjective, personal opinion. What would be easy to understand in my opinion is to have separate functions implementing different strategies

@andrzej-stencel Thanks for the input. How do you think separate functions compare to using OTTL's where clauses to achieve equivalent functionality?

If I understand correctly, the following statements would be equivalent (as already mentioned above in this thread):

Looking at this, I definitely don't like the additional strategy parameter in the set function, as I think it's hard to read.

I think the new strategy functions update(), insert(), insertOrFail() increase the readability a bit. This might be especially noticeable in more complex scenarios. For example, I believe the following:

is easier to read than:

On the other hand, you could argue that the second option is more explicit. I suppose this goes down to a personal preference? :smile:

michalpristas commented 3 weeks ago

can you please assign this to me?

evan-bradley commented 3 weeks ago

@michalpristas Assigned, thanks for volunteering. I'd like to hold off on a PR for now until we've confirmed that we would like to add this functionality.

@andrzej-stencel Thanks for the input. I'm clearly biased, but I still find the verbosity of the where clauses to be more obvious for what is going on. If I'm reading OTTL statements, it would be immediately obvious to me when set is run based on the where clause conditions, and it wouldn't be obvious that insert or update strategies may throw errors without reading the documentation. While not an official OTTL policy (and I'm open to discussing the merits of this), I generally would prefer for OTTL to have a single obvious way to do things and using where clauses for this is in line with that.

@TylerHelmuth what are your thoughts?

michalpristas commented 3 weeks ago

i like what @andrzej-stencel suggested, it allows you to use where clause if you want, but in case you're not that technical, you don't understand inner working and when field is nil or "" it provides you with very clear options

michalpristas commented 3 weeks ago

when i take a look for example pipeline we have here and produce something otel like (not working)

we can see clearly that spotting updates is much easier with function calls than with an ocean of where clause. I care a lot about readability, and i think solution proposed by @andrzej-stencel is most readable so far

- set(attribute["user.id"], '{{{zoom.operator_id}}}') 
- set(attribute["user.email"], '{{{zoom.operator}}}') 
- set(attribute["user.id"], '{{{zoom.user.id}}}')  where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null
- set(attribute["user.email"], '{{{zoom.user.email}}}')  where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null
- set(attribute["user.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}') where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null && ctx.zoom?.user?.first_name != null
- set(attribute["user.target.id"], '{{{zoom.old_values.id}}}') 
- set(attribute["user.target.id"], '{{{zoom.old_values.id}}}') 
- set(attribute["user.target.email"], '{{{zoom.old_values.email}}}') 
- set(attribute["user.target.email"], '{{{zoom.old_values.email}}}') 
- set(attribute["user.target.full_name"], '{{zoom.old_values.first_name}} {{zoom.old_values.last_name}}') where ctx.zoom?.old_values?.first_name != null
- set(attribute["user.target.id"], '{{{zoom.user.id}}}') where attribute["user.target.id"] != nil
- set(attribute["user.target.id"], '{{{zoom.user.id}}}') where attribute["user.target.id"] != nil
- set(attribute["user.target.email"], '{{{zoom.user.email}}}') where attribute["user.target.email"] != nil
- set(attribute["user.target.email"], '{{{zoom.user.email}}}') where attribute["user.target.email"] != nil
- set(attribute["user.target.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}') where attribute["user.target.full_name"] != nil
    where '(ctx.zoom?.old_values != null || ctx.zoom?.operator != null || ctx.zoom?.operator_id != null) && ctx.zoom?.user?.first_name != null'
- set(attribute["user.changes.id"], '{{{zoom.user.id}}}')
    where ctx.zoom?.old_values?.id != null && ctx.zoom?.old_values?.id != ctx.zoom?.user?.id
- set(attribute["user.changes.email"], '{{{zoom.user.email}}}')
    where ctx.zoom?.old_values?.email != null && ctx.zoom?.old_values?.email != ctx.zoom?.user?.email
- set(attribute["user.changes.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}')
    where ctx.zoom?.old_values?.first_name != null && ctx.zoom?.old_values?.last_name != null && (ctx.zoom?.old_values?.last_name != ctx.zoom?.user?.last_name || ctx.zoom?.old_values?.first_name != ctx.zoom?.user?.first_name)
- set(attribute["event.kind"], pipeline_error)
- set(attribute["user.id"], '{{{zoom.operator_id}}}') 
- set(attribute["user.email"], '{{{zoom.operator}}}') 
- set(attribute["user.id"], '{{{zoom.user.id}}}')  where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null
- set(attribute["user.email"], '{{{zoom.user.email}}}')  where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null
- set(attribute["user.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}') where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null && ctx.zoom?.user?.first_name != null
- set(attribute["user.target.id"], '{{{zoom.old_values.id}}}') 
- set(attribute["user.target.id"], '{{{zoom.old_values.id}}}') 
- set(attribute["user.target.email"], '{{{zoom.old_values.email}}}') 
- set(attribute["user.target.email"], '{{{zoom.old_values.email}}}') 
- set(attribute["user.target.full_name"], '{{zoom.old_values.first_name}} {{zoom.old_values.last_name}}') where ctx.zoom?.old_values?.first_name != null
- update(attribute["user.target.id"], '{{{zoom.user.id}}}')
- update(attribute["user.target.id"], '{{{zoom.user.id}}}')
- update(attribute["user.target.email"], '{{{zoom.user.email}}}')
- update(attribute["user.target.email"], '{{{zoom.user.email}}}')
- update(attribute["user.target.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}')
    where '(ctx.zoom?.old_values != null || ctx.zoom?.operator != null || ctx.zoom?.operator_id != null) && ctx.zoom?.user?.first_name != null'
- set(attribute["user.changes.id"], '{{{zoom.user.id}}}')
    where ctx.zoom?.old_values?.id != null && ctx.zoom?.old_values?.id != ctx.zoom?.user?.id
- set(attribute["user.changes.email"], '{{{zoom.user.email}}}')
    where ctx.zoom?.old_values?.email != null && ctx.zoom?.old_values?.email != ctx.zoom?.user?.email
- set(attribute["user.changes.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}')
    where ctx.zoom?.old_values?.first_name != null && ctx.zoom?.old_values?.last_name != null && (ctx.zoom?.old_values?.last_name != ctx.zoom?.user?.last_name || ctx.zoom?.old_values?.first_name != ctx.zoom?.user?.first_name)
- set(attribute["event.kind"], pipeline_error)
TylerHelmuth commented 3 weeks ago

ctx.zoom?.operator_id == null

what magic is this

michalpristas commented 3 weeks ago

no worries this is just untranslated condition, i just did a quick fulltext transform to illustrate the case it comes from our painless script

In painless to avoid NullPointerException exceptions you can use null safe operators, such as ?., and write your scripts to be null safe

michalpristas commented 2 weeks ago

@evan-bradley @TylerHelmuth

are we in an agreement here and can proceed or should we discuss during SIG or create a poll? not sure what the process is

evan-bradley commented 2 weeks ago

@michalpristas If you could come to a SIG meeting, I think that would be valuable. If we can't come to an agreement, a poll could also be a good option.

michalpristas commented 2 weeks ago

closing this issue as discussed in SIG. we'll stick with where clauses and for missing capability of insertOrFail we could use filter, validation or extension points (to be delivered)