Closed michalpristas closed 2 weeks ago
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
I like this idea, although we'll need to me sure we clearly define what missing
means as it would be different when setting an attribute versus a otlp field. missing
for an attribute could mean "the key is not present" (attributes["missing"] returns nil), but for an otlp field like span name it will always be present on the span telemetry, but it may be empty string.
I'd like to align the names with the attribute processor's: insert
, update
, and upsert
. Any additional feature, such as fail
, can be named independently.
with the PoC i have for this i was playing with idea of checking nil as you said.
for otlp fields i have a question though. empty string check is fairly easy and i don't think setting field to empty string is something done on purpose. what about other types numbers, bools. What's the behavior there? Can it happen that these are present and set to default values (0, false)? or would these also default to ""? If default value is the case, than I'm not sure we can tell (especially with boolean) if it's false on purpose, or false because it's uninitialized.
If default value is the case, than I'm not sure we can tell (especially with boolean) if it's false on purpose, or false because it's uninitialized.
Yes this will definitely be a problem. There are fields like dropped_attributes_count
where 0
is a meaningful value. The metric's is_monotonic
is another place.
i will try to play with these to figure out what the proper solution should be. but i can see that with insert
mode and default value user can have mixed experience easily for these fields.
Playing with different fields today and yes, for some number/boolean fields it's most likely impossible to tell whether value is valid or default.
i see empty string values are usually uninitialized (missing)
so i was thinking about handling default value for number/boolean as not missing. handling nil value as missing and handling "" value as missing as well and documenting it very well.
user when working with transformations, specifying a field to be set should be field aware and can make best decision. also considering upsert default behavior (so we don't change anything by default), presence of the field does not really matter
i'd love to hear your thoughts. maybe I'm trying to make it unnecessary complicated and not user friendly
As you say, we'd stick with upsert
as the default. With that in mind, for update
or insert
, I think treating any default value as "missing" probably makes the most sense as it is consistent.
One thing to consider: both update
and insert
logic can already be handled by where clauses:
update
: set(attribute["test"], "pass") where attributes["test"] != nil
insert
: set(attribute["test"], "pass") where attributes["test"] == nil
@evan-bradley what are your thoughts.
@michalpristas since the Where
clause can technically handle update
and insert
, I think we only want to add this feature if:
set(attribute["test"], "pass", "update")
is better than set(attribute["test"], "pass") where attributes["test"] != nil
fail
, that a Where
clause can't handle.Can you expand some more on the fail
strategy? Is this a strategy for which you have an active user case?
for fail
i don't have a use case in my pipelines. i just wanted to cover one last use case when it comes to target exists
and missing/present
matrix.
otherwise i'm fine with suggestions
Can I take this up @TylerHelmuth @michalpristas ?
@kernelpanic77 we're still discussing if we need this feature or if the Where clause is sufficient. We need to answer the questions from https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31808#issuecomment-2050140042.
For 1.
, I don't personally view set(attribute["test"], "pass", "update")
as much better than set(attribute["test"], "pass") where attributes["test"] != nil
because set(attribute["test"], "pass", "update")
requires me to understand what update
implies, but also I've been working with OTTL for a long time so the Where clause and != nil
syntax is already known to me. If new users this is maybe not the case. What is your opinion?
For 2.
, I see the power in allowing future use cases, but I'd like to have some concrete use cases before basing the design on the possibility.
Normally adding an optional parameter would be no problem bc it is optional, but the missing
conflict with this one makes it trickier. I dont want to add the feature unless we are sure we need it and we are sure about how we'll handle missing
, since any changes to that logic would be a breaking change.
i agree, for me when i looked into ottl functions i did not even know that there is a possibility to do where statements. so for first time users having explicit arguments may play out well. some of our users are not that technical (this is also why you see me creating issues about things that can be done using regex or other way). I'd like them to have similar experience when they migrate to otel. If their config becomes longer and less readable it would be a tough sell. (sorry for me ranting about not that related stuff)
we were talking in case of missing, attributes are easy, it is nil check. the same as with where statements
we were discussing default values to be considered missing. i think this is the trickiest part. also in case of where
statements it requires users to know a bit about go
as a language now to be able to assess default values.
I'm just trying to have this user in mind.
i agree, for me when i looked into ottl functions i did not even know that there is a possibility to do where statements. so for first time users having explicit arguments may play out well.
This is not the first time I've heard this 😞 OTTL docs and discoverability need some love.
also in case of where statements it requires users to know a bit about go as a language now to be able to assess default values.
Very true, thats a really good argument for adding this feature.
@evan-bradley what are your thoughts.
Sorry for the long delay. In general, I'm a fan of the "one correct way to do things" approach where we keep things as consistent as possible and try to build on existing patterns to help users solve problems. I feel like it makes the language more predictable.
I think this feature is good to consider, but my gut feeling is that we should continue to rely on where clauses. I have a few points supporting this:
set(attribute["test"], "pass", "update") where attributes["test"] == nil # Will never do anything
# These statements are equivalent:
- set(attribute["test"], "pass", "update") where attributes["other"] == nil
- set(attribute["test"], "pass") where attribute["test"] != nil and attributes["other"] == nil
That said, there are a few points in favor of this feature:
I'm not going to suggest closing this issue, but before we implement it, I would like to see feedback from others that this is something they would like to see. Thanks @michalpristas for the detailed issue and input on this proposal.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers
. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
As an end user, the strategy parameter would be confusing to me. This is obviously a subjective, personal opinion. What would be easy to understand in my opinion is to have separate functions implementing different strategies:
@michalpristas Is this issue still relevant even if https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31901 is closed?
As an end user, the strategy parameter would be confusing to me. This is obviously a subjective, personal opinion. What would be easy to understand in my opinion is to have separate functions implementing different strategies
@andrzej-stencel Thanks for the input. How do you think separate functions compare to using OTTL's where clauses to achieve equivalent functionality?
As an end user, the strategy parameter would be confusing to me. This is obviously a subjective, personal opinion. What would be easy to understand in my opinion is to have separate functions implementing different strategies
@andrzej-stencel Thanks for the input. How do you think separate functions compare to using OTTL's where clauses to achieve equivalent functionality?
If I understand correctly, the following statements would be equivalent (as already mentioned above in this thread):
strategy
upsert
update
insert
insertOrFail
Looking at this, I definitely don't like the additional strategy parameter in the set function, as I think it's hard to read.
I think the new strategy functions update(), insert(), insertOrFail() increase the readability a bit. This might be especially noticeable in more complex scenarios. For example, I believe the following:
is easier to read than:
On the other hand, you could argue that the second option is more explicit. I suppose this goes down to a personal preference? :smile:
can you please assign this to me?
@michalpristas Assigned, thanks for volunteering. I'd like to hold off on a PR for now until we've confirmed that we would like to add this functionality.
@andrzej-stencel Thanks for the input. I'm clearly biased, but I still find the verbosity of the where clauses to be more obvious for what is going on. If I'm reading OTTL statements, it would be immediately obvious to me when set
is run based on the where clause conditions, and it wouldn't be obvious that insert
or update
strategies may throw errors without reading the documentation. While not an official OTTL policy (and I'm open to discussing the merits of this), I generally would prefer for OTTL to have a single obvious way to do things and using where clauses for this is in line with that.
@TylerHelmuth what are your thoughts?
i like what @andrzej-stencel suggested, it allows you to use where clause if you want, but in case you're not that technical, you don't understand inner working and when field is nil or "" it provides you with very clear options
when i take a look for example pipeline we have here and produce something otel like (not working)
we can see clearly that spotting updates is much easier with function calls than with an ocean of where clause. I care a lot about readability, and i think solution proposed by @andrzej-stencel is most readable so far
- set(attribute["user.id"], '{{{zoom.operator_id}}}')
- set(attribute["user.email"], '{{{zoom.operator}}}')
- set(attribute["user.id"], '{{{zoom.user.id}}}') where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null
- set(attribute["user.email"], '{{{zoom.user.email}}}') where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null
- set(attribute["user.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}') where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null && ctx.zoom?.user?.first_name != null
- set(attribute["user.target.id"], '{{{zoom.old_values.id}}}')
- set(attribute["user.target.id"], '{{{zoom.old_values.id}}}')
- set(attribute["user.target.email"], '{{{zoom.old_values.email}}}')
- set(attribute["user.target.email"], '{{{zoom.old_values.email}}}')
- set(attribute["user.target.full_name"], '{{zoom.old_values.first_name}} {{zoom.old_values.last_name}}') where ctx.zoom?.old_values?.first_name != null
- set(attribute["user.target.id"], '{{{zoom.user.id}}}') where attribute["user.target.id"] != nil
- set(attribute["user.target.id"], '{{{zoom.user.id}}}') where attribute["user.target.id"] != nil
- set(attribute["user.target.email"], '{{{zoom.user.email}}}') where attribute["user.target.email"] != nil
- set(attribute["user.target.email"], '{{{zoom.user.email}}}') where attribute["user.target.email"] != nil
- set(attribute["user.target.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}') where attribute["user.target.full_name"] != nil
where '(ctx.zoom?.old_values != null || ctx.zoom?.operator != null || ctx.zoom?.operator_id != null) && ctx.zoom?.user?.first_name != null'
- set(attribute["user.changes.id"], '{{{zoom.user.id}}}')
where ctx.zoom?.old_values?.id != null && ctx.zoom?.old_values?.id != ctx.zoom?.user?.id
- set(attribute["user.changes.email"], '{{{zoom.user.email}}}')
where ctx.zoom?.old_values?.email != null && ctx.zoom?.old_values?.email != ctx.zoom?.user?.email
- set(attribute["user.changes.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}')
where ctx.zoom?.old_values?.first_name != null && ctx.zoom?.old_values?.last_name != null && (ctx.zoom?.old_values?.last_name != ctx.zoom?.user?.last_name || ctx.zoom?.old_values?.first_name != ctx.zoom?.user?.first_name)
- set(attribute["event.kind"], pipeline_error)
- set(attribute["user.id"], '{{{zoom.operator_id}}}')
- set(attribute["user.email"], '{{{zoom.operator}}}')
- set(attribute["user.id"], '{{{zoom.user.id}}}') where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null
- set(attribute["user.email"], '{{{zoom.user.email}}}') where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null
- set(attribute["user.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}') where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null && ctx.zoom?.user?.first_name != null
- set(attribute["user.target.id"], '{{{zoom.old_values.id}}}')
- set(attribute["user.target.id"], '{{{zoom.old_values.id}}}')
- set(attribute["user.target.email"], '{{{zoom.old_values.email}}}')
- set(attribute["user.target.email"], '{{{zoom.old_values.email}}}')
- set(attribute["user.target.full_name"], '{{zoom.old_values.first_name}} {{zoom.old_values.last_name}}') where ctx.zoom?.old_values?.first_name != null
- update(attribute["user.target.id"], '{{{zoom.user.id}}}')
- update(attribute["user.target.id"], '{{{zoom.user.id}}}')
- update(attribute["user.target.email"], '{{{zoom.user.email}}}')
- update(attribute["user.target.email"], '{{{zoom.user.email}}}')
- update(attribute["user.target.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}')
where '(ctx.zoom?.old_values != null || ctx.zoom?.operator != null || ctx.zoom?.operator_id != null) && ctx.zoom?.user?.first_name != null'
- set(attribute["user.changes.id"], '{{{zoom.user.id}}}')
where ctx.zoom?.old_values?.id != null && ctx.zoom?.old_values?.id != ctx.zoom?.user?.id
- set(attribute["user.changes.email"], '{{{zoom.user.email}}}')
where ctx.zoom?.old_values?.email != null && ctx.zoom?.old_values?.email != ctx.zoom?.user?.email
- set(attribute["user.changes.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}')
where ctx.zoom?.old_values?.first_name != null && ctx.zoom?.old_values?.last_name != null && (ctx.zoom?.old_values?.last_name != ctx.zoom?.user?.last_name || ctx.zoom?.old_values?.first_name != ctx.zoom?.user?.first_name)
- set(attribute["event.kind"], pipeline_error)
ctx.zoom?.operator_id == null
what magic is this
no worries this is just untranslated condition, i just did a quick fulltext transform to illustrate the case it comes from our painless script
In painless to avoid NullPointerException
exceptions you can use null safe operators, such as ?., and write your scripts to be null safe
@evan-bradley @TylerHelmuth
are we in an agreement here and can proceed or should we discuss during SIG or create a poll? not sure what the process is
@michalpristas If you could come to a SIG meeting, I think that would be valuable. If we can't come to an agreement, a poll could also be a good option.
closing this issue as discussed in SIG. we'll stick with where
clauses and for missing capability of insertOrFail
we could use filter
, validation or extension points (to be delivered)
Component(s)
pkg/ottl
Is your feature request related to a problem? Please describe.
Related to a rename work here it would be useful to have the option to specify conflict resolution strategy for
set
as well.Describe the solution you'd like
scenarios i have in mind at the moment are
we would end up with following signature:
set(target, value, [Optional] conflict_resolution_strategy)
Describe alternatives you've considered
No response
Additional context
No response