serverlessworkflow / specification

Serverless Workflow Specification
http://serverlessworkflow.io
Apache License 2.0
706 stars 145 forks source link

Context Data Update Behavior #867

Open matthias-pichler-warrify opened 3 weeks ago

matthias-pichler-warrify commented 3 weeks ago

What would you like to be added:

The output.to field allows to write data to the context. Is the context merged or overwritten by the data produced by this expression?

given a context of

{
  "foo": "bar"
}

will the expression

output:
  to: '{a: 1}'

result in

{
  "foo": "bar",
  "a": 1
}

or

{
  "a": 1
}

Personally always extending makes sense to me from a usability perspective but it seems to me that context could really explode in size this way. Especially because jq has built in methods for extending objects, such as

output:
  to: '.a = 1'

or

output:
  to: '$context + { a: 1 }'

which both are more explicit about extending

Why is this needed:

This is relevant for making sure data flow is implemented correctly and context does not explode in size.

cdavernas commented 3 weeks ago

Well, yet another excellent question 😆

My opinion on this is that the result should be overwritten. The reason is that, as you said, jq provides ways to replace, update and extend.

I feel that if we just extend, we would throw down the bin some very cool jq features, would "force" runtimes to perform an additional processing after the evaluation and would probably loose some portability accross expression evaluation languages (in Synapse, we allow using both jq and js, for example). Additionally, as you also said, the context could explode in size.

fjtirado commented 3 weeks ago

I think the problem is the ambiguity of to. Why not adding "merge" and "set", if using the former, whatever json generated by the expression after "merge" is merged into the context, while is using "set", the context will be replaced by the json generated by the expression after "set".

cdavernas commented 3 weeks ago

To is ambiguous indeed. However it is not solved imho by using merge or set, as it's the content of the jq expression which (should) decide what to do (just update, replace or merge).

matthias-pichler-warrify commented 3 weeks ago

To is ambiguous indeed. However it is not solved imho by using merge or set, as it's the content of the jq expression which (should) decide what to do (just update, replace or merge).

I agree. I don't think merge or set solve the problem.

But I also think there is a problem with relying on jq expressions to handle updates (this is my understanding of how jq works, so correct me if I am wrong): jq can only operate in the context of one object.

given:

{
  "context": {
    "foo": "bar"
  },
  "output": {
    "a": 1
  }
}

and the following output expression:

output:
  to: '.b = [$.context.foo, .a]'

The desired output would be:

{
  "foo": "bar",
  "b": ["bar", 1]
}

If this expression is now run in the context of context it would update context to be:

{
  "foo": "bar",
  "b": ["bar", null]
}

because .a cannot be resolved relative to context. If however the expression is run in context of the output it would result in:

{
  "a": 1,
  "b": ["bar", 1]
}

which is the modified output not context.

Because of that I think it would make the most sense if:

  1. the expression is run in context of the output (because then accessing the output and $context is possible)
  2. the result of the expression replaces the context (this avoids size explosions)

a potential rename of the output.to might help or just explicit documentation.

fjtirado commented 3 weeks ago

@matthias-pichler-warrify I think it was crystal clear the expression should be run in the context of output and we were discussing what to do with the result of the expression evaluation. And focusing on your 2., Im not sure that the result of the expression should allways replace the context. I think that should depend on the writer preference. Thats why at the end of your rationale, after saying that merge or set does not fix the issue, you argue that to should be renamed to something else (I propose to use set). And on top of that I propose to add merge (so user explicitly announce its intention to merge the result into the context) If there is a explosion size when usinn merge, it would clearly be user`s fault. Basically, Im trying to cover a case in which the expression language (other than JQ) does not allow easy merging in the expression. Also, there might be users that prefer to delegate the merge to the infrastructure and just focus on filtering the output of the call (for example, if the call is an http, the expression might select a couple of properties from the http response that should be merged into the context)

cdavernas commented 3 weeks ago

jq can only operate in the context of one object.

@matthias-pichler-warrify Yes, that is correct. The operand being the "input".

... but you have parameters, which brings me to the $output variable, which is however not documented by mistake. This variable should only be present when evaluating output expressions.

cdavernas commented 3 weeks ago

(I propose to use set). And on top of that I propose to add merge

@fjtirado That's an intersting idea indeed. I feel however it adds additional properties for something that is already explicitly written by the user, which is the expression. If the user writes an expression that overwrites the context, so be it: if she's smart enough to write the expression, let's assume she's smart enough to understand it ^^.

If we choose to rename to, instead of set, I propose we consider output as a verb, and find an alertenative that "flows", which is why we chose the to term in the first place.

I'm not even sure there's a need to rename at all, the issue as I understand it being more about explicitly settling on/documenting the behavior.

fjtirado commented 3 weeks ago

@cdavernas
Lets rewind a bit and try to write in english what we are proposing to do. We are setting the context of the workflow with the value result of the expression. I would say that proper syntax should be something like this

  assing: <expr> 
  to: <variable to be set>

where the string is always an exprerssion and is context. However since it seems we do not consider any other action than replacing the context, we can perfectly write output: <expr>, documenting what is going to happen with the result of the expr and forgot about the to

cdavernas commented 3 weeks ago

I would say that proper syntax should be

Renaming is not the point here: it's off topic. If that's something you think it's necessary, let's open an issue or a discussion about it 😉

assing: to:

As said in my previous commentary, I see no need to "split" the functionality of the expression feature into two different variables.

fjtirado commented 3 weeks ago

@cdavernas what Im saying is that you do not need

output: 
  from: <expr>
  to: <expr>

just

output: <expr>

will do it And in documentation. output is the expression that returns the object that should be assigned to the context.

cdavernas commented 3 weeks ago

will do it And in documentation. output is the expression that returns the object that should be assigned to the context.

I understand what you wrote. What I said it's that it's off topic.

Plus, your last proposal leaves out schema.

Anyways, please move that to another issue/discussion.

fjtirado commented 3 weeks ago

How is this a different discussion if we are just discussing how to update the context?

fjtirado commented 3 weeks ago

https://github.com/serverlessworkflow/specification/issues/873

cdavernas commented 1 week ago

@matthias-pichler-warrify Hasn't this been addressed and closed by #873 ?