Closed alankm closed 3 years ago
Addendum: this change would also allow us to remove the dataInputSchema
and dataOutputSchema
parameters on the root level (workflow definition), which is probably a good thing. Right now it's unclear if a failure from the root-level dataOutputSchema
should trigger compensation, produce events, etc.
@alankm will look into this, just from glancing over it: failure should be "onErrors" probably "success" should be a "transition" object imo dont forget if its a state it will also have the start/end definitions
for reference i think just look at inject state https://github.com/serverlessworkflow/specification/blob/master/specification.md#inject-state and remove the data property and replace with the ones you are suggesting.
I do agree with an additional state because the spec is not very specific when it comes e.g. to the final workflow output schema. At the moment there is no way to define something if the workflow output is not what I want although all states did complete successfully.
I would prefer @tsurdilo changes in that state though. That makes it consistent with other states.
@tsurdilo The only reason I didn't do that was to avoid some of the baggage onErrors
brings. For example, there's never a good reason to perform retries on a validation failure. And also this state should only be capable of producing one type of error so an array is overkill.
I can see the value in reusing onErrors
just for the sake of consistency though. I don't have a strong opinion either way.
@alankm could there be more than one type of error, for example "schemaNotFoundError" "invalidSchemaError", "invalidSchemaVersionError" and like "schemaValidationError" to think of few. Also given that the runtime will perform the schema validation using some library I'm sure there could be many more types of errors that could be produced.
@tsurdilo I suppose you could implement it that way. I'd have performed all of that validation before accepting the workflow definition, not live during workflow execution.
@alankm oh. the schema validation is not against the workflow definition. that can be done on compile time. its against state data so it imo has to be done during workflow instance execution. am i misunderstanding something here?
@tsurdilo Validating the data against the schema is the only thing I think needs to be done during execution. Depending on your implementation, you could prefetch the schema (avoiding schemaNotFoundError) and check that it's valid (avoiding invalidSchemaError and invalidSchemaVersionError) long before execution.
But I agree you don't have to do it that way. You could fetch the schema on demand and then you'd need to support all these extra errors for sure.
@alankm thanks, very nice convo as usual. to me personally the interesting thing about state data validation is that workflow data is very dynamic, so it can change its structure after each state. this brings to the use case validation against different schema after each state, that is why we added it as part of each state. State A might want to validate against a completely different schema than state B that comes after, especially with data merging after function executions or cosume of event payloads etc. So that's why I was surprised in a way to see this possibly change to a state, as then i would need to stick this state specifically before and after each of my workflow states (including all its setup, params, transitions) etc..where really the use case i think this would fit best is quick validation of the current dynamic state data against different schemas depending on the state itself.
@tsurdilo This suggestion would definitely make life less convenient for anyone who loves validating their schema every step of the way. Arguably though, you only really need to sanitize your inputs and outputs (inputs includes action results and event data).
It's just a matter of opinion I think. Some people will prefer it the way it is, others will prefer it in its own state. I guess the real question is: Which one would be more popular?
@alankm agree. another question that may come up is why even have this at all, meaning i might as well just have an action that calls service that does the validation passing all the state data, "$." currently. With having "ValidationState" does this take us in the direction of maybe in future having an "EmailState" or "DatabaseState" etc? I am inclined to say that if we do go with sanitizing workflow inputs and outputs only having these properties as part of the top-level workflow props would even be better than having to stick this state before each and every one workflow path than ends up in an "end" definition.
@tsurdilo Interesting thought. But keep in mind that your "inputs" isn't just the stuff that happens at the start of the workflow. Any time you get the return results of an action or event switch or callback that foreign data will be merged in, so it'd also be important to support schema validating that data. So I'm not sure it can be completely replaced by the top-level fields alone.
edit: Unless you want to force users to subflow all of these things.
I don't have a strong opinion if we keep it or not but if it stays in there we probably should have something to handle the schema validation on workflow level. The spec says "It can be used for documentation purposes or implementations may decide to strictly enforce it." But what can I do at the end of a workflow? All states have been executed and now the schema validation fails. I can not compensate, call errors etc. What can I enforce? That is my point.
Within in the workflow I personally think actions invoked with corrupt data would return errors anyway so at some point in the workflow it would error out. That is why it is not super important to me.
@jensg-st agree with both of you. as you said even if this functionality is there, its not really fleshed out as to how to handle issues. would you be ok with going ahead and removing this functionality for now until we are able to flesh it out better?
What would you like to be added:
A new state that does nothing except validate the workflow's state data, so we can remove the
dataInputSchema
anddataOutputSchema
fields from other states (refactor).Validation State
Next Definition
Why is this needed:
There's a number of parameters that optionally appear on every existing state because users might need them at any point during the workflow. The
dataInputSchema
anddataOutputSchema
fields are one such example. Because they're likely to be rarely used, we can refactor the specification so that schema validation is a separate and very explicit step in a workflow. Then we could simplify the spec for all existing states by removing these fields.No functionality would be lost as a result of this change.
The Next Definition keeps things a little bit tidier. It's necessary to be able to support both transition and end because a validation makes sense as the final state when writing subflows.