serverlessworkflow / specification

Contains the official specification for the Serverless Workflow Domain Specific Language. It provides detailed guidelines and standards for defining, executing, and managing workflows in serverless environments, ensuring consistency and interoperability across implementations.
http://serverlessworkflow.io
Apache License 2.0
740 stars 146 forks source link

SubFlow state; child/parent interactions #140

Closed jorgenj closed 4 years ago

jorgenj commented 4 years ago

What is the question:

Given the following 'parent' and 'child' workflow definitions: Parent workflow

{
    "id": "ParentWorkflow",
    ...
    "events": [{
        "name": "ParentEvent",
        ...
     }],
    "functions": [{
        "name": "ParentFunction",
        ...
    }],
    "states":[
        {
            "name": "WaitForChild",
            "type": "SubFlow",
            "waitForCompletion": true,
            "workflowId": "ChildWorkflow",
            "start": { "kind": "default" },
            "end": { "kind": "default" }
        }
    ]
}

Child workflow

{
    "id": "ChildWorkflow",
    ...
    "events": [{
        "name": "ChildEvent",
        ...
     }],
    "functions": [{
        "name": "ChildFunction",
        ...
    }],
    "states":[
        ...
    ]
}

Inheritance

The SubFlow state section of the spec states that:

Sub-workflows inherit all the function and event definitions of their parent workflow.

What does this mean? Given that the intent of sub-workflows appears to be re-usability amongst potentially many parent workflow definitions, this seems to imply that the functions/event definitions of the parent workflow are injected at run-time into the 'execution context' of the child workflow instance when it's invoked. Is this correct?

Given the example above, can the ChildWorkflow have an Operation state which references a ParentFunction, even though one such function is not actually defined in the child workflow definition?

Does this mean that a child workflow can reference functions/events in its state definitions which are not actually defined in its workflow definition? I have a concern that this means that a particular workflow definition cannot be fully validated until the very moment when an instance of that workflow is created (since it might reference events/functions that are only provided by a 'parent' workflow).

Parent state data

The SubFlow state section of the spec states that:

If the waitForCompletion property is set to true, sub-workflows have the ability to edit the parent's workflow data. If the waitForCompletion property is set to false, data access to the parent workflow should not be allowed.

I'm confused about what "edit the parent's workflow data" means? Is it actually allowing the child to alter the data of the parent workflow in some way, or is this just intending something more like the following?

tsurdilo commented 4 years ago

Hi @jorgenj thanks for the nice questions!

1) Yes, the idea was to be able to reference the events and functions definition from the parent. I wanted to add new top top-level workflow param called "independent" or "executable", a boolean param with default being "true". Subflows can set it to "false" which clearly indicates that they cannot be executed on their own. WDYT, would that make it clearer? For validation, the SDK validators need to be updated to understand not to evaluate the eventRef or the functionRef names. From the schema perspective this should be ok as the functions and events arrays are not required.

2) Your second thought (the two bullet points) is absolutely correct. If waitForCompletion is true then the workflow data results of the sub-workflow executed can be and should be merged into the parent SubFlow state after completion so it can be passed on.

tsurdilo commented 4 years ago

@jorgenj when you see some parts of the spec docs where more explanation or info is needed, please feel free to suggest a change via a PR directly in the doc itself. Your contributions are more than welcome!

jorgenj commented 4 years ago

Thanks @tsurdilo, I'm still in the 'getting familiar with the project' phase, but I look forward to sending some PR's where I can contribute.

I'm looking at your PR now and I think the change answers my second question well.

For the inheritance feature;

As a workflow developer, when I create sub-workflows I've found that it's very useful in the testing phase to be able to execute them directly so I worry about losing that capability.

What would you think about 'inverting' things so it's not inheritance but 'injection' of events/functions that is supported? What I'm thinking is, all workflows would be required to define any functions/events they reference in their implementation, but a 'parent' workflow would have the option of injecting it's own function/event references when they invoke a child workflow, for example:

{
    "id": "ParentWorkflow",
    ...
    "events": [{
        "name": "AnEvent",
        ...
     }],
    "functions": [{
        "name": "DoSomething",
        ...
    }],
    "states":[
        {
            "type": "SubFlow",
            "workflowId": "ChildWorkflow",
            "overrideEvents": ["AnEvent"],
            "overrideFunctions": ["DoSomething"],
            ...
        }
    ]
}

{
    "id": "ChildWorkflow",
    ...
    "events": [{
        "name": "AnEvent",
        ...
     }],
    "functions": [{
        "name": "DoSomething",
        ...
    }],
    "states":[
        {
            "type": "Operation",
            "actions": ["DoSomething"],
            ...
        }
    ]
}

In the case where the child is invoked directly, it'd execute the function defined in the child definition. In the case where the parent invokes the child, it'd use the DoSomething function defined in the parent workflow definition.

This would have the benefit of having all workflows be 'fully defined', a child workflow wouldn't have any references to non-existant functions/events. I think this would allow implementations to validate all references at 'compile-time' rather than discovering broken references at run-time.

I think this could also avoid some subtle bugs where there's an unintentional name clash in events/functions defined in both parent and child workflows.

tsurdilo commented 4 years ago

@jorgenj I like your idea of injection. The idea in https://github.com/serverlessworkflow/specification/pull/141 is similar but more restrictive in that if the child workflow declares functions/events those would be used without looking at the parent ones. I am honestly thinking that this is more clean approach and less error prone as runtimes impl may differ but the workflow markup should be portable.

As far as your idea of injection, we are looking at possibilities to externalize the function/events definitions and reference those in workflow definitions (for example having a separate json/yaml file for defs only). I think that would help you out with the testing question. WDYT?

jorgenj commented 4 years ago

I see the injection model as allowing workflows to declare an explicit contract of what functions/events a parent workflow can override, but I can see the down-side that it didn't include a provision for declaring functions/events which are not allowed to be over-ridden. I hadn't realized that the current model would work that way, and I agree that it's useful for a workflow to have functions/events which aren't over-rideable.

Overall, I'm beginning to wonder if I misunderstood the intent of this inheritance feature. Your response makes me think that it's not about inheritance but rather about pure code-reuse. What I mean by that:

I can definitely see a strong case for the ability to reference external event/function definitions, considering the simple case where workflow A is sending an event to workflow B that listens for it, ideally the developer wouldn't have to copy-paste the same event definition into both workflow definitions (error-prone).

Perhaps it would help me to understand, is there a use-case for inheritance (vs. code-reuse defined above) that I should be keeping in mind?

jorgenj commented 4 years ago

Bringing this comment from the PR over here:

replied on the issue. i do like your idea of injection and it makes sense but I think it opens a door to hinder portability of the markup if runtimes have different merging / lookup criteria. Being sometimes a little more restrictive goes a long way in terms of portability :)

I suspect we're looking at the same concern from different angles, I'm also worried about different runtimes interpreting this feature differently. Consider the case where we have workflows A, B and C:

id: A
functions:
   - foo: ...
   - bar: ...
states:
   - callB:
       type: SubFlow
       workflowId: B
id: B
functions:
   - foo: ...
states:
   - callC:
       type: SubFlow
       workflowId: C
id: C
states: 
   - callFunctions:
       type: operation
       actions:
            - functionRef: foo
            - functionRef: bar

If A->B->C, will the bar function be available in workflow C? Which version of the foo function will workflow C use?

I think if we're going for inheritance (vs. code-reuse) then the spec should be very careful about defining the expected behavior.

tsurdilo commented 4 years ago

@jorgenj very good example, I think we should use it in the docs to explain this - is that ok with you? The way my intention was to interpret this is:

1) A declares functions "foo" and "bar" 2) B declares functions "foo" (and since has a function declaration means that it will not inherit/inject from the last declaration in the calling chain). It's function declaration becomes the "last one" in the calling chain. 3) C sees only functions of where they were last declared in the chain (B) (as itself does not have a function declaration) and such should raise a parsing exception (function declaration for "bar" not found"). The "foo" reference is from the last function declaration in B

I need to add this to the PR so it is clear (propagation happens on the last function declaration of the chain only). WDYT

Btw. this is aligned with how data is passed from A-B-C and would be easiest to understand imo.

jorgenj commented 4 years ago

@tsurdilo I'm not sure if you saw my prior comment here, I think made the mistake of hiding it with the second question (which you responded to just above).

I'm really curious to understand if the intent here is inheritance/injection or more like a means to enable code-reuse of event/function definitions.

tsurdilo commented 4 years ago

@jorgenj the intent of the subflow state itself is to promote code reuse and means to model workflows with reuse in mind.

jorgenj commented 4 years ago

I'm really sorry to be so dense, but I'm specifically struggling to understand the inheritance aspect of sub-workflows. In general, I completely agree on the value of being able to execute sub-workflows. Perhaps I just need to spend some more time thinking about the potential use-cases for inheritance.

If the primary intent for inheritance here is primarily as a means by which two workflows can reference the same function definition, then wouldn't it make sense to instead pursue the idea of 'external references' that you had mentioned earlier?

It seems like this would allow us to retain the following 2 properties (which seem like very useful properties to retain):

  1. Every workflow definition can be instantiated on it's own (no parent workflow required)
  2. Every workflow definition can be validated at 'compile time' (rather than waiting until 'runtime' to determine if the calling workflow provides all the necessary function/event definitions via inheritance or not)

The standalone flag seems to explicitly remove the ability to achieve property 1, and the current means of inheritance seem to be at-odds with property 2. I'm not opposed to the idea of inheritance but, in my humble opinion, the ideal solution would preserve these two properties above.

tsurdilo commented 4 years ago

@jorgenj After thinking about this some more I agree with your statements. Please see the updates on https://github.com/serverlessworkflow/specification/pull/141. Thanks.

tsurdilo commented 4 years ago

Closed by commit https://github.com/serverlessworkflow/specification/commit/54421ed03c93ec2f00316d0554352d638389ccc5