State machine for interfaces

lauwers commented 1 year ago

Since Version 2.0 no longer defines normative interface types, orchestrators no longer have the ability to automatically create workflows based on the service topology. We have identified a requirement to re-introduce support for automatically-generated declarative workflows in https://github.com/oasis-open/tosca-community-contributions/issues/91

As a first step, I propose that we introduce additional grammar for defining state machines associated with interfaces:

The state machine introduces a state variable associated with the interface (as opposed to with a node or relationship)
For each operation, the state machine defines the states in which the operation can be called
For each operation, the state machine defines the state into which the interface transitions after calling the operation.

The following provides a suggestion for how this could work (leveraging some of our imperative workflow syntax):

interface_types:

  Standard:
    attributes:
      state:
        type: string
        constraints:
          $valid_values:
            - $value: []
            - - initial
              - created
              - configured
              - started
              - stopped
              - deleted
              - error
    operations:
      create:
        preconditions:
          $equal:
            - $get_attribute: [ INTERFACE, state ]
            - initial
        on_success:
          $set_attribute: [ INTERFACE, state, created ]
        on_failure:
          $set_attribute: [ INTERFACE, state, error ]
      configure:
        preconditions:
          $equal:
            - $get_attribute: [ INTERFACE, state ]
            - created
        on_success:
          $set_attribute: [ INTERFACE, state, configured ]
        on_failure:
          $set_attribute: [ INTERFACE, state, error ]
      start:
        preconditions:
          $equal:
            - $get_attribute: [ INTERFACE, state ]
            - configured
        on_success:
          $set_attribute: [ INTERFACE, state, started ]
        on_failure:
          $set_attribute: [ INTERFACE, state, error ]
      stop:
        preconditions:
          $equal:
            - $get_attribute: [ INTERFACE, state ]
            - started
        on_success:
          $set_attribute: [ INTERFACE, state, stopped ]
        on_failure:
          $set_attribute: [ INTERFACE, state, error ]
      delete:
        preconditions:
          $equal:
            - $get_attribute: [ INTERFACE, state ]
            - stopped
        on_success:
          $set_attribute: [ INTERFACE, state, deleted ]
        on_failure:
          $set_attribute: [ INTERFACE, state, error ]
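
To make the intended semantics concrete, here is a minimal Python sketch of how an orchestrator might evaluate a definition like the one above. The names `OPERATIONS`, `InterfaceState`, and `call` are hypothetical and not part of any TOSCA grammar. The sketch also assumes that `stopped` is a valid state and that `delete` is only allowed from `stopped`.

```python
from dataclasses import dataclass

# operation -> (state required by the precondition, state set by on_success).
# Hypothetical table mirroring the interface type above; every on_failure
# handler sets the state to "error".
OPERATIONS = {
    "create": ("initial", "created"),
    "configure": ("created", "configured"),
    "start": ("configured", "started"),
    "stop": ("started", "stopped"),
    "delete": ("stopped", "deleted"),  # assumption: delete requires "stopped"
}


@dataclass
class InterfaceState:
    # The state variable belongs to the interface, not to a node or relationship.
    state: str = "initial"

    def call(self, operation: str, succeeded: bool = True) -> bool:
        """Run `operation` if its precondition holds; return whether it ran."""
        required, on_success = OPERATIONS[operation]
        if self.state != required:
            return False  # precondition is false: the call is rejected
        self.state = on_success if succeeded else "error"
        return True
```

With this table, calling `start` on a freshly created interface is rejected because the precondition expects `configured`, and any failed operation leaves the interface in `error`.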

pmjordan commented 1 year ago

I understand and agree with the requirement. A complete example would need to include an exit from the error state. I'm not clear how the state obtains a value of 'initial' in the first place. Does it have to be explicitly set somewhere, and what happens if that is forgotten? Would an initial state of null be easier? I notice that both the W3C standard SCXML and AWS explicitly mark one state as the starting value and optionally allow states to be marked as ends. The proposed structure appears to have a maximum of two possible exits from each state (success and failure). A general FSM would allow any number of transitions to other states based on a return value of the operation.
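
As a rough illustration of the more general FSM described here, the following Python sketch keys transitions on (state, operation, outcome), marks one state as the start, and allows states to be marked as ends. All names, including the outcomes `quota_exceeded` and `timeout` and the `reset` operation, are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field


@dataclass
class StateMachine:
    start: str          # explicitly marked starting state
    ends: frozenset     # states marked as ends
    transitions: dict   # (state, operation, outcome) -> next state
    state: str = field(init=False)

    def __post_init__(self):
        # The machine always begins in the declared start state.
        self.state = self.start

    def fire(self, operation: str, outcome: str) -> str:
        """Apply the transition selected by the operation's outcome."""
        key = (self.state, operation, outcome)
        if key not in self.transitions:
            raise ValueError(f"no transition defined for {key}")
        self.state = self.transitions[key]
        return self.state

    @property
    def finished(self) -> bool:
        return self.state in self.ends


machine = StateMachine(
    start="initial",
    ends=frozenset({"deleted"}),
    transitions={
        ("initial", "create", "ok"): "created",
        ("initial", "create", "quota_exceeded"): "error",
        ("initial", "create", "timeout"): "initial",  # stay put so create can be retried
        ("error", "reset", "ok"): "initial",          # an exit from the error state
        ("created", "delete", "ok"): "deleted",
    },
)
```

Here `create` has three possible exits rather than two, and the error state is no longer a dead end.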

lauwers commented 1 year ago

Perhaps we could initialize the state using the default keyword?
Do you have examples of an FSM that allows transitions to other states based on return values? I see how this could be useful for errors (to transition into different types of error states). Do you have use cases for normal state transitions as well?

There is one aspect of the above syntax (which is based on the 1.3 workflow syntax) that bugs me a bit: it assumes a synchronous execution model where an operation is called and then either returns successfully or returns an error. Real-world implementations will likely adopt an asynchronous event-driven model where calling an operation kicks off a state transition, and at some later point there will be a notification that signals completion. This means that instead of using a general-purpose on_success keyword to signal completion, we should use notifications defined in the same interface instead. Presumably, we would use the same syntax to specify the states in which notifications are valid.

If we did this, would we avoid the need to support state transitions based on return values (since presumably we would just expect different notifications instead for this purpose)?

pmjordan commented 1 year ago

The default keyword to flag the initial state seems reasonable. My most recent TOSCA templates had to use V1.3 so that I had an explicit workflow, but it only used create and delete operations. They were synchronous and I didn't get as far as adding error handling. So my comments are reaching back to earlier coding projects which used FSMs - and they were indeed event driven. I agree that a syntax to handle events would be better for the general case, but it would complicate the simple case of just running a script synchronously, and I suspect the simple case is by far the most common. Instead of the fixed labels (on_success etc.), could you have user-defined labels, then have the processor match return values from synchronous implementation calls to those labels, and also augment the existing notification syntax so that an arbitrary notification can be mapped to one of those same labels?
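
A small Python sketch of that idea, under stated assumptions: the labels `done` and `failed`, the exit-code mapping, and the notification name `on_create_failed` are all hypothetical. The point is only that synchronous return values and asynchronous notifications can resolve to the same user-defined labels, so a single transition table serves both.

```python
# Transitions keyed by (current state, user-defined label) instead of the
# fixed on_success / on_failure keywords.
LABEL_TRANSITIONS = {
    ("initial", "done"): "created",
    ("initial", "failed"): "error",
}

# Synchronous case: map an implementation's return value (for example a
# script exit code) to a label.
RETURN_VALUE_LABELS = {0: "done"}

# Asynchronous case: map a notification name to one of the same labels.
NOTIFICATION_LABELS = {"on_created": "done", "on_create_failed": "failed"}


def label_for_return_value(code: int) -> str:
    """Unmapped return values are treated as failures (an assumption)."""
    return RETURN_VALUE_LABELS.get(code, "failed")


def next_state(state: str, label: str) -> str:
    """Look up the state reached from `state` when `label` is reported."""
    return LABEL_TRANSITIONS[(state, label)]
```

For example, `next_state("initial", label_for_return_value(0))` and `next_state("initial", NOTIFICATION_LABELS["on_created"])` both resolve through the label `done`.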

lauwers commented 1 year ago

It seems to me that the same syntax could be used for modeling synchronous interfaces as for asynchronous interfaces. For example, here is (part of) a synchronous version of an interface:

---
title: Synchronous Interface
---
stateDiagram-v2
  initial --> created: create
  created --> started: start

Here is an asynchronous version of the same interface. This version models the intermediate transitioning states and the notifications to transition out of these states:

---
title: Asynchronous Interface
---
stateDiagram-v2
  initial --> creating: create
  creating --> created: on_created
  created --> starting: start
  starting --> started: on_started

It is up to the designer to decide which way they want to model this.
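
The two diagrams can be read as two transition tables driven by the same evaluation logic. The Python sketch below is only an illustration: the `run` helper is hypothetical, and it assumes that an event which is not valid in the current state is simply ignored.

```python
# Synchronous model: an operation moves directly to the resulting state.
SYNC = {
    ("initial", "create"): "created",
    ("created", "start"): "started",
}

# Asynchronous model: an operation enters a transitioning state, and a
# notification defined in the same interface completes the transition.
ASYNC = {
    ("initial", "create"): "creating",
    ("creating", "on_created"): "created",
    ("created", "start"): "starting",
    ("starting", "on_started"): "started",
}


def run(table: dict, events: list, state: str = "initial") -> str:
    """Feed operations and notifications through a transition table."""
    for event in events:
        # Assumption: events with no transition from the current state are ignored.
        state = table.get((state, event), state)
    return state
```

In the asynchronous table, `start` has no effect while the interface is still `creating`, which is the kind of guard the intermediate states are meant to provide.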

pmbruun commented 11 months ago

Is a true precondition merely enabling the operation or will the orchestrator automatically execute the operation once the precondition becomes true? I assume the latter, since otherwise you are not getting the workflow effect.

With the proposed syntax you can easily have preconditions enabled for more than one state at the same time.

There are four ways an orchestrator might interpret a case where two operations have true preconditions:

1. Execute both operations in parallel. Unfortunately, the subsequent $set_attribute upon success/failure may set different states, causing either a race or an inconsistent interface state.
2. Randomly select one of the operations to execute. Unfortunately, this will make the behavior of the orchestrator unpredictable, which few users will like.
3. Ensure that only one precondition may be true and make it a run-time error in the orchestrator to enable more than one. Unfortunately, this cannot be statically checked.
4. Syntactically indicate an evaluation order of the preconditions, making it predictable which one is selected. Unfortunately, this makes the end-to-end state machine rather hard for designers and users to understand.
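
The last two interpretations above can be contrasted in a short Python sketch. The function names and the overlapping `PRECONDITIONS` table are hypothetical; the overlap between `stop` and `delete` is deliberate, to show what happens when two preconditions are true at once.

```python
def enabled_operations(state, preconditions):
    """Return the operations whose precondition holds, in declaration order."""
    return [op for op, holds in preconditions.items() if holds(state)]


def select_exclusive(state, preconditions):
    """Run-time check: more than one enabled operation is an error."""
    enabled = enabled_operations(state, preconditions)
    if len(enabled) > 1:
        raise RuntimeError(f"ambiguous: {enabled} all enabled in state {state!r}")
    return enabled[0] if enabled else None


def select_first(state, preconditions):
    """Ordered evaluation: declaration order decides which operation runs."""
    enabled = enabled_operations(state, preconditions)
    return enabled[0] if enabled else None


# Python dicts preserve insertion order, so "stop" is evaluated before "delete".
PRECONDITIONS = {
    "stop": lambda s: s == "started",
    "delete": lambda s: s in ("started", "stopped"),  # overlaps with stop on purpose
}
```

In state `started`, `select_first` picks `stop` while `select_exclusive` raises, which shows that the ambiguity only surfaces at run time.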

As for asynchronous operations, this means that the actual underlying state-machine must split each state into two (or three) sub-states, and I believe that is what you are doing with the starting - started example. While this works, it adds additional complexity to the model. Could the -ing, -ed suffixes be more automatically expressed, e.g. using an independent sub-state attribute that would be available for all operations?

I am not fond of having to re-write the state-machine depending on the operation being synchronous or asynchronous. It would be far more elegant if a synchronous implementation could be substituted with an asynchronous one without having that propagate up into a change of the model that was supposed to be abstract.

In HPE SD we have automatic sub-states (pre, main, post) for each state and we use postconditions, where an asynchronous operation is able to set an attribute to indicate completion. This also allows the operation to asynchronously post updates to the attributes to indicate progress without actually transitioning to the next state.

You could of course always specify the postcondition as the precondition for the subsequent state, but that would further fragment the logic.
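
As a loose sketch of the sub-state idea (not a description of how HPE SD actually works), the Python below keeps a separate `sub_state` attribute next to the main state and lets an operation post attribute updates until a postcondition holds. The class, method names, and the `progress` attribute are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Interface:
    state: str = "initial"
    sub_state: str = "post"  # one of "pre", "main", "post"
    attributes: dict = field(default_factory=dict)

    def begin(self, target: str) -> None:
        """Enter `target`; the operation is now running in sub-state "main"."""
        self.state, self.sub_state = target, "main"

    def update(self, postcondition, **attrs) -> None:
        """Record progress; complete the state once the postcondition holds."""
        self.attributes.update(attrs)
        if self.sub_state == "main" and postcondition(self.attributes):
            self.sub_state = "post"
```

A synchronous implementation would call `update` once with the final attributes, while an asynchronous one could call it repeatedly to report progress; the main state model is the same in both cases.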

lauwers commented 11 months ago

> Is a true precondition merely enabling the operation or will the orchestrator automatically execute the operation once the precondition becomes true? I assume the latter, since otherwise you are not getting the workflow effect.

Actually, I had assumed the former: if the precondition evaluates to False, then the event is ignored (under the assumption that any changes that result in the precondition becoming True will result in another event that will cause the precondition to be evaluated again).

pmbruun commented 11 months ago

Ok. So operations become enabled when their precondition becomes true, but it takes a separate event (or human action?) to execute one of the enabled operations.

lauwers commented 10 months ago

Presumably, the preconditions can only become true as a result of some event. That same event could also trigger the (enabled) operation. Or, said a different way, in model space nothing ever happens or changes without some event resulting in the change.

pmbruun commented 10 months ago

Right. But then you can have two (or more) pre-conditions becoming true as a result of the same event - and that was my original question. The answer is not: "leave it up to the orchestrator", because if the orchestrator (or the designer) wants to prioritize among the enabled operations (option 4), then there is missing syntax for specifying which one to pick. So that option is not open, and with this syntax we already decided that some such orchestrator solutions should not be possible.

We cannot make half a decision. Once we exclude some implementation options, we have to have an up-front decision about which implementations are allowed and which are not.

The examples all show a linear state-model in each direction (setup/teardown). If that is our intention, it should be reflected in the syntax. If that is not our intention, we should analyze some non-linear use-cases.

The reason I have to ask is that, coming from Telco, this is what the standard state machine looks like:

lauwers commented 9 months ago

In this standard state model, what is the difference between "reserve" and "provision"? I'm trying to figure out how these translate into TOSCA concepts such as:

creating a representation graph
fulfilling dangling requirements
substituting etc.

koppor commented 1 month ago

As general input, there is also work in other fields w.r.t. these requirements. Work has started on Arazzo. An initial observation was that it has no concept of partners, which is not needed in the context of TOSCA.

In general, this seems to be "Message Exchange Patterns". BPELlight was one idea to specify those: https://ieeexplore.ieee.org/document/4578482

Regarding the instantiation, the process instantiation patterns might be of help.

General thoughts:

Approaches such as the Common Workflow Language seem to be far away from covering the features
The MLOps community also develops workflow languages, but with a different focus (one would need to look into that). They lean toward internal DSLs (Python code, ...)
I like this external DSL thing (YAML, XML, ...). Maybe it's time to work on BPEL 3.0 (YAML-based). (BPMN is too heavyweight for our use cases IMHO)

oasis-open / tosca-community-contributions

State machine for interfaces #147