SAGA API - Githubissues

andybritz commented 5 years ago

The addition of a SAGA API to help with long running transactions. We should aim to solve the basic case as the full spectrum of SAGA complexity is something that would be difficult to cater for.

szoio commented 5 years ago

Some thoughts on sagas below for discussion...

Sagas

A saga is a long-lived business transaction consisting of several smaller actions or transactions that are executed independently. A saga manager is a process that aims to coordinate such a set of transactions and provide failure handling mechanisms.

Assumptions

A saga consists of a finite sequence of actions with a known end state
Individual actions don't need to know about the details of any of the other actions
An action can be dependent on one or more actions, and will only be executed when all its precedents have succeeded
A dependent action isn't dependent on the result of any action executed before it¹
A corollary is saga actions can be fully defined up-front as data
If any action fails, the whole saga fails
Some saga actions can be undone by actions called compensating actions
If the saga fails, compensating actions, where defined, are executed for all successful actions so far
All saga action commands are idempotent, as are their corresponding undo commands²
Saga actions that are not dependent on each other can be executed concurrently / in any order, as can their corresponding undo actions

Notes

¹ This isn't as restrictive as it sounds. It just means this is not the responsibility of the saga manager. The processor of a dependent action can subscribe to an event stream produced by one of its precedents. It does however rule out conditional execution or branching logic. (See the question below.)
² With Kafka exactly-once semantics, we get some support for this out of the box.

Questions

Does the scope of sagas include managing complex workflows and branching logic, or is it limited purely to managing distributed / long running transactions? In this doc we are currently assuming the latter.

Representation

A saga can be represented as a directed acyclic graph of individual actions. The vertices of the graph represent these actions, and the edges represent the dependencies between them.

The saga knows nothing about the implementation details of the action. From the saga's point of view, each action consists conceptually of a pair (startAction: Command, undoAction: Option[Command]). It may additionally contain configuration or rules for things like retries and timeouts.
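The representation above could be modelled roughly as follows. This is a minimal Python sketch for illustration only, not the project's API: the names `Command`, `SagaAction` and `validate_dag` are hypothetical, and `depends_on` holds the ids of an action's precedents (the edges of the graph).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Command:
    name: str
    parameters: dict


@dataclass(frozen=True)
class SagaAction:
    action_id: str
    start_action: Command
    undo_action: Optional[Command] = None  # None when the action cannot be undone
    depends_on: frozenset = frozenset()    # ids of precedent actions (graph edges)


def validate_dag(actions):
    """Raise ValueError if a dependency is unknown or the graph has a cycle."""
    by_id = {a.action_id: a for a in actions}
    for a in actions:
        missing = a.depends_on - set(by_id)
        if missing:
            raise ValueError(f"{a.action_id} depends on unknown actions {sorted(missing)}")
    done, remaining = set(), set(by_id)
    while remaining:
        # Actions whose precedents are all resolved can be "peeled off" the graph.
        ready = {i for i in remaining if by_id[i].depends_on <= done}
        if not ready:
            raise ValueError("dependency cycle detected")
        done |= ready
        remaining -= ready


reserve = SagaAction("reserve", Command("ReserveFunds", {"amount": 3000}),
                     undo_action=Command("UndoReserveFunds", {}))
bid = SagaAction("bid", Command("PlaceBid", {"amount": 1500}),
                 depends_on=frozenset({"reserve"}))
validate_dag([reserve, bid])  # passes: "bid" depends on "reserve", no cycle
```

Because the whole graph is plain data, it can be validated once, before any command is sent.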

Some requirements for the actions:

The action must be idempotent by design. This is because if a command is submitted, and no reply is received, it's impossible to know in general whether or not the action was executed.
The undoAction:
- Is a compensating action that will "semantically undo" the original action.
- Is optional, as some actions cannot be undone.
- Is commutative, as if undo actions are executed, the order in which they are executed shouldn't matter.
In order for actions to be usable in a saga, they must be saga aware:
- The action handler must be able to recognise saga metadata as part of the command request. This metadata enables the action to be identified as an action within a saga
- In addition to any events an action command handler generates as part of its normal course, it must emit a SagaActionFinished event. This event includes the action Id and the outcome of the action, keyed by the saga Id.
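A saga-aware handler along these lines might look like the sketch below. It is a Python illustration under assumptions: `handle_saga_command` and the `execute` callback are hypothetical names, messages are plain dicts shaped like the JSON examples in this issue, and publishing is reduced to returning (key, event) pairs.

```python
def handle_saga_command(key, command, execute):
    """Run `execute(command)` and return the (key, event) pairs to publish.

    `execute` stands in for the action's ordinary command handler
    (hypothetical): it returns the domain events it generated, or raises
    on failure.
    """
    try:
        published = [(key, event) for event in execute(command)]
        result = "Success"
    except Exception:
        published, result = [], "Failure"

    saga = command.get("saga")  # saga metadata; absent for ordinary commands
    if saga is not None:
        finished = {
            "name": "SagaActionFinished",
            "parameters": {
                "sagaId": saga["sagaId"],
                "actionId": saga["actionId"],
                "result": result,
            },
        }
        # Keyed by the saga ID, unlike the domain events.
        published.append((saga["sagaId"], finished))
    return published
```

A command without saga metadata passes through unchanged, so the same handler can serve both saga and non-saga callers.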

Workflow

Sagas are managed by a saga manager (also known as saga coordinator or process manager). In our implementation, this will be a Kafka (probably KStream) application.

We refer to the processes that listen for commands and execute them as action processors. From the saga manager's point of view, these action processors are black boxes. All communication between the saga manager and the action processors is via messaging.

In our Kafka based implementation, this is done by:

Publishing the startAction command (to the command topic), as well as a SagaActionStarted event.
Subscribing to the SagaActionFinished topic and waiting for the action to finish.

SagaActionFinished is a generic event type, and is shared across all action types.

A saga is started by creating an instance of the saga with its own unique identifier, the Saga ID.

Each saga has an event log associated with it (as a shared log across all sagas, but keyed by the saga ID). Every interaction is logged in an event log:

If an action is launched.
If a message comes back that the action is finished, along with the success or failure.
Any interaction resulting from the retry / timeout rules is also logged.

These log entries are events, and are stored indefinitely in a saga interaction topic.

From this log of interactions, the current state of the saga at any point in time can be derived. As a saga has a bounded lifespan its state can easily be derived on the fly.
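Deriving the state is essentially a fold over the saga's events. The Python sketch below assumes that events are dicts shaped like the SagaActionFinished examples in this issue, and that SagaActionStarted carries the same `sagaId` and `actionId` parameters (its payload is not specified here, so that part is a guess). `derive_state` is a hypothetical name.

```python
def derive_state(saga_id, events):
    """Fold a saga's interaction log into its current state."""
    results = {}  # action id -> "Pending" | "Success" | "Failure", in first-seen order
    for event in events:
        params = event["parameters"]
        if params["sagaId"] != saga_id:
            continue  # the log is shared across sagas; skip other sagas' events
        if event["name"] == "SagaActionStarted":
            results.setdefault(params["actionId"], "Pending")
        elif event["name"] == "SagaActionFinished":
            results[params["actionId"]] = params["result"]
    return {
        "sagaId": saga_id,
        "actions": [{"id": i, "result": r} for i, r in results.items()],
    }
```

Replaying the same log always yields the same state, so nothing beyond the log itself needs to be persisted.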

The saga has a handler associated with it that:

Derives the current state of the saga aggregate based on the cumulative history of events
Knows what event or events it is expecting next
Listens for updates from action processors in the form of these events (and times out if these events don't arrive)
Generates and logs new saga events
Submits commands for new actions to be processed
Transitions the saga aggregate state in accordance with the newly emitted events

The dependency graph defines the order in which actions are executed. An action can be started as soon as completion events have been received for all the actions it depends on.
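As a rough illustration of that rule, the following Python sketch picks the actions that can be started next. The function name and the two dict-based inputs are assumptions made for the example: `depends_on` maps each action id to the ids it depends on, and `results` maps every action started so far to its current result.

```python
def ready_actions(depends_on, results):
    """Return the ids of actions that have not been started yet and whose
    precedents have all finished successfully."""
    return [
        action_id
        for action_id, precedents in depends_on.items()
        if action_id not in results
        and all(results.get(p) == "Success" for p in precedents)
    ]


graph = {"reserve": [], "bid": ["reserve"]}
ready_actions(graph, {})                      # only "reserve" can start
ready_actions(graph, {"reserve": "Pending"})  # nothing yet: "bid" must wait
ready_actions(graph, {"reserve": "Success"})  # now "bid" can start
```

Actions with no dependency between them show up in the same result list, which is what allows them to run concurrently.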

When all actions in the graph are complete, the saga is completed successfully, and a saga complete event is emitted. The client application that launched the saga can listen for this event.

If any of the actions fail, either by receiving a failure event, timing out, or exhausting retries, the saga goes into undo mode. It then works through the dependency graph in reverse, sending commands to execute the compensating actions.
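One way to work through the graph in reverse is to peel off, wave by wave, the successful actions that no remaining successful action depends on. The Python sketch below is only an illustration under that assumption; `undo_waves` and its parameters are hypothetical names.

```python
def undo_waves(depends_on, succeeded, undoable):
    """Return lists ("waves") of action ids to compensate, in reverse
    dependency order. Actions within one wave do not depend on each other.

    depends_on: {action_id: [ids it depends on]}
    succeeded:  ids of actions that completed successfully
    undoable:   ids of actions that define a compensating action
    """
    pending = set(succeeded)
    waves = []
    while pending:
        # An action is reached once no still-pending action depends on it.
        wave = sorted(
            a for a in pending
            if not any(a in depends_on.get(b, ()) for b in pending)
        )
        if not wave:
            raise ValueError("dependency cycle detected")
        to_undo = [a for a in wave if a in undoable]
        if to_undo:
            waves.append(to_undo)
        pending -= set(wave)
    return waves
```

In the make bid example, where reserving funds succeeds and placing the bid fails, only the reservation is in `succeeded`, so the single wave contains just its undo.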

Once all the compensating actions have been completed the saga is complete, but failed. An event is emitted to this effect.

If any of the compensating actions fail, it still attempts to execute the remaining undo actions. In addition, a command is issued to request further investigation / send an email / log an issue in an issue tracker. This is an unexpected error, and will need escalation outside the scope of the saga manager itself.

The actor model is a useful abstraction for understanding and implementing sagas:

The saga manager launches and supervises the saga actions
It communicates with these processes via messages only
The saga's state is represented by the saga aggregate
State mutations happen in response to events received by the saga event handler

"Your actor framework is a process manager framework" - Greg Young

Some thoughts and observations

Composability

Because sagas are initiated with a command and emit an event when complete, just like any action they control, they can be composed into sagas of sagas without adding additional complexity.

So it makes sense to keep the sagas as simple as possible and compose where needed.

External processes effects

Any action can be controlled by a saga - it just needs to be wrapped so that it is started by a command message, and emits a message when it terminates.

This enables the saga manager to control processes that have interactions with external systems such as calling an endpoint on a web service.

This web service call must be idempotent, to handle the following scenario:

A command message is received
The http request is executed and returns successfully
Meanwhile, Kafka goes down and the action confirmation event cannot be generated

In this case the process manager will need to resubmit the command. This will now be a new command, and it will result in the web service being called again. This is where idempotence is required - the second call to the web service should be a no-op.
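A common way to get this behaviour is to deduplicate on an idempotency key on the receiving side. The Python sketch below assumes the saga action ID is used as that key and keeps the seen keys in memory; both are assumptions for illustration (a real service would need a durable store), and `IdempotentEndpoint` is a hypothetical name.

```python
class IdempotentEndpoint:
    """Wraps a side-effecting call so that repeats with the same key are no-ops."""

    def __init__(self, call):
        self._call = call
        self._responses = {}  # idempotency key -> response of the first call

    def __call__(self, key, request):
        if key not in self._responses:
            # First time this key is seen: perform the side effect.
            self._responses[key] = self._call(request)
        # Repeat: return the recorded response without calling again.
        return self._responses[key]


calls = []

def reserve_funds(request):
    calls.append(request)
    return {"status": "reserved", "amount": request["amount"]}

endpoint = IdempotentEndpoint(reserve_funds)
first = endpoint("f2ada97b", {"amount": 3000})
second = endpoint("f2ada97b", {"amount": 3000})  # resubmitted command: no second reservation
```

The resubmitted command still gets a response, so the action processor can emit its confirmation event on the second attempt.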

Implementation

Example data flows below:

Examples

Make bid saga

Start
Reserve funds against account -> FundsReserved or FundsReserveFailed
Place bid against account -> BidAccepted or BidRejected
Done

As these actions are saga aware, they also emit SagaActionFinished events.

Saga definition

{
  "name": "ReserveFunds",
  "parameters": {
    "sagaId": "d03c9cba-e0b1-4acf-93dd-f903e6857d90",
    "actions": [
      {
        "name": "ReserveFunds",
        "actionId": "f2ada97b-90a3-4541-9c5f-c6782381c91c",
        "dependsOn": [],
        "command": {
          "name": "ReserveFunds",
          "parameters": {
            "reservationId": "4a53bf63-3ae9-483f-984f-013dcf327225",
            "accountId": "bddf81a2-23bb-4cad-979f-cb9f68e3e62a",
            "amount": 3000
          },
          "properties": {
            "timeout": "30 seconds",
            "retries": "retry config here..."
          }
        },
        "undoAction": {
          "actionId": "bc0b6da1-a192-455a-8c44-2e7bf08405ff",
          "name": "UndoReserveFunds",
          "command": {
            "name": "UndoReserveFunds",
            "parameters": {
              "reservationId": "4a53bf63-3ae9-483f-984f-013dcf327225",
              "accountId": "bddf81a2-23bb-4cad-979f-cb9f68e3e62a"
            }
          }
        }
      },
      {
        "name": "PlaceBid",
        "actionId": "c198f2de-ff42-4a5c-a79f-a4bef177ffb9",
        "dependsOn": ["f2ada97b-90a3-4541-9c5f-c6782381c91c"],
        "command": {
          "name": "PlaceBid",
          "parameters": {
            "bidId": "4a53bf63-3ae9-483f-984f-013dcf327225",
            "accountId": "bddf81a2-23bb-4cad-979f-cb9f68e3e62a",
            "auctionId": "e92d244a-96d3-458d-a29e-0900c9632cee",
            "amount": 1500
          },
          "properties": {
            "timeout": "30 seconds"
          }
        }
      } 
    ]
  }
}

Note that the saga definition has the same shape as a command (see examples below). It is exactly that. It can itself be included as part of a bigger saga.

ReserveFunds Command

Key: "bddf81a2-23bb-4cad-979f-cb9f68e3e62a" (account ID)

{
  "name": "ReserveFunds",
  "parameters": {
    "reservationId": "4a53bf63-3ae9-483f-984f-013dcf327225",
    "accountId": "bddf81a2-23bb-4cad-979f-cb9f68e3e62a",
    "amount": 3000
  },
  "saga": {
    "sagaId": "d03c9cba-e0b1-4acf-93dd-f903e6857d90",
    "actionId": "f2ada97b-90a3-4541-9c5f-c6782381c91c"
  }
}

Note the inclusion of saga metadata.
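The step from an action in the saga definition to the command that is actually published could be sketched as below. This is an illustrative Python function, not part of any existing API; `to_command` is a hypothetical name, and it assumes the dict shapes used in the JSON examples in this issue (including the `"undo": true` flag on undo commands).

```python
def to_command(saga_id, action, undo=False):
    """Build the command message for an action (or its undo action) taken
    from a saga definition, attaching the saga metadata."""
    source = action["undoAction"] if undo else action
    saga = {"sagaId": saga_id, "actionId": source["actionId"]}
    if undo:
        saga["undo"] = True
    return {
        "name": source["command"]["name"],
        # Copy the parameters so the saga definition itself is never mutated.
        "parameters": dict(source["command"]["parameters"]),
        "saga": saga,
    }
```

The `properties` of the definition (timeouts, retries) are deliberately left out of the message, on the assumption that they are rules for the saga manager rather than for the action processor.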

PlaceBid Command

Key: "e92d244a-96d3-458d-a29e-0900c9632cee" (auction ID)

{
  "name": "PlaceBid",
  "parameters": {
    "bidId": "4a53bf63-3ae9-483f-984f-013dcf327225",
    "accountId": "bddf81a2-23bb-4cad-979f-cb9f68e3e62a",
    "auctionId": "e92d244a-96d3-458d-a29e-0900c9632cee",
    "amount": 1500
  },
  "saga": {
    "sagaId": "d03c9cba-e0b1-4acf-93dd-f903e6857d90",
    "actionId": "c198f2de-ff42-4a5c-a79f-a4bef177ffb9"
  }
}

UndoReserveFunds Command

Key: "bddf81a2-23bb-4cad-979f-cb9f68e3e62a" (account ID)

{
  "name": "UndoReserveFunds",
  "parameters": {
    "reservationId": "4a53bf63-3ae9-483f-984f-013dcf327225",
    "accountId": "bddf81a2-23bb-4cad-979f-cb9f68e3e62a"
  },
  "saga": {
    "sagaId": "d03c9cba-e0b1-4acf-93dd-f903e6857d90",
    "actionId": "bc0b6da1-a192-455a-8c44-2e7bf08405ff",
    "undo": true
  }
}

Result events

In the scenario where

ReserveFunds succeeds
PlaceBid fails
UndoReserveFunds is executed, and succeeds

the following events are emitted, all with the same key:

Key: "d03c9cba-e0b1-4acf-93dd-f903e6857d90" (saga ID)

ReserveFunds succeeds

{
  "name": "SagaActionFinished",
  "parameters": {
    "sagaId": "d03c9cba-e0b1-4acf-93dd-f903e6857d90",
    "actionId": "f2ada97b-90a3-4541-9c5f-c6782381c91c",
    "result": "Success"
  } 
}

PlaceBid fails

{
  "name": "SagaActionFinished",
  "parameters": {
    "sagaId": "d03c9cba-e0b1-4acf-93dd-f903e6857d90",
    "actionId": "c198f2de-ff42-4a5c-a79f-a4bef177ffb9",
    "result": "Failure"
  } 
}

UndoReserveFunds succeeds

{
  "name": "SagaActionFinished",
  "parameters": {
    "sagaId": "d03c9cba-e0b1-4acf-93dd-f903e6857d90",
    "actionId": "bc0b6da1-a192-455a-8c44-2e7bf08405ff",
    "result": "Success",
    "undo": true
  } 
}

Saga finished event

This event is emitted to notify that the entire saga is complete.

{
  "name": "SagaFinished",
  "parameters": {
    "sagaId": "d03c9cba-e0b1-4acf-93dd-f903e6857d90",
    "result": "Failure"
  } 
}

Saga state

Initial state

{
  "sagaId": "d03c9cba-e0b1-4acf-93dd-f903e6857d90",
  "actions": []
}

After starting the reserve funds, but before its completion

{
  "sagaId": "d03c9cba-e0b1-4acf-93dd-f903e6857d90",
  "actions": [ 
    {
      "id": "f2ada97b-90a3-4541-9c5f-c6782381c91c",
      "result": "Pending"
    }
  ]
}

After reserve funds completes

{
  "sagaId": "d03c9cba-e0b1-4acf-93dd-f903e6857d90",
  "actions": [ 
    {
      "id": "f2ada97b-90a3-4541-9c5f-c6782381c91c",
      "result": "Success"
    }
  ]
}

After starting place bid, but before completion

{
  "sagaId": "d03c9cba-e0b1-4acf-93dd-f903e6857d90",
  "actions": [ 
    {
      "id": "f2ada97b-90a3-4541-9c5f-c6782381c91c",
      "result": "Success"
    },
    {
      "id": "c198f2de-ff42-4a5c-a79f-a4bef177ffb9",
      "result": "Pending"
    }
  ]
}

Final saga state

{
  "sagaId": "d03c9cba-e0b1-4acf-93dd-f903e6857d90",
  "actions": [ 
    {
      "id": "f2ada97b-90a3-4541-9c5f-c6782381c91c",
      "result": "Success"
    },
    {
      "id": "c198f2de-ff42-4a5c-a79f-a4bef177ffb9",
      "result": "Failure"
    },
    {
      "id": "bc0b6da1-a192-455a-8c44-2e7bf08405ff",
      "result": "Success"
    }
  ]
}

Notes:

Action command handlers need to be Saga aware. This means that they need some information about the saga execution context.
- sagaId - this is needed to localise all events concerned with the execution of a saga onto the same partition in the saga event topic.
- actionId - each action within a saga has a unique ID. This is used to define the saga action state

szoio commented 5 years ago

Some more thoughts on Sagas / processes implementation: Two approaches:

A. Via the CommandAPI:

We can kind of do this already quite easily for small ad-hoc use cases.

Create an AggregateSet with multiple aggregates.
Get a CommandAPI for each of the aggregates we are interested in.
Call the commands sequentially with publishAndQueryCommand and then just flatmap in the next publishAndQueryCommand.
- Of course it would all be much nicer in Scala if we could create proper monad instances for FutureResult and then maybe use the writer monad pattern to record the queries so far. But we can also make it work in Java.
We could put a declarative wrapper around this control flow, but it will involve some casting and loss of type safety (in Java that is).
Simple solution for gluing Simple Sourcing only aggregate operations (potentially with undo logic).
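The control flow of approach A, stripped of the CommandAPI itself, might look like the sketch below. It is a Python stand-in rather than the Java/Scala API: each step is a hypothetical `(name, do, undo)` triple, where `do` plays the role of one publish-and-query call and raises on failure, and `undo` may be None.

```python
def run_sequential(steps):
    """Run steps in order; on the first failure, undo the completed steps
    in reverse order and report which step failed."""
    completed = []
    for name, do, undo in steps:
        try:
            do()
        except Exception:
            for _, done_undo in reversed(completed):
                if done_undo is not None:
                    done_undo()
            return {"result": "Failure", "failed": name}
        completed.append((name, undo))
    return {"result": "Success", "failed": None}


log = []

def reject_bid():
    raise RuntimeError("bid rejected")

outcome = run_sequential([
    ("ReserveFunds", lambda: log.append("reserve"), lambda: log.append("undo reserve")),
    ("PlaceBid", reject_bid, None),
])
```

Each step blocks until the previous one has returned, which is the synchronous, polling flavour listed among the downsides of this approach.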

Downsides:

It’s only a partially distributed solution. The entire application needs to know how to process each aggregate. Though we can fully distribute the application execution itself.
The CommandAPI is somewhat synchronous / poll-ly. It kind of feels like the true streaming way is more reactive / message or event driven.
It won’t work that well for long running processes - too much polling.

B. Pure messaging approach:

The process controller (i.e. the Saga manager application) doesn’t use the CommandAPI for executing the sub-actions, it just sends a message straight to the command request topic for the aggregate.
It listens to a command response topic for the result of an action (this topic doesn’t currently exist).
The process controller only needs to know about the commands for an aggregate, not their associated events and handlers.
Aggregate processors can run separately or grouped in services or processes as deemed fit.
A declarative control flow wrapper would look the same.
Will work much better with long running processes (or 3rd party processes)

andybritz commented 5 years ago

In terms of B. Pure messaging, is there a way we can pull out the command API from the Handler? It would be nice if the ProcessController used the same API, but it does not need to be coupled with a CommandHandler and EventHandler.

This way the API is the same just the access to the implementation for the API differs?

szoio commented 5 years ago

Yes, I hope this is possible. The CommandRequest has some specifics that may not be directly applicable to sagas (such as Sequence). Maybe there is an appropriate interpretation in the saga context.

szoio commented 5 years ago

Experimental project added: https://github.com/simplesourcing/simplesagas

andybritz commented 5 years ago

Closing this issue, now that we have the saga project. We can continue our discussion in that space.

simplesourcing / simplesource

SAGA API #4
