oxidecomputer / steno

distributed sagas
Apache License 2.0

:showtitle:
:toc: left
:icons: font

= steno

This repo contains an in-progress prototype interface for sagas based on https://www.youtube.com/watch?v=0UTOLRTwOX0[Distributed Sagas as described by Caitie McCaffrey]. See the crate documentation for details. You can build the docs yourself with:

```shell
cargo doc
```

Sagas seek to decompose complex tasks into comparatively simple actions. Execution of the saga, including unwinding when an action fails, is implemented in one place, with good mechanisms for observing progress, controlling execution, and so on. That's what this crate provides.
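To make the idea concrete, here is a minimal sketch of what a saga executor does, assuming a strictly linear saga (steno's sagas are graphs) and synchronous, in-process actions. The `Node`, `execute`, `SagaOutcome`, and `run_demo` names and the provisioning-style step names are hypothetical and are not steno's API. The executor runs each action in order and, if one fails, runs the undo actions of the already-completed nodes in reverse order.

```rust
// Hypothetical sketch, not steno's API: run a linear saga's actions in order
// and, if one fails, undo the completed ones in reverse order.

/// A `Vec<String>` stands in for the side effects actions would really have.
type Effects = Vec<String>;

/// One saga node: an action plus the undo action that logically reverses it.
struct Node {
    name: &'static str,
    action: fn(&mut Effects) -> Result<(), String>,
    undo: fn(&mut Effects),
}

#[derive(Debug, PartialEq)]
enum SagaOutcome {
    /// Every action succeeded.
    Done,
    /// An action failed and every earlier action was undone.
    Undone { failed_node: &'static str, error: String },
}

fn execute(nodes: &[Node], effects: &mut Effects) -> SagaOutcome {
    for (i, node) in nodes.iter().enumerate() {
        if let Err(error) = (node.action)(effects) {
            // Unwind: undo each node that completed, most recent first.
            // The failed node's own undo action is not run in this sketch.
            for done in nodes[..i].iter().rev() {
                (done.undo)(effects);
            }
            return SagaOutcome::Undone { failed_node: node.name, error };
        }
    }
    SagaOutcome::Done
}

/// A toy three-step saga whose last step fails.
fn run_demo() -> (SagaOutcome, Effects) {
    let nodes = [
        Node {
            name: "allocate_ip",
            action: |e| {
                e.push("allocate_ip".to_string());
                Ok(())
            },
            undo: |e| e.push("undo allocate_ip".to_string()),
        },
        Node {
            name: "create_disk",
            action: |e| {
                e.push("create_disk".to_string());
                Ok(())
            },
            undo: |e| e.push("undo create_disk".to_string()),
        },
        Node {
            name: "boot_instance",
            action: |_e| Err("no capacity".to_string()),
            undo: |e| e.push("undo boot_instance".to_string()),
        },
    ];
    let mut effects = Effects::new();
    let outcome = execute(&nodes, &mut effects);
    (outcome, effects)
}

fn main() {
    let (outcome, effects) = run_demo();
    println!("{:?}", outcome);
    println!("{:?}", effects);
}
```

Running it reports `boot_instance` as the failed node, and the effects log shows `create_disk` being undone before `allocate_ip`. A real executor would typically also record progress durably so that unwinding can resume after a crash, which this sketch ignores.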

== Status

This crate has usable interfaces for defining and executing sagas.

Features:

There's a demo program (`examples/demo-provision`) to exercise all of this with a toy saga that resembles VM provisioning.

There are lots of caveats:

Major risks and open questions:

Future feature ideas include:

== Divergence from distributed sagas

As mentioned above, this implementation is very heavily based on distributed sagas. There are a few important considerations not covered in the talk referenced above:

We're also generalizing the idea in a few ways:

The terminology used in the original talk seems to come from microservices and databases. We found some of these terms confusing and chose different ones:

[cols="1,2,1,2",options="header"]
|===
|Our term |What it means |Distributed Sagas term |Why we picked another term

|Action |A node in the saga graph, or (equivalently) the user-defined action taken when the executor "executes" that node of the graph |Request |"Request" suggests an RPC or an HTTP request. Our actions may involve neither of those, or they may comprise many requests.

|Undo action |The user-defined action taken for a node whose action needs to be logically reversed |Compensating request |See "Action" above. We could have called this "compensating action" but "undo" felt more evocative of what's happening.

|Fail/Failed |The result of an action that was not successful |Abort/Aborted |"Abort" can be used to mean a bunch of things, like maybe that an action failed, or that it was cancelled while it was still running, or that it was undone. These are all different things, so we chose different terms to avoid confusion.

|Undo |What happens to a node whose action needs to be logically reversed. This might involve doing nothing (if the action never ran), executing the undo action (if the action previously succeeded), or something a bit more complicated. |Cancel/Cancelled |"Cancel" might suggest to a reader that we stopped an action while it was in progress. That's not what it means here. Plus, we avoid the awkward "canceled" vs. "cancelled" debate.

|===
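The "Undo" row above describes a decision the executor makes for each node. Here is a hypothetical Rust sketch of that decision; `ActionState`, `UndoStep`, and `undo_step` are illustrative names, not part of steno. It encodes only the two cases the table spells out (an action that never ran needs nothing; an action that succeeded gets its undo action) and maps a failed action to a placeholder variant, because the table leaves the remaining cases as "something a bit more complicated".

```rust
// Hypothetical sketch, not steno's API: what "undoing" a node means,
// depending on what happened to its action.

/// What happened to a node's action before the saga started unwinding.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ActionState {
    /// The executor never executed this node.
    NeverRan,
    /// The action completed successfully.
    Succeeded,
    /// The action was not successful ("Fail/Failed" in the table).
    Failed,
}

/// What the executor should do to undo a node.
#[derive(Debug, PartialEq)]
enum UndoStep {
    /// There is nothing to reverse.
    Nothing,
    /// Run the node's user-defined undo action.
    RunUndoAction,
    /// A case the table describes only as "something a bit more
    /// complicated"; this sketch defers it to some separate policy.
    NeedsPolicy,
}

fn undo_step(state: ActionState) -> UndoStep {
    match state {
        ActionState::NeverRan => UndoStep::Nothing,
        ActionState::Succeeded => UndoStep::RunUndoAction,
        ActionState::Failed => UndoStep::NeedsPolicy,
    }
}

fn main() {
    for state in [ActionState::NeverRan, ActionState::Succeeded, ActionState::Failed] {
        println!("{:?} -> {:?}", state, undo_step(state));
    }
}
```

Keeping the decision in an exhaustive `match` means that adding a new node state forces the undo logic to be revisited, which fits the table's point that failing, cancelling, and undoing are different things.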