Open XAMPPRocky opened 1 year ago
I just paged this back in and am summarizing for myself as much as anything: I think the reason things look the way they do now is that the "write" side of this necessarily has to be driven by Steno while the "read" side can be driven by the consumer, and I tend to prefer library-like interfaces over the inversion of control associated with a framework. That's not a hard rule, though -- just how we got here.
To elaborate: Steno looks like a framework during normal operation. The consumer creates a saga and then hands control over to Steno. Steno calls back into the consumer in narrow ways to execute actions and undo actions and to persist state. We do this using traits -- SecStore
in this case providing the methods to persist state.
For us, recovery happens when the SEC process starts up. This involves reading whatever was written (namely, a list of running sagas and their complete logs) and then calling SecClient::saga_resume()
for each one. We implement that here:
https://github.com/oxidecomputer/omicron/blob/main/nexus/db-queries/src/db/saga_recovery.rs
This part is less framework-y and more library-y in that the consumer is driving the work and handing things off to Steno.
As you suggest, I could imagine making this more framework-y by adding read methods to SecStore
for listing sagas and logs and then providing a recover()
function in Steno that uses those methods and calls saga_resume()
for each saga found. Is that useful? The only thing I'd be worried about is that there may be critical policy choices hidden here that we'd want to let the consumer decide, like whether to list everything in one go (vs. paginated queries), whether to be able to resume sagas as they're found vs. wait for all of them, and whether to immediately call saga_start()
for all the recovered sagas. This would probably come out in the implementation or else by comparing the result to the recovery code above.
The only thing I'd be worried about is that there may be critical policy choices hidden here that we'd want to let the consumer decide
Yeah I agree, though I think there's no real downside to having these policy choices driven through configuration and traits. For example instead of adding "read" methods to SecStore
, steno could have a SecDriver
trait with those read methods (similar to how libs like serde
have Serialize
and Deserialize
), that way you can decouple those policy choices from the persistence layer and have multiple kinds of drivers for a given SecStore
depending on how mission critical the service is.
I also don't see a reason why steno couldn't have both right? Like if there was SecDriver
trait, you could have a async fn drive<S: SecDriver>()
function on SecClient
, and if you're someone who just wants to let steno handle it you call that, and if you're someone who wants absolute control, you can still call it the same way you do today, you just don't call drive
. That would also be a way to implement this while allowing dependents to upgrade to the newer version without needing to rewrite their recovery logic.
Hello 👋 While implementing a
SecStore
for our own persistence layer I noticed that there's no way to for aSecClient
to read from aSecStore
, it can only write to it. This means that it's up to the clients to implement crash recovery for their app. This seems like ideal to be implemented by theSecClient
instead, which would require additional methods toSecStore
to be able to read from theSecStore
.Is this something that you'd be interested in adding to
steno
?