Open ryan-mars opened 3 years ago
I think cdk-triggers is better because it is auditable in versioned code. We know exactly who did what, and when, to production, as opposed to ad hoc scripts.
Proposed tenet: all operations are executed as declarative code deployed by a provisioning engine such as CloudFormation/Terraform/Pulumi.
This approach would also then enable us to vend operational constructs (such as Event Replay).
Sounds great
Newly developed read-models in an existing system may need to be historically accurate. Therefore, they will need to be "back-filled" with old events and "brought up to speed" when deployed for the first time. In order to protect the read-model from receiving events in an unexpected order, or live events interleaved with historical events, we cannot just pump all historical events into the read-model handler's queue. Here's a guess at how we could bring up a new read model without much trouble.
Assuming events originating within a bounded context are ordered (SNS/SQS FIFO or Kinesis) and have a lex-sortable unique identifier (e.g. a KSUID), a new read model could be deployed in two phases:
1. Backfill Phase
2. Live Phase
You could think of this as a sequential lambda architecture.
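To make the two-phase idea concrete, here's a minimal sketch. All names are invented for illustration, not part of any real framework API; the only assumption carried over from above is that event IDs are lex-sortable KSUIDs:

```typescript
// Hypothetical sketch of the two-phase cutover. Names are illustrative.

type DomainEvent = { id: string; payload: unknown }; // id is a lex-sortable KSUID

// Phase 1 (backfill): replay historical events in KSUID order, remembering
// the last ID applied.
function backfill(history: DomainEvent[], apply: (e: DomainEvent) => void): string {
  let lastApplied = "";
  for (const e of [...history].sort((a, b) => a.id.localeCompare(b.id))) {
    apply(e);
    lastApplied = e.id;
  }
  return lastApplied; // high-water mark handed to the live phase
}

// Phase 2 (live): consume the FIFO/Kinesis feed, skipping anything at or
// below the high-water mark so backfilled events are not applied twice.
function makeLiveHandler(highWaterMark: string, apply: (e: DomainEvent) => void) {
  return (e: DomainEvent) => {
    if (e.id <= highWaterMark) return; // already covered by the backfill
    apply(e);
  };
}
```

The high-water mark returned by the backfill is what lets the live phase discard events it has already seen, which is the crux of keeping the two phases from interleaving.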
Does this require two CloudFormation deployments?
If all events have a KSUID, why does it matter if we interleave live with historical events? The events in the live queue won't be ordered by time, only ordered as they arrived in the FIFO queue or Kinesis stream. The difference would be the amount of time separating the events. Does that matter? I hope not.
It depends on the aggregation strategy of the read model. If it's commutative and associative, then we can just do it all in one go and the model will eventually be consistent.
This strategy would work with any monoid data structure since a monoid (by definition) obeys all of these laws. We can provide a bunch of primitives (like algebird) or depend on some library (like automerge). Developers can then compose these primitives to build up a complex data structure specific to their business.
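As a hedged sketch of what that composition could look like (the combinator below is invented for illustration, in the spirit of algebird, not taken from any real library):

```typescript
// Illustrative only: composing small commutative monoids into a larger one.

interface Monoid<T> {
  zero(): T;
  combine(a: T, b: T): T;
}

// Primitive: integer addition is a commutative monoid.
const Sum: Monoid<number> = {
  zero: () => 0,
  combine: (a, b) => a + b,
};

// Combinator: a map whose values are combined point-wise is itself a
// monoid whenever its value type is.
function mapMonoid<V>(value: Monoid<V>): Monoid<Map<string, V>> {
  return {
    zero: () => new Map(),
    combine: (a, b) => {
      const out = new Map(a);
      for (const [k, v] of b) {
        out.set(k, out.has(k) ? value.combine(out.get(k)!, v) : v);
      }
      return out;
    },
  };
}

// A per-user event counter, built by composition rather than by hand.
const countsByUser = mapMonoid(Sum);
```

Because `Sum` is commutative, `countsByUser` is too, so partial aggregates can be merged in any order and still converge.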
I think this is the simplest and most scalable strategy, and we should pursue it until it fails. It may fall down in some cases, but perhaps those are rare and can be avoided in clever ways. This friction may end up forcing developers to build more scalable and reliable architectures instead of relying on slow linear processes that require absolute time order.
We may see an ecosystem and community develop around this problem, which is why I'm particularly interested in a composable approach using monoids.
Correction: a monoid doesn't require commutativity. We would want commutative monoids.
http://www.cs.umd.edu/class/spring2019/cmsc388F/lectures/monoids.html
It's just a fancy word for a data structure that has an initial value and a combine method.
```typescript
interface Monoid<T> {
  zero(): T;
  combine(a: T, b: T): T;
}
```
If we require developers to implement combine in a commutative way, we can get a very long way.
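To illustrate the distinction with two invented examples: set union is commutative, so a read model built on it tolerates reordered events, while string concatenation is associative but not commutative:

```typescript
// Two combine implementations: one commutative, one not. Illustrative only.

// Set union: combine(a, b) equals combine(b, a), so event order can't
// change the final read model.
const union = (a: Set<string>, b: Set<string>) => new Set([...a, ...b]);

// String concatenation is associative but not commutative: applying
// events out of order yields a different result.
const concat = (a: string, b: string) => a + b;
```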
That said, if it isn't commutative, then I think your strategy would work well. Although I wonder if we can achieve it in a single deployment?
> skip events you've already seen
This could be quite complex. It would require you have a record of every ID you've already seen.
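A naive sketch of what that record amounts to (in-memory only, purely illustrative; a real implementation would need durable, somehow-bounded storage, which is exactly the complexity in question):

```typescript
// Naive "skip events you've already seen" dedup. Illustrative only.

const seenIds = new Set<string>();

function handleOnce(eventId: string, apply: () => void): boolean {
  if (seenIds.has(eventId)) return false; // duplicate: drop it
  seenIds.add(eventId); // grows without bound -- the cost of this approach
  apply();
  return true;
}
```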
Sorry, stream of consciousness. Two advantages of the two-phase deployment are that it:
There may be a third phase where the developer enables the read model for use in a Policy or Command.
> Does this require two CloudFormation deployments?
Yes, so long as we're committed to infra not changing outside the scope of CloudFormation.
> If all events have a ksuid, why does it matter if we interleaved live with historical events? The events in the live queue won't be ordered by time, only ordered as they arrived in the fifo queue or kinesis stream.
It's all about DX. If SQS is feeding out-of-order events to a projection function, the developer will be required to use only commutative operations in/on the read model's data store. Not all read models necessitate the complexity of "commutative/associative only". Alternatively, if stochastic can guarantee a total order of events, and I, the developer, know my domain well, I can write a very simple projection.
Zooming out, different domains will have different concurrency needs.
The simplest way for the developer to write a projection is for stochastic to guarantee a totally ordered set of events and no partial replays. If I understand the concept properly, such a read model would be a left fold over the events. There's no requirement for commutative operations. An example might be a read model for user profiles: each user can only edit their own profile; it's non-collaborative. There are no race conditions, so it's OK for the last write to win. This is a very simple read model projection to write.
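A sketch of such a left-fold projection, with event shapes invented for illustration:

```typescript
// A read model as a left fold over a totally ordered event stream.
// The event types here are made up for the user-profile example.

type ProfileEvent =
  | { kind: "NameChanged"; name: string }
  | { kind: "EmailChanged"; email: string };

interface Profile {
  name: string;
  email: string;
}

// Last-write-wins is safe here precisely because events arrive in total
// order and each user only edits their own profile.
function project(events: ProfileEvent[]): Profile {
  return events.reduce<Profile>(
    (profile, e) =>
      e.kind === "NameChanged"
        ? { ...profile, name: e.name }
        : { ...profile, email: e.email },
    { name: "", email: "" }
  );
}
```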
If total ordering cannot be achieved without unacceptable performance impacts, because the domain is collaborative (multiple editors) or there is competition for physical resources (e.g. seat reservation), then the simple left fold won't work and the developer must use a more creative strategy.
> The difference would be the amount of time separating the events. Does that matter? I hope not.
The duration of time between two events should not matter.
> It depends on the aggregation strategy of the read model. If it's commutative and associative, then we can just do it all in one go and the model will eventually be consistent.
If we can make the DX really straightforward then yes, that would be ideal and would work for the simple cases too. Keep in mind with read models the developer will be dealing with a variety of data stores.
> This strategy would work with any monoid data structure since a monoid (by definition) obeys all of these laws. We can provide a bunch of primitives (like algebird) or depend on some library (like automerge). Developers can then compose these primitives to build up a complex data structure specific to their business.
Let's prototype this.
> I think this is the most scalable and simplistic strategy and we should pursue it until it fails. It may fall down in some cases, but perhaps those are rare and maybe they can be avoided in clever ways?
Let's find out.
One of our design principles is to make simple things easy and hard things (distributed systems) possible.
If we can ensure absolute time order (by relying on vendor guarantees) then it's a nice crutch to have and it keeps the DX simple. However if the developer misjudges their problem space choosing the simple option will create problems for them. Perhaps starting from the premise that "events will never be totally ordered" and "partial replays are possible" could guide the design of stochastic in a better direction over the long term.
> We may see an ecosystem and community develop around this problem, which is why I'm particularly interested in a composable approach using monoids.
That would be ideal. It's my understanding that it took some time for the Rails community to settle on ActiveRecord (persistence framework). If the developer has the competency to translate their event processing and read model persistence strategy into code, then it should fit into the framework in a predictable manner. We should provide at least one default.
Should this be done by cdk-triggers during deploy, or should we allow this to be initiated ad hoc via some sort of API?