ProcessManager/Saga docs and samples

vladimirnani commented 6 years ago

Would be great to extend example module with sample ProcessManager code. Maybe showcase several aggregates/entities and process manager in a real world scenario.

johnbywater commented 6 years ago

Ok :) what do you mean by process manager?

vladimirnani commented 6 years ago

Something that will listen for events, fire commands and hold state cross aggregates. (Correct me if i am wrong)

https://msdn.microsoft.com/en-us/library/jj591569.aspx

http://blog.jonathanoliver.com/cqrs-sagas-with-event-sourcing-part-i-of-ii/ http://blog.jonathanoliver.com/cqrs-sagas-with-event-sourcing-part-ii-of-ii/

johnbywater commented 6 years ago

Sounds good. I looked at this pattern a few times, but I never used it.

There's some terminology in the microsoft doc which I don't especially like, but it's not a big issue. For example I don't think events are notifications. An event simply happens, but you can be notified about it. And I'm not sure it's good for aggregates to send commands to each other, aggregates just publish events: policies can respond to events by issuing commands. I supposed I sometimes thought of a process manager as centralised policies, so you can see in one place how all the events cause further commands.

Trying to be as naive as possible, I can say I don't really understand which state process managers hold - isn't the state of an application distributed across the set of aggregates? and how do they hold that state? are they event sourced, do we need another database schema? how are they resumed when they crash? how are they scaled?

Also, the "compensating transactions" thing always seemed to be a little but less than perfectly satisfactory: if they are supposed to correctly undo the situation when things go wrong, can a compensating transaction go wrong? Hopefully there's a good probability it won't go wrong, but if an intended transaction can go wrong, not because it was intended but because it was a transaction (and transactions sometimes go wrong), then I observe that a compensating transaction is also a transaction that could go wrong, which seems to leave something to be desired (reliability).

Anyway, let's start a feature branch and have a go? It seems like a popular pattern, and although I'm not entirely convinced by the compensations transaction thing, I haven't got a better way of doing it. I have been trying to work out how to make things ultra reliable, and I think it would be possible by having perfectly reliable repeaters, splitters and combiners. I've been writing a little bit about it in this doc, but it's far from finished and it's hardly a pattern, but the projection examples are perhaps "ultra reliable" by design, and I think the approach can be generalised, but I didn't do that work yet: http://eventsourcing.readthedocs.io/en/latest/topics/projections.html

The order-reservation-payments example seems interesting and familiar enough that we would need to discuss what the domain is all about? Perhaps you wanted to to write some example code? Maybe there's already something in Python we could look at? We can think about whether this is really in the scope of "eventsourcing" later :)

vladimirnani commented 6 years ago

Yes, so a process manager in our case can be a policy that is specifically written for business logic, i mean that it is not at infrastructure level.

Compensation transactions could go wrong right, but i think it is gives business value to have them declared as a transparent 'disaster recovery' actions.

Indeed, lets create a branch and ill try to create a order-reservation-payments scenario. Then we could reason about some code.

johnbywater commented 6 years ago

@vladimirnani I pushed the Process class, the test case is here: https://github.com/johnbywater/eventsourcing/blob/develop/eventsourcing/tests/test_process.py

Can you see how it could be used to implement an orders-reservations-payments system?

johnbywater commented 6 years ago

@vladimirnani I wrote some documentation for the "application process" thing, with a skeleton orders-reservations-payments system as an example: https://eventsourcing.readthedocs.io/en/latest/topics/process.html

I know it's not really an orders-reservations-payments system, and the prompts aren't event driven in that example, but how do you think this approach compares with the message-pushing process manager pattern?

vladimirnani commented 6 years ago

Very good read. Thanks!

Going to play around with it in bit.

johnbywater commented 6 years ago

Sorry for the delay. I've been changing the Process doc quite a lot. I tried to make it as good as I can.

Would be great to get some feedback about it. Maybe it's more readable now?http://eventsourcing.readthedocs.io/en/latest/topics/process.html

Since you were asking about actors, I wanted to say: There's an incomplete section at the bottom of that doc, about using an actor model framework to orchestrate processing. If you are still interested, perhaps we could work on that together?

I keep returning to the Thespian library, which claims to run actors across a cluster.

It would be great to have a Docker image that can be used in an auto-scaling cluster, something that automatically starts an actor system that runs an event sourced processing system. The actor system could then run the process applications.

Not sure if the Actor system would need to be supervised by e.g. Kubernetes, or if there should be a "leader election" amongst nodes to decide which node runs the Actor system and which nodes follow. If there needs to be a leader election, then perhaps Raft would be sufficient for that?

Basically, it would be great to have something into which people can drop in their aggregates and policies and system definition, and then deploy and have a running, auto-scalable, event-sourced event processing service. It doesn't have to be the best actor implementation and Docker file ever. But if we could get something working, then we would finally have a spike solution that stretches all the way from event storming, through domain modelling, to a distributed, auto-scaling event processing service.

Hope you're well.

vladimirnani commented 6 years ago

Very nice doc. Need more time to read it again thoroughly.

So you are thinking about creating a docker compose with several docker images each one containing a process part like payment or reservation. And each Actor will be a process manager.

Something like this:

Ordering System
Payment
Reservations

I have to read Thespian to check their communication protocol

johnbywater commented 6 years ago

Thanks a lot for the compliments. Just one image, that runs an actor system. But many machines, across which the actor system distributes process application-partitions.

vladimirnani commented 6 years ago

Ok got it. It will be a cluster of applications. How will we route to specific nodes? I mean if there is a command that needs to be processed, what is the mapping between aggregate_id and node. Or it doesnt even matter because db is shared

johnbywater commented 6 years ago

Good question. With actors, you just get the ID of an actor and use it to send a message. Within the actor system, there isn't a concept of "node", there are just actors. So we could start with a "system actor" (our system) and have it start one actor for each process application-partition, just like the Multiprocess class starts one OperatingSystemProcess for each process application-partition.

We'd need to figure out how to send a prompt from one actor to another. The Multiprocess class uses Redis, and each process application-partition knows what it is following, and subscribes to its channel on Redis. But with actors, we don't need Redis. We just need to have downstream actors subscribe to upstream actors, so they get prompted directly. Then, to make the polling work, if waiting for actor messages doesn't timeout, then I guess something could send messages regularly so polling happens.

If a node in the cluster goes down suddenly, the processing can resume safely, because the actor system will just restart the actors (I think, need to check...). But if the node running the actor system goes down, then every should also go down, and all processing will stop until the actor system is restarted. That's when another node might usefully just start another actor system - but we wouldn't want to create two actor systems at the same time. Hence Raft might help to elect a leader that would start the actor system.

Given the hierarchical nature of the actor system, inevitably there is a root, which is essentially a single point of failure. By contrast, running a number of machines each with static config giving them a fixed set of partitions to process, would perhaps show more "graceful degradation". An actor system might be faster at restarting actors on different nodes, than eg. Kubernetes restarts a Pod. Kubernetes could supervise an actor system leader node. I'd like to try actors, I'm just worried about the single point of failure thing.

All I want is something that keeps going despite nodes being taken out. :-)

johnbywater commented 6 years ago

Regarding commands... in the process document, I'd like to add a Commands process, something that puts commands for the system in a notification log for the system to process. The Orders process could follow the Commands process, and create the orders asynchronously. The Commands process could follow the Orders process. Then all the commands would be separated. A new version of a system could be tested by giving it all the commands and seeing if it creates the same events. It just seems like a slightly different thing - we don't actually want a child operating system process to be started for the Command process. It's something that a client needs to use. Perhaps clients shouldn't know about the partitioning, and the Commands process could randomly distribute command across the partitions. But it would be nice if a client used an actor to send the command to the Commands process, because then the client could block on getting a result ("ask()"?) rather than polling for the result. I guess there are lots of options....

vladimirnani commented 6 years ago

I guess we need to use ActorSystem('multiprocTCPBase') to be able to communicate between docker containers.

So in the case of multiprocess we were starting OperatingSystemProcess per partition. And now with Actor we are going to spinup a docker conainer with PartitionActor

johnbywater commented 6 years ago

Yes, basically instead of Multiprocess and OperatingSystemProcess, there might be SystemActor and ProcessActor (or perhaps "Processor"?). Maybe SystemActor could inherit from ActorSystem?

vladimirnani commented 6 years ago

Related to #136

pyeventsourcing / eventsourcing

ProcessManager/Saga docs and samples #137