quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0

[Extension Proposal] Auto-compensating saga extension #40588

Open xian75 opened 6 months ago

xian75 commented 6 months ago

Description

An extension for developing SAGA/CQRS orchestrators and participants easily. It lets you focus on your business logic instead of thinking about how to implement the SAGA pattern. It spares you the effort of writing compensation operations, because the extension performs them automatically. All data is stored in SQL databases, taking advantage of the validation provided by database constraints. The framework depends on the following Quarkus extensions:

Repository name

quarkus-sagacqrs

Short description

An extension for SAGA/CQRS microservices development with SQL databases and automatic compensation operations.

Repository Homepage URL

https://quarkiverse.github.io/quarkiverse-docs/quarkus-sagacqrs/dev/

Repository Topics

Team Members

Additional context

No response

xian75 commented 6 months ago

I'm interested in leading the development of this extension. A beta version of this extension is already done.

gastaldi commented 2 months ago

@xian75 hey! That looks interesting. Do you have the beta version available somewhere so we can look?

xian75 commented 2 months ago

Hey @gastaldi, I added you as a collaborator to my private projects. sagacqrs is the library, while the orchestrator and the three participants are samples that show how to use it. Let me know what you think about it.

gastaldi commented 2 months ago

Adding the links here, so I don't miss them:

Any chance to make them public so others can chime in?

xian75 commented 2 months ago

All repos are public now. If you have any questions or want an overview of the lib, don't hesitate to email me. I'll try to respond ASAP.

zhfeng commented 1 month ago

I think there is already the quarkus-narayana-lra extension, and I just wonder whether this proposed extension differs from it, since they both implement the SAGA pattern. Thanks!

xian75 commented 1 month ago

I didn't know the Narayana extension, but after reading its overview I can say it's very different from my proposed one. The main difference is that my proposal doesn't require any compensation to be implemented, because the extension does it automatically on your behalf. Unfortunately I'm very busy these days, but in the next few days I'll try to find a chance to write a deeper overview of my proposal and how it works.

xian75 commented 1 month ago

@zhfeng finally I found a bit of time to type something...

Overview

The idea behind this library is a two-step transaction concept (not 2PC). During the first step of a saga workflow, the orchestrator "opens" a new distributed saga transaction in pending status and asks the participants to perform database writes (and possibly reads). In line with the saga pattern, each participant executes its duties in an ACID local database transaction. However, these create, update and delete operations do not change the table rows definitively. They only lock the rows and change their "status", preventing other saga requests from changing them. Once all participants have finished their operations, the orchestrator collects their outcomes and changes the status of the distributed saga transaction from pending to commit or rollback: commit if all participants' operations succeeded, rollback otherwise. A transaction is also rolled back when one or more participants time out. This is the end of the first step.

The orchestrator comes with a couple of scheduled jobs. One of them collects all committed and rolled-back distributed saga transactions and asks the participants to "finalize" them: this is the second step. Each participant finalizes its database by applying the changes to the table rows and unlocking them. Again, the finalization runs in an ACID local database transaction, and all operations are idempotent. Once all participants have finished their operations, the orchestrator collects their outcomes and deletes the distributed saga transaction if every participant finished successfully. Otherwise, the orchestrator leaves the transaction as it is, waiting for the next scheduled job execution.

Because distributed saga transactions are independent of one another, each scheduled job run typically deletes some of them and leaves others. In other words, not every distributed saga transaction handled in a single job run has to succeed for the others to be deleted.
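To make the two steps above more concrete, here is a minimal in-memory sketch in plain Java. It is not the extension's actual API: the class and method names (`TwoStepSagaSketch`, `Participant`, `Orchestrator`, `stage`, `finalizeSaga`, `runFinalizeJob`) are hypothetical, a `HashMap` stands in for each SQL table, and staged changes are kept in a side map rather than in a row "status" column. It also assumes that finalizing a rolled-back saga simply discards the staged changes, and it leaves out timeouts and real ACID transactions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoStepSagaSketch {

    enum SagaStatus { PENDING, COMMIT, ROLLBACK }

    /** Hypothetical participant: one in-memory "table" plus staged (locked) changes per saga. */
    static class Participant {
        final Map<String, Integer> rows = new HashMap<>();
        // sagaId -> (row key -> staged value); a row with a staged value counts as locked
        final Map<String, Map<String, Integer>> staged = new HashMap<>();

        /** Step 1: stage a change and lock the row. Returns false if another saga holds the lock. */
        boolean stage(String sagaId, String key, int newValue) {
            for (Map.Entry<String, Map<String, Integer>> e : staged.entrySet()) {
                if (!e.getKey().equals(sagaId) && e.getValue().containsKey(key)) {
                    return false; // row is locked by a different saga
                }
            }
            staged.computeIfAbsent(sagaId, id -> new HashMap<>()).put(key, newValue);
            return true;
        }

        /** Step 2: apply (commit) or discard (rollback) staged changes and unlock the rows.
         *  Idempotent: a repeated call finds nothing staged and changes nothing. */
        boolean finalizeSaga(String sagaId, SagaStatus outcome) {
            Map<String, Integer> changes = staged.remove(sagaId);
            if (changes != null && outcome == SagaStatus.COMMIT) {
                rows.putAll(changes);
            }
            return true;
        }
    }

    /** Hypothetical orchestrator keeping its saga transactions in memory. */
    static class Orchestrator {
        final Map<String, SagaStatus> transactions = new HashMap<>();

        void open(String sagaId) {
            transactions.put(sagaId, SagaStatus.PENDING);
        }

        /** End of step 1: commit only if every participant reported success. */
        SagaStatus close(String sagaId, List<Boolean> outcomes) {
            boolean allOk = outcomes.stream().allMatch(Boolean::booleanValue);
            SagaStatus status = allOk ? SagaStatus.COMMIT : SagaStatus.ROLLBACK;
            transactions.put(sagaId, status);
            return status;
        }

        /** Step 2 job: finalize each non-pending saga; delete it only if all participants succeed. */
        void runFinalizeJob(List<Participant> participants) {
            for (String sagaId : new ArrayList<>(transactions.keySet())) {
                SagaStatus status = transactions.get(sagaId);
                if (status == SagaStatus.PENDING) {
                    continue; // step 1 is still running for this saga
                }
                boolean allDone = true;
                for (Participant p : participants) {
                    allDone &= p.finalizeSaga(sagaId, status);
                }
                if (allDone) {
                    transactions.remove(sagaId); // otherwise retried on the next job run
                }
            }
        }
    }

    public static void main(String[] args) {
        Participant stock = new Participant();
        stock.rows.put("item-1", 10);
        Orchestrator orchestrator = new Orchestrator();

        orchestrator.open("s1");
        boolean ok = stock.stage("s1", "item-1", 9);
        orchestrator.close("s1", List.of(ok));
        System.out.println(stock.rows.get("item-1")); // prints 10: staged, not yet applied

        orchestrator.runFinalizeJob(List.of(stock));
        System.out.println(stock.rows.get("item-1")); // prints 9: applied during finalization
        System.out.println(orchestrator.transactions.size()); // prints 0: saga deleted
    }
}
```

In this sketch no compensation code is needed because nothing becomes visible in `rows` until finalization, and a failed or conflicting saga is finalized by dropping its staged changes. Because `finalizeSaga` is idempotent, the scheduled job can safely retry a saga it could not delete on an earlier run.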

More details The orchestrator creates, updates and deletes distributed saga transactions in its own database, consisting of just one table: "transactions". This table has the following columns:

melloware commented 1 month ago

We have already implemented the Quarkus Temporal extension which is a full featured Saga implementation. We are doing a Quarkus Insights video on Monday about it. Cc @rmanibus @tmulle

melloware commented 1 month ago

There are actually two extensions that do Saga already. Quarkus DAPR and Quarkus Temporal.

You can watch the insights to see how it works: https://www.youtube.com/watch?v=XICZxuaeYwI&list=PLsM3ZE5tGAVatO65JIxgskQh-OKoqM4F2

xian75 commented 3 weeks ago

I watched your video, and let me say it was great! I didn't know about Temporal, and it's awesome. As you explained in your demo, developing microservices this way looks very simple. You just need to define the workflow and the actions to perform step by step. In the same way, you define the compensations for when a request cannot be completed successfully and must be rolled back. I really enjoyed the way everything is defined: just code.

Moreover, I was happy to find out that the Quarkus Temporal extension and the one I'm proposing can work together. In fact, my extension focuses on distributed database transactions, trying to solve one of the hardest challenges of microservices: data inconsistency. When I read "Microservices Patterns" by Chris Richardson, I asked myself: what happens if a compensation fails? Do I need to compensate the compensation? And what if that fails as well? Especially when you have unexpected system errors (bugs). The same questions are discussed in https://www.ufried.com/blog/limits_of_saga_pattern/

So I thought I had to find a way to solve this issue, and eventually I developed my extension. Basically, it manages compensations automatically, leaving data consistent among saga participants. This way you can focus on feature development with no effort spent on compensations. Moreover, you can split your monolithic database and application into a set of micro databases and micro applications, which makes them easier to scale.

Finally, the Temporal extension doesn't exclude my extension, and vice versa. On the contrary, they can be combined. For instance, each service of your demo (order, payment, warehouse, etc.) can be implemented with my extension in order to scale it easily. I hope some of you will have time to try my extension, also because I have already shared a sample that you can test just by running it. I'll keep my fingers crossed.