uber / cadence

Cadence is a distributed, scalable, durable, and highly available orchestration engine to execute asynchronous long-running business logic in a scalable and resilient way.
https://cadenceworkflow.io
MIT License

Support Azure Cosmos DB as a Data Store #4779

Open WToma opened 2 years ago

WToma commented 2 years ago

Feature Request

Azure's hosted Cosmos DB offering has a Cassandra-like interface. It would be convenient to use that, so that one doesn't have to maintain their own Cassandra instance as a data store.

After some discussion on Slack, it looks like the last missing piece was support for LWTs (lightweight transactions). Cosmos DB has since added support for unlogged batches, but Cadence uses logged batches. In Cassandra, logged batches provide atomic writes across all writes in the batch, while unlogged batches provide atomic writes only within a single partition. Per a personal note from an Azure team member, the unlogged batch support in the Cosmos DB Cassandra interface follows the same semantics.

Proposed Solution

The proposal is to remove the use of logged batches from Cadence's Cassandra data store implementation. After reviewing the existing uses of logged batches, it seems that most of them already operate on a single partition, so changing the batch type to unlogged will still give the same guarantee w.r.t. atomic writes.

In the review I found one use case where the logged batch doesn't operate on a single partition: DeleteFromHistoryTreeAndNode. After discussing with @longquanzheng, we think that this doesn't need to be transactional, so it's also safe to change to an unlogged batch.
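To make the single-partition argument concrete, here is a minimal sketch of the check the review amounts to. The `write` type, the `singlePartition` helper, and the table and key names are all hypothetical and for illustration only; none of them come from the Cadence code base. The check is deliberately conservative: it treats a batch as single-partition only if every statement targets the same table and the same partition key.

```go
package main

import "fmt"

// write is a hypothetical stand-in for one statement in a batch.
// partitionKey represents the full partition key of the row being written.
type write struct {
	table        string
	partitionKey string
}

// singlePartition reports whether every write in the batch targets the same
// table and partition. Only in that case would an unlogged batch be expected
// to keep the atomicity that a logged batch provides.
// An empty batch is trivially single-partition.
func singlePartition(batch []write) bool {
	for i := 1; i < len(batch); i++ {
		if batch[i] != batch[0] {
			return false
		}
	}
	return true
}

func main() {
	// Illustrative names only; not checked against Cadence's actual schema.
	sameRow := []write{
		{table: "executions", partitionKey: "shard-1"},
		{table: "executions", partitionKey: "shard-1"},
	}
	treeAndNode := []write{
		{table: "history_tree", partitionKey: "tree-1"},
		{table: "history_node", partitionKey: "tree-1"},
	}
	fmt.Println(singlePartition(sameRow))     // true
	fmt.Println(singlePartition(treeAndNode)) // false
}
```

A batch that fails this check, like the one in DeleteFromHistoryTreeAndNode, is only safe to switch if the operation does not need to be transactional in the first place.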

The concrete plan is:

  1. Prepare a pull request that changes the logged batches to unlogged batches.
  2. Test that version using https://github.com/uber/cadence/tree/master/bench#basic first with a Cassandra backend...
  3. ... then set up a Cadence instance pointing to a CosmosDB data store, and repeat the test.

longquanzheng commented 2 years ago

Instead of replacing logged with unlogged, we can add a config to allow switching to unlogged.

Changing it directly is dangerous because many services are running in production.

WToma commented 2 years ago

> Instead of replacing logged with unlogged, we can add a config to allow switching to unlogged.

Good point. I might do a proof-of-concept without the configuration change, but for the final PR that is meant to be merged I'll add the configuration option.
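A rough sketch of what such a switch could look like, assuming a boolean option that defaults to the current behavior. The `cassandraConfig` struct, the `UseUnloggedBatches` option name, and the local `batchType` enum are all hypothetical; in real code the returned value would presumably be one of the driver's batch types (for example `gocql.LoggedBatch` or `gocql.UnloggedBatch`) passed to the call that creates the batch.

```go
package main

import "fmt"

// batchType is a local stand-in for the driver's batch type so that this
// sketch has no external dependencies.
type batchType int

const (
	loggedBatch batchType = iota
	unloggedBatch
)

func (b batchType) String() string {
	if b == unloggedBatch {
		return "unlogged"
	}
	return "logged"
}

// cassandraConfig is a hypothetical fragment of the persistence config.
// UseUnloggedBatches is an invented option name; its zero value (false)
// keeps logged batches, so existing deployments see no behavior change.
type cassandraConfig struct {
	UseUnloggedBatches bool
}

// batchTypeForWrites picks the batch type for write batches based on config.
func (c cassandraConfig) batchTypeForWrites() batchType {
	if c.UseUnloggedBatches {
		return unloggedBatch
	}
	return loggedBatch
}

func main() {
	existing := cassandraConfig{} // option not set
	cosmos := cassandraConfig{UseUnloggedBatches: true}
	fmt.Println(existing.batchTypeForWrites()) // logged
	fmt.Println(cosmos.batchTypeForWrites())   // unlogged
}
```

Making the option opt-in addresses the concern about production deployments: only operators who explicitly enable it, such as those targeting Cosmos DB, would get unlogged batches.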

elkh510 commented 1 year ago

Hi @WToma, we have the same issue. Is there an estimate of when support will be added? Thank you!

WToma commented 1 year ago

Hi @elkh510, unfortunately I stopped working on this when I switched companies a few months ago. That said, the last time I looked at this, the plan laid out in the issue description was still valid, and the required code change is pretty small and straightforward; the hardest part of solving this issue is probably setting up the test infrastructure and running the test suite.

Btw Azure also offers hosted Cassandra, so maybe that could be another option for you?