microsoft / durabletask-mssql

Microsoft SQL storage provider for Durable Functions and the Durable Task Framework
MIT License

Azure Cosmos DB implementation #280

Closed AdmiralBond closed 1 day ago

AdmiralBond commented 2 weeks ago

My organization is interested in using Durable Functions, but none of the existing provider implementations fit our needs. We'd like a Cosmos implementation, but there doesn't seem to be one. The MSSQL provider seems like the closest, implementation-wise, to what a Cosmos one would be. Would any of the contributors here be interested in working on a Cosmos implementation? I apologize if this is a bit off topic, but it's actually surprising that there isn't already a Cosmos implementation. Thanks.

cgillum commented 2 weeks ago

Internally, we explored a Cosmos implementation several years ago. My memory is a bit fuzzy, but I think a couple of problems we ran into were 1) some features, like durable timers, were hard to implement, and 2) the costs of running orchestrations and persisting data to Cosmos were quite high. I'm not sure whether these issues are still relevant today.

Can you tell us a bit more about why the current provider implementations don't fit the needs of your organization? We're actually working on developing a fully managed backend implementation (where you don't have to bring your own backend storage), and I'm wondering if there are any requirements we should be considering.

AdmiralBond commented 2 weeks ago

We need something fast, scalable, and disaster recovery friendly. While looking over the docs for the existing providers, each of them seemed to have a limitation that was incompatible with those goals. Storage and SQL have limited throughput, and Netherite's reliance on Event Hubs was a deal breaker for our ops guys. We use Cosmos for many reasons, but ultimately it appears to be Azure's most rock-solid service in terms of scalability, fast failover, performance, and disaster recovery. Would it be possible to re-evaluate your assessment of Cosmos? The reality is that, if it's technically feasible, cost is not a concern: we already spend a lot of money on scaled Cosmos, and scaling it quite a bit more to handle a Functions provider is acceptable to us. Thanks so much for your quick response!

cgillum commented 2 weeks ago

Thanks for the feedback. Later this month we plan to start advertising the managed backend that I mentioned more broadly. It should have the performance, scale, and disaster recovery features you need. I hope you'll consider it, and perhaps work with us to ensure that it meets or exceeds your requirements.

As for introducing a Cosmos DB state provider, I can say pretty confidently that this won't happen unless the broader community builds it. The current set of DTFx maintainers just doesn't have the bandwidth for this, as each new backend we introduce incurs significant ongoing maintenance and support costs. Rather, we're hoping to eventually consolidate down to just one or two backends in the long term.

microsoft-github-policy-service[bot] commented 5 days ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.