hydro-project / fluent

A data-driven compute platform
Apache License 2.0

Support for distributed transactions #129

dumblob closed 5 years ago

dumblob commented 5 years ago

I couldn't find any information about distributed transactions. Are they supported? If not, are they among Fluent's goals, and what is the current progress?

By the way, nice properties of the key-value store. That's something mankind has been waiting on for almost 30 years :wink:.

vsreekanti commented 5 years ago

Hi @dumblob -- we don't currently support distributed transactions. It is a research area that we have discussed in the past but are not actively working on. All the consistency mechanisms currently supported in the system are coordination-free. Happy to discuss these in more detail if you are interested.
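For readers unfamiliar with the term: this thread does not say which coordination-free mechanisms are meant, so the following is only a generic illustration, not Fluent's code. One common coordination-free approach is to resolve concurrent writes with a merge function that is commutative, associative, and idempotent, so replicas converge regardless of the order in which they exchange updates. The `LWWRegister` class and its fields are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """Hypothetical last-writer-wins register (illustration only)."""

    timestamp: int   # logical or wall-clock time of the write
    replica_id: str  # tie-breaker when timestamps are equal
    value: str

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Keep the write with the larger (timestamp, replica_id) pair.
        # Taking a maximum is commutative, associative, and idempotent,
        # so no coordination between replicas is needed to converge.
        return max(self, other, key=lambda r: (r.timestamp, r.replica_id))


if __name__ == "__main__":
    a = LWWRegister(1, "r1", "x")
    b = LWWRegister(2, "r2", "y")
    # Both merge orders pick the same winner.
    print(a.merge(b).value, b.merge(a).value)
```

Note that a merge like this gives convergence, not serializability: a write can be silently superseded by a later one.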

dumblob commented 5 years ago

If there are no plans for distributed transactions, how do you do "horizontal scaling" ("sharding", "partitioning", you name it)?

cw75 commented 5 years ago

Hello @dumblob -- first of all, when you say "distributed transactions", do you specifically mean serializable transactions?

Also, horizontal scaling seems to be orthogonal to whether or not the system supports transactions. Could you elaborate more on their connections in your mind?

Thanks a lot!

dumblob commented 5 years ago

> Hello @dumblob -- first of all, when you say "distributed transactions", do you specifically mean serializable transactions?

Yes, actually I'm aiming at strictly serializable distributed transactions, but even some other level of consistency would be interesting to me.

> Also, horizontal scaling seems to be orthogonal to whether or not the system supports transactions. Could you elaborate more on their connections in your mind?

It is orthogonal only up to a point. First of all, there is a significant difference between supporting transactions and supporting distributed transactions in systems that span more than one physical machine. In such clusters, coordination avoidance can solve some (or many, or maybe even all; that's a research question...) of the issues around features such as "distribution of key-value pairs among cluster nodes when reading and writing/updating", "dynamic addition and removal of nodes in the cluster", and "automated redistribution of data based on demand in a geographical location".

My first question was about distributed transactions, and it has been answered. My second question was about scalability (i.e. the underlying technology for distributing data), which I expect will give me some limited insight into the Fluent k-v store's near-term potential for strictly serializable distributed transactions.

vsreekanti commented 5 years ago

We don't have any plans to support strictly serializable distributed transactions.

The system is both sharded and replicated using a traditional distributed hash table. It comes with a built-in autoscaling system that monitors system load and adds and removes replicas as necessary. When a node joins/leaves the cluster, we redistribute the data across the hash ring. Hope this answers your questions. Feel free to reopen the issue if you have other questions!
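As a rough illustration of the hash-ring behaviour described above, here is a minimal consistent-hashing sketch. It is not Fluent's implementation: the `HashRing` class, the use of virtual nodes, the SHA-256 hash, and the `replicas` parameter are all assumptions made for the example. It only shows why adding or removing a node moves a fraction of the keys rather than all of them.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit position on the ring (Python's built-in hash() is salted).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, vnodes: int = 64):
        self.vnodes = vnodes
        self._points = []  # sorted list of (ring position, node name)

    def add_node(self, node: str) -> None:
        # Each node owns several points so load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if p[1] != node]

    def owners(self, key: str, replicas: int = 1) -> list:
        """Return up to `replicas` distinct nodes, walking clockwise from the key."""
        if not self._points:
            return []
        start = bisect.bisect_right(self._points, (_hash(key), ""))
        found = []
        for offset in range(len(self._points)):
            node = self._points[(start + offset) % len(self._points)][1]
            if node not in found:
                found.append(node)
                if len(found) == replicas:
                    break
        return found


if __name__ == "__main__":
    ring = HashRing()
    for name in ("node-a", "node-b", "node-c"):
        ring.add_node(name)
    keys = [f"key-{i}" for i in range(1000)]
    before = {k: ring.owners(k)[0] for k in keys}
    ring.add_node("node-d")  # a node joins the cluster
    moved = [k for k in keys if ring.owners(k)[0] != before[k]]
    print(f"{len(moved)} of {len(keys)} keys changed primary owner")
```

In this sketch, every key whose primary owner changes after a join moves to the new node; keys owned by the other nodes stay where they are, which is what keeps redistribution cheap.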