juji-io / datalevin

A simple, fast and versatile Datalog database
https://github.com/juji-io/datalevin
Eclipse Public License 1.0

Distributed mode #9

Open huahaiy opened 4 years ago

huahaiy commented 4 years ago

The current implementation is standalone only. We will add a distributed mode that replicates data across multiple nodes.

Design criteria:

Strong consistency, i.e. CP in terms of the CAP theorem: transactions have a consistent total order; reads can be linearizable; cluster membership can change dynamically. The implementation should pass Jepsen tests.

We choose CP because a fast-failing (unavailable) system is simpler to program around than a system that sometimes produces wrong results. Simplicity for users is the main design objective for Datalevin. All our design choices (Datalog, mutable DB, and CP) are consistent with this goal.

The implementation will use the Raft consensus algorithm:

Raft is a much better solution than the designated transactor concept of Datomic. In Raft, the leader is elected, not fixed. With Raft, the same transaction total order is achieved without the cost and complexity of operating designated transactors. Even with a standby, transactors are still a single point of failure.
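To make "the leader is elected, not fixed" concrete, here is a minimal sketch of Raft's vote-granting rule and majority check in Python. It is an illustration only, not Datalevin code: the `Node` class, `run_election`, and `quorum` are hypothetical names, and the sketch leaves out timeouts, networking, log replication, and candidates stepping down when they see a higher term.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a Raft node (illustration only)."""
    node_id: str
    current_term: int = 0
    voted_for: Optional[str] = None
    last_log_term: int = 0
    last_log_index: int = 0

    def handle_request_vote(self, term: int, candidate_id: str,
                            cand_last_log_term: int,
                            cand_last_log_index: int) -> bool:
        # Reject candidates from an older term.
        if term < self.current_term:
            return False
        # A newer term resets any vote cast in an earlier term.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
        # The candidate's log must be at least as up to date as ours:
        # compare the last log term first, then the last log index.
        log_ok = (cand_last_log_term, cand_last_log_index) >= (
            self.last_log_term, self.last_log_index)
        # At most one vote per term.
        if log_ok and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def run_election(candidate: Node, peers: List[Node]) -> bool:
    """Candidate bumps its term, votes for itself, and asks every peer."""
    candidate.current_term += 1
    candidate.voted_for = candidate.node_id
    votes = 1
    for peer in peers:
        if peer.handle_request_vote(candidate.current_term,
                                    candidate.node_id,
                                    candidate.last_log_term,
                                    candidate.last_log_index):
            votes += 1
    return votes >= quorum(len(peers) + 1)


a, b, c = Node("a"), Node("b"), Node("c")
a_is_leader = run_election(a, [b, c])
```

Because each node grants at most one vote per term and a winner needs a strict majority, two candidates cannot both win the same term. Any node can become leader, so no machine has to be designated in advance.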

Also, Datomic doesn't seem to have a mechanism to ensure linearizable reads. "Database as a value" doesn't say that the value is the latest version; it could well be an outdated one. The main supported storage backend of Datomic, DynamoDB, is AP only, so consistency is not guaranteed.

We will apply Raft globally in the cluster, which means that the cluster size is bounded. Sharding should be handled at the application level, because the application has more context, so the database should not automate it. If unbounded scaling is needed, instead of "just adding more nodes", just run more clusters.
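As a rough illustration of what application-level sharding across several independent clusters could look like, here is a small Python sketch. Everything in it is an assumption made for illustration: the cluster names, the `PINNED` table, and the `cluster_for` function are hypothetical and not part of Datalevin.

```python
import hashlib
from typing import Dict, Sequence

# Hypothetical cluster names; each one would be a separate, bounded Raft cluster.
CLUSTERS = ["cluster-a", "cluster-b", "cluster-c"]

# The application knows its own data, so it can pin specific shard keys
# (for example, a very large tenant) to a dedicated cluster.
PINNED: Dict[str, str] = {"big-tenant": "cluster-c"}


def cluster_for(shard_key: str,
                clusters: Sequence[str] = CLUSTERS,
                pinned: Dict[str, str] = PINNED) -> str:
    """Pick the cluster that owns a shard key."""
    if shard_key in pinned:
        return pinned[shard_key]
    # A stable hash (unlike Python's built-in hash(), which is salted per
    # process) keeps the routing identical across application restarts.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return clusters[int.from_bytes(digest[:8], "big") % len(clusters)]


target = cluster_for("tenant-42")
```

Plain modulo hashing moves most keys when the number of clusters changes, so an application that expects to add clusters later may prefer an explicit lookup table or consistent hashing. That trade-off is the kind of context the application has and the database does not.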

obeah commented 3 years ago

Hi, have you considered FoundationDB as a storage backend to achieve this?

huahaiy commented 3 years ago

We would like to retain the lightweight nature of this library even when we go distributed, so we will probably keep LMDB for local data storage. FoundationDB could be leveraged for cluster coordination. Combining the two has been done before, e.g. https://forums.foundationdb.org/t/success-story-foundationdb-at-skuvault/336/4

So we will definitely consider FoundationDB as a possible option for implementing distributed mode.

ieugen commented 3 years ago

FoundationDB seems like a big dependency to add.

How about using a Raft library implemented in Clojure or Java? https://raft.github.io/#implementations

huahaiy commented 3 years ago

The original plan is to use a Java Raft library, but I am open to other possibilities.

huahaiy commented 2 years ago

The existing Raft implementations all have some weaknesses, so I will probably end up implementing my own Raft in Clojure that is tailored to the needs of Datalevin. Another option is to borrow some ideas from FoundationDB.