yuer1727 opened 5 years ago
No, not planned.
Though if you're interested, I'm still open to contributions, provided you can work it out in a simple way.
Here's a simple idea for clustering; correct me if I am missing something:
Implement a "proxy" in front of 2+ Sonic nodes which would do two things: mirror every write command (PUSH, POP, FLUSH*) to all nodes, and load-balance read commands (QUERY, SUGGEST) across them (a rough sketch follows below).
Bonus badass points for every node being able to act as such a proxy, elected by a consensus algorithm, instead of a proxy appointed once (which would become a single point of failure).
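To make that concrete, here is a minimal sketch of such a proxy, assuming plain Sonic Channel TCP connections; the addresses, the round-robin read balancing and the omission of the START handshake are simplifications for illustration, not an existing implementation:

```rust
use std::io::Write;
use std::net::TcpStream;

// Illustrative only: mirror write commands to every node, send reads to one.
struct Proxy {
    nodes: Vec<String>, // e.g. "10.0.0.1:1491", "10.0.0.2:1491" (hypothetical)
    next_read: usize,
}

impl Proxy {
    // Write commands (PUSH, POP, FLUSH*) are fanned out to all nodes.
    fn forward_write(&self, command: &str) -> std::io::Result<()> {
        for node in &self.nodes {
            let mut stream = TcpStream::connect(node)?;
            stream.write_all(command.as_bytes())?;
            stream.write_all(b"\r\n")?;
        }

        Ok(())
    }

    // Read commands (QUERY, SUGGEST) go to a single node, round-robin.
    fn forward_read(&mut self, command: &str) -> std::io::Result<()> {
        let index = self.next_read % self.nodes.len();
        self.next_read += 1;

        let mut stream = TcpStream::connect(&self.nodes[index])?;
        stream.write_all(command.as_bytes())?;
        stream.write_all(b"\r\n")?;

        // Reading and relaying the node's response is omitted for brevity.
        Ok(())
    }
}
```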
That would be a pragmatic solution, but I'd be more open to a built-in replication system in Sonic, one that could use the Sonic Channel protocol to 'SYNC' between nodes.
With the proxy solution, how would you implement synchronization of 'lost' commands when a given node goes offline for a few seconds? Same as for new nodes, with a full replay from the point it lost synchronization? A replay method would also imply that the proxy stores all executed commands over time, and ideally periodically 'compacts' them, e.g. when a PUSH is followed by a FLUSHB or FLUSHO that cancels out the earlier PUSH.
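For illustration, here is a rough sketch (not actual Sonic code) of that compaction idea, simplified to bucket-level PUSH and FLUSHB commands:

```rust
// Hypothetical replay-log compaction: a FLUSHB makes any earlier command on
// the same bucket redundant, so the log can be shrunk before a replay.
#[derive(Clone, Debug, PartialEq)]
enum LoggedCommand {
    Push { bucket: String, object: String, text: String },
    FlushB { bucket: String },
}

fn compact(log: &[LoggedCommand]) -> Vec<LoggedCommand> {
    let mut compacted: Vec<LoggedCommand> = Vec::new();

    for command in log {
        if let LoggedCommand::FlushB { bucket } = command {
            // Drop every earlier command that targeted the flushed bucket.
            compacted.retain(|earlier| match earlier {
                LoggedCommand::Push { bucket: b, .. } => b != bucket,
                LoggedCommand::FlushB { bucket: b } => b != bucket,
            });
        }

        compacted.push(command.clone());
    }

    compacted
}

fn main() {
    let log = vec![
        LoggedCommand::Push {
            bucket: "user:1".into(),
            object: "conversation:1".into(),
            text: "hello world".into(),
        },
        LoggedCommand::FlushB { bucket: "user:1".into() },
    ];

    // The PUSH is cancelled out by the later FLUSHB; only the FLUSHB remains.
    assert_eq!(
        compact(&log),
        vec![LoggedCommand::FlushB { bucket: "user:1".into() }]
    );
}
```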
A built-in Sonic clustering protocol could use database 'snapshots' to handle recoveries (I think RocksDB natively supports this), and only re-synchronize the latest part that's been lost on a given node due to downtime. I'll read more on how Redis and others do this as they have native clustering.
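For the snapshot part, recent versions of the rust-rocksdb crate expose RocksDB's checkpoint API, which could serve as that primitive. A minimal sketch, with placeholder paths:

```rust
use rocksdb::{checkpoint::Checkpoint, DB};

// Create a consistent on-disk copy of the KV store (mostly hard links when on
// the same filesystem), which a recovering node could be re-seeded from before
// replaying only the commands it missed while offline.
fn snapshot_kv_store(db_path: &str, snapshot_path: &str) -> Result<(), rocksdb::Error> {
    let db = DB::open_default(db_path)?;

    let checkpoint = Checkpoint::new(&db)?;
    checkpoint.create_checkpoint(snapshot_path)
}
```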
A nice side effect of clustering is that one could build a large Sonic-based infrastructure, with several "ingestor" nodes and several "search" nodes.
I like the things you are proposing!
In any case, clustering for fault tolerance is a requirement for production use, at least for us.
May I suggest re-opening this issue, even just as a roadmap item?
Sure, leaving it open now.
I think the Raft consensus algorithm would be a great addition to Sonic.
Actix Raft could be integrated with Sonic to add clustering capability.
The distributed search idea could also be borrowed from Bleve, which is used by Couchbase.
We see many products built on RocksDB use Raft successfully (CockroachDB, ArangoDB, etc.).
For simplicity's sake, the replication protocol will be inspired by the Redis replication protocol: a primary write master, followed by N read slaves (i.e. read-only). If the master goes down, reads remain available on the slaves but writes are rejected. When the master is recovered, the slaves catch up to the master's binlog and writes can be resumed by the connected libraries (possibly from a queue).
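Roughly, the behavior described above could boil down to something like this (types are made up for the example, not an existing Sonic API):

```rust
// Illustrative sketch of the Redis-style role handling described above.
#[derive(Clone, Copy, PartialEq)]
enum NodeRole {
    Master,
    Slave,
}

#[derive(Debug)]
enum WriteError {
    ReadOnlyReplica,
    MasterUnavailable,
}

struct Node {
    role: NodeRole,
    master_reachable: bool,
}

impl Node {
    // Writes are only accepted on the master; slaves reject them so the
    // connected library can queue and retry once the master is back up.
    fn accept_write(&self) -> Result<(), WriteError> {
        match self.role {
            NodeRole::Master => Ok(()),
            NodeRole::Slave if !self.master_reachable => Err(WriteError::MasterUnavailable),
            NodeRole::Slave => Err(WriteError::ReadOnlyReplica),
        }
    }

    // Reads stay available on slaves, even while the master is down.
    fn accept_read(&self) -> bool {
        true
    }
}

fn main() {
    let slave = Node { role: NodeRole::Slave, master_reachable: false };

    assert!(slave.accept_read());
    assert!(slave.accept_write().is_err());
}
```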
My two cents: you may consider the CRDT model used by Redis replication. 1) Active-active database replication ensures writes are never rejected. 2) All the servers are used, so you make the most of them. 3) Since there are no transactions and it is only search data, there is no risk of data loss.
As a stop-gap workaround for missing clustering functionality, would it be possible to use a distributed file system for synchronizing changes between multiple Sonic instances and then (via a setting, somewhere) configure only one of the Sonic instances to have write access, leaving the other ones to be read-only?
I don't know if it would work to have multiple Sonic instances reading from the same files on disk, or what would happen to one Sonic instance when another instance writes to the files, but if that part works somewhat, it should "only" be a matter of implementing a read-only setting for this to work, I think.
That would unfortunately not work, as RocksDB holds a LOCK file on the file system. Unsure if RocksDB can be started in read-only mode, ignoring any LOCK file. Can you provide more reference on that? If it's easily done, then it's a quick win.
However, properly-done clustering needs full-data replication, meaning that a second independent server holds a full copy of the data on a different file system. That way, if the master fails, the slave can be elected as master, and the old master can be resynchronized from the old slave in case of total data loss on one replica.
It looks like RocksDB can be opened in read-only mode. There's a page here that specifically talks about this: https://github.com/facebook/rocksdb/wiki/Read-only-and-Secondary-instances.
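For what it's worth, recent versions of the rust-rocksdb crate expose both modes described on that page. A minimal sketch of the 'secondary instance' variant, which ignores the primary's LOCK file and can catch up with its WAL (paths are placeholders):

```rust
use rocksdb::{Options, DB};

// Open a read-only follower of an existing RocksDB database. The secondary
// keeps its own directory for logs/metadata and never writes to the primary's
// data files.
fn open_follower(primary_path: &str, secondary_path: &str) -> Result<DB, rocksdb::Error> {
    let opts = Options::default();
    let db = DB::open_as_secondary(&opts, primary_path, secondary_path)?;

    // Call this periodically to replay the primary's WAL and observe new writes.
    db.try_catch_up_with_primary()?;

    Ok(db)
}
```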
Excellent, that can work then.
Maybe support databases like TiDB (built on RocksDB) and extend the features of TiDB's cluster/replication.
Would it be worth it to move this to the 1.5.0 milestone? I am open to taking a look at this and sending a PR based on this comment.
Sure! Totally open to moving this to an earlier milestone. Open to PRs on this, thank you so much!
I looked at the code a bit this evening. As I understand it, all the DB references (whether KV or FST) are opened lazily using the acquire functions in StoreGenericPool,
which the specific implementations call when there is an executor action requiring the use of the store. Tasker also accesses the stores lazily when doing its tasks.
I am assuming it's straightforward enough to utilize the RocksDB secondary replicas, as they allow catch-up. When I say read-only below, I mean the Sonic replica, not a RocksDB read-only replica.
My main question for now is about syncing the FST store: that doesn't come for free like RocksDB syncing does. The executor currently keeps the KV and FST stores in sync by performing pushes/pops/other operations on both stores in tandem. The writer would continue to do so, but for readers, should there be an evented system to broadcast all successfully completed write operations to them? On one hand, this approach sounds really complicated to implement, and I wonder if constructing the FST by looking up all the keys in the synced KV store is an option on the reader's side instead.
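For the second option, a minimal sketch of rebuilding a suggestion set with the fst crate (assuming the fst 0.4 API and that the reader side can enumerate the indexed words from its synced KV store):

```rust
use fst::{Set, SetBuilder};

// Rebuild an FST set from words enumerated out of the synced KV store.
fn rebuild_fst(mut words: Vec<String>) -> Result<Set<Vec<u8>>, fst::Error> {
    // The builder requires keys in lexicographic order, without duplicates.
    words.sort();
    words.dedup();

    let mut builder = SetBuilder::memory();

    for word in &words {
        builder.insert(word)?;
    }

    Ok(builder.into_set())
}
```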
Anyone else here who has thoughts about this? (I am a newbie 😄 in both Rust and Sonic).
Hello there,
On the FST thing, as they are guaranteed to be limited to a max size per FST, we could think about having a lightweight replication protocol work out insertions/deletions in the FST. If the replication loses sync, we could stream the modified FST files from the master disk to the slave disk, using checksums for comparison. Note however that, to guarantee integrity, this would also require sending the not-yet-consolidated FST operations to the slave, so that the next CONSOLIDATE on master and slave results in the same FST file being updated with the same pending operations.
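To make that more concrete, the messages of such a lightweight protocol could look roughly like this (purely illustrative shapes, not an existing Sonic protocol):

```rust
// Illustrative only: message shapes for a lightweight FST replication protocol.
enum FstReplicationMessage {
    // Incremental path: replay word insertions/removals on the slave.
    InsertWord { bucket: String, word: String },
    RemoveWord { bucket: String, word: String },

    // Recovery path: stream the consolidated FST file from the master's disk,
    // with a checksum so the slave can verify (or skip) the transfer.
    StreamFst { bucket: String, checksum: [u8; 32], bytes: Vec<u8> },

    // Also ship the not-yet-consolidated operations, so that the next
    // CONSOLIDATE produces the exact same FST file on both sides.
    PendingOps { bucket: String, insertions: Vec<String>, removals: Vec<String> },
}
```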
Does that make sense?
Hello valeriansaliou: do you have any plan to develop a multi-node version? I mean that high availability and scalability are important for Sonic to become popular and be used in production environments.