yahoo / HaloDB

A fast, log structured key-value store.
https://yahoodevelopers.tumblr.com/post/178250134648/introducing-halodb-a-fast-embedded-key-value
Apache License 2.0
505 stars 99 forks source link

Storage clustering support #18

Closed archenroot closed 4 years ago

archenroot commented 5 years ago

So imagine I will use HaloDB to build some important point within infrastructure, it mean I am interested in running at least 2 instances in time.

Schema 1. I can loadbalance reads, but can push write requests to both instances. Schema 2. I can create one instance as replica over network I might mark reader writer nodes... hm, both approaches sucks and requires lot of work. What about using https://atomix.io/

Any ideas are welcomed.

ahasani commented 5 years ago

By learning from how @amannaly using HaloDB at Oath :

The distributed database which is currently using HaloDB use Kafka as a kind of distributed write-ahead log. Clients write to Kafka and there is a thread in the database which will read from Kafka and then write to HaloDB. Therefore, all writes to HaloDB are from a single thread. Since I was tying to first optimize HaloDB for this particular workload I haven't optimized it for concurrent writes yet. This is a task for the future. You can issue concurrent writes to HaloDB, it doesn't affect the correctness of data, but concurrent writes will block till the write request currently in progress completes. Also note that writes don't block reads.

also :

HaloDB is currently being used in a distributed database that only does single threaded writes to HaloDB. Each database box also has multiple instances of HaloDB running. Clients do concurrent writes to the database, but those writes go to Kafka, which act as a distributed WAL for the database and each box in the cluster reads from Kafka and then writes to a particular HaloDB instance. Since all writes to HaloDB are single threaded I haven't spent any time optimizing the performance of concurrent writes. As I have mentioned in the TODO comment in the put method we can optimize concurrent writes by using more fine grained locking.

Cheers

ahasani commented 5 years ago

@archenroot For true clustering yes, Zookeeper or Atomix would be a good choice. those 2 choices of yours are valid architecture. BTW looking at your requirements have you seen Apache Bookkeeper? might fit your bill

archenroot commented 5 years ago

@ahasani - Ok, nice discussion, thx a lot for detailed input on this. I created from scratch architecture with Kafka as write log history keeper, I think this makes it superior durable, on crash replay, on restart replay and you are done :-) I put picture here #15

archenroot commented 5 years ago

@ahasani - thx for referring to Bookkeeper, wasn't aware of such project... the only thing with zookeeper is that I can implement easily versioning support with HaloDB (I hope so :-), but supporting this in Zookeeper I see more complicated (I don't know the source code, maybe I am wrong)

Q: So this Bookkeeper could be used to sync multiple HaloDB storage instances? Not sure If I get all features it supports...

ahasani commented 5 years ago

@archenroot can we get back to this thread so others can follow in regard to clustering, nice design/drawing though

ahasani commented 5 years ago

Damn you @archenroot for not having an issue of having GBs or TB memory haha. Yes correct its only using memory for Index. Interesting design.

  1. I would not embed kafka though. it must have its own cluster.
  2. HaloDB app would be embedded with kafka client. And a single HaloDB app would run multiple instances of HaloDB (this like sharding)
  3. If you load balance, then you can not separate protocol, each node would be able to accept all protocol REST/Grpc in your case. Please note that in this architecture, write is asynchronous such that write do not get reply/ack of successful db / db cluster write.

Cheers

ahasani commented 5 years ago

Yes you are right, zookeeper / bookkeeper would be an overkill, they exposed pretty low level api. Atomix/HaloDB is easier. if you are okay with your current design you do not need to implement clustering manager (Atomix/Zookeeper) as your design is loosely a master/master

archenroot commented 5 years ago

@ahasani -

  1. sure - cluster - embedded only for local usage and integration testing
  2. yeah, so you get higher write performance (10x instances = 10 write threads), but then you must search across all instances in parallel, which will be super fast anyway, so speed up in general
  3. yeah you are right.

I will try Atomix POC probably, but otherwise I think master/master is fine for this kind of data (I have full copy everywhere at same (similar) time)

I will push here 2 designs of clustering.

archenroot commented 5 years ago

Just add Atomix - it supports same KV data types as HaloDB, so HaloDB will be used only as storage backend (I hope it will be compatible out of box :-) )

Distributed data structures (maps, sets, trees, counters, values, etc)
Distributed communication (direct, publish-subscribe, etc)
Distributed coordination (locks, leader elections, semaphores, barriers, etc)
Group membership

The only issue is that it is already in-memory, so I will need to integrate those 2, so what atomix keeps in its in-memory is exactly the HaloDB index...

archenroot commented 5 years ago

Add Kafka - I see here strong architecture from failure pespective. I can always reload from Kafka and simply overwrite everything... In case of Atomix if failure occurs I identify VALID/CONSISTENT node and replicate from there. So access that node (Atomix has REST and other communication APIs) and ask HaloDB to iterate over complete store and get me results to create new store or overwrite existing in crashed node.

archenroot commented 5 years ago

I am also considering CQEngine - super fast SQL like engine which will require some customization to support KV, I will discuss at their github: https://github.com/npgall/cqengine/issues/218

ahasani commented 5 years ago

Hi @archenroot at :

yeah, so you get higher write performance (10x instances = 10 write threads), but then you must search across all instances in parallel, which will be super fast anyway, so speed up in general

You would not want to search the shards, you would either save metadata or better and a lot faster is to implement Consistent Hash to "choose" the node form the shard.

Your Kafka architecture is cool, it's what many orgs implement as their backbone and as you are using HaloDB for storage, its going to have very good performance.

Cheers

ahasani commented 5 years ago

@archenroot Beside CQEngine, have a look at Apache Calcite, used by many and H2 SQL layer too, this the one used by Apache Iginte. And never forget the venerable Lucene core.

archenroot commented 5 years ago

@ahasani - https://ignite.apache.org/use-cases/database/key-value-store.html

JamieCressey commented 5 years ago

Would a Raft implementation work here? Is replication in scope of this project or should it be developed separately?

wangtao724 commented 4 years ago

HaloDB is designed as single instance DB engine rather than a distributed KV store. It leaves flexibility to customer to decide how to arrange multiple DB engines to setup a distributed KV cluster. Close as invalid