sabledb-io / sabledb

Ultra fast, persistent database supporting Redis API
https://sabledb-io.github.io/sabledb/
BSD 3-Clause "New" or "Revised" License
422 stars 10 forks

What's the consistency guarantee in cluster mode? #25

Open crrow opened 1 month ago

sabledb-io commented 1 month ago

I am not sure I understand the question.

  1. Cluster mode is not supported yet
  2. How does cluster mode relate to consistency?
  3. Do you mean "replication"?
crrow commented 1 month ago

My misunderstanding, closed.

crrow commented 1 month ago

I thought it could be used as a distributed Redis.

sabledb-io commented 1 month ago

I still don't understand the relationship between consistency and cluster. Redis Cluster distributes the slots: each node holds different slots. The consistency itself is per "shard" (a primary instance plus its replicas).

Imagine a cluster of 6 nodes: 2 primaries, each with 2 replicas.

Primary node #1 manages slots 0-7999 and primary node #2 manages slots 8000-16383 (Redis Cluster has 16384 slots, numbered 0-16383).

So if key key_1 hashes to slot #1, it is kept on primary node #1, and if key key_2 hashes to slot #13000, it is kept on primary node #2.

So, in the Redis world, the primary nodes in a cluster are never consistent with each other, because each primary node holds a different set of keys (this is by design).
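A minimal Python sketch of the key-to-slot routing described above, assuming the Redis Cluster rule: CRC16/XMODEM of the key (or of its non-empty {hash tag}, if present) modulo 16384. The SLOT_RANGES table and the node names are hypothetical and simply mirror the two-primary example; this is not SableDb code.

```python
# Sketch of Redis Cluster style routing: key -> slot -> owning primary.
# SLOT_RANGES and the node names are hypothetical (2-primary example layout).

NUM_SLOTS = 16384

def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16/XMODEM (polynomial 0x1021, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Hash only the {tag} part when the key contains a non-empty tag."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # at least one byte between the braces
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % NUM_SLOTS

# Hypothetical layout: each primary owns a disjoint slot range.
SLOT_RANGES = [
    (0, 7999, "primary-1"),
    (8000, 16383, "primary-2"),
]

def owner(key: str) -> str:
    """Return the primary that owns the slot of `key`."""
    slot = hash_slot(key)
    for first, last, node in SLOT_RANGES:
        if first <= slot <= last:
            return node
    raise ValueError(f"slot {slot} is not assigned")

if __name__ == "__main__":
    for k in ["foo", "{user1000}.following", "{user1000}.followers"]:
        print(k, hash_slot(k), owner(k))
```

Keys that share a hash tag always land in the same slot, and therefore on the same primary; every other key is spread across primaries, which is why no two primaries hold the same keys.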

crrow commented 1 month ago

I was just wondering about the read-after-write case: https://redis.io/docs/latest/operate/oss_and_stack/management/scaling/
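To make the read-after-write question concrete, here is a toy Python model of a single shard, assuming asynchronous replication as in OSS Redis (SableDb's exact replication semantics are not covered in this thread). ToyShard and its methods are hypothetical. A client that writes to the primary and then reads from a replica can miss its own write until the replica catches up; reading from the primary does not have that problem.

```python
from __future__ import annotations

from collections import deque

class ToyShard:
    """One primary plus N replicas; replication is modelled as asynchronous (assumption)."""

    def __init__(self, num_replicas: int) -> None:
        self.primary = {}
        self.replicas = [{} for _ in range(num_replicas)]
        self.backlog = deque()  # writes not yet shipped to replicas

    def write(self, key: str, value: str) -> None:
        # The write is acknowledged once the primary has applied it...
        self.primary[key] = value
        # ...while shipping it to the replicas happens later.
        self.backlog.append((key, value))

    def replicate(self) -> None:
        """Drain the backlog into every replica (models replication catching up)."""
        while self.backlog:
            key, value = self.backlog.popleft()
            for replica in self.replicas:
                replica[key] = value

    def read(self, key: str, replica: int | None = None) -> str | None:
        """Read from the primary (default) or from the given replica index."""
        store = self.primary if replica is None else self.replicas[replica]
        return store.get(key)

if __name__ == "__main__":
    shard = ToyShard(num_replicas=2)
    shard.write("key_1", "v1")
    print(shard.read("key_1"))             # primary already has the write
    print(shard.read("key_1", replica=0))  # replica has not caught up yet
    shard.replicate()
    print(shard.read("key_1", replica=0))  # replica now returns the write
```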

sabledb-io commented 1 month ago

@crrow I updated my answer before you replied; I explained there how Redis Cluster works in a nutshell. I am sorry if you already know this. I don't know your level of knowledge of Redis, so forgive me in advance if this is something you are aware of :)

sabledb-io commented 1 month ago

Unlike "normal" Redis (or its replacement, Valkey), each command written to SableDb is written to the underlying storage, RocksDb, which is a disk-based storage engine.

SableDb flushes the RocksDb in-memory cache to disk periodically (every 250ms by default). The interval is configurable and can be reduced to 100ms. In other words, in case of a system crash you might lose the data written since the last flush (up to 100ms worth of writes at the shortest interval).

If this is important, we can improve it by forcing each write to go directly to disk, skipping memory. Of course, that comes with a performance penalty.
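A small Python sketch of the crash window described above, under a simplifying assumption: a periodic flush at every multiple of the interval makes all earlier writes durable, and "forcing each write directly to disk" means nothing acknowledged is lost. The function name lost_writes and its parameters are hypothetical.

```python
def lost_writes(write_times_ms, flush_interval_ms, crash_ms, sync_each_write=False):
    """Return the timestamps of writes that a crash at `crash_ms` would lose.

    Toy model (assumption): a flush at t = k * flush_interval_ms makes every
    write with timestamp <= t durable. sync_each_write=True models writing
    each command straight to disk, so no acknowledged write is lost.
    """
    if sync_each_write:
        return []
    # Time of the most recent periodic flush before (or at) the crash.
    last_flush = (crash_ms // flush_interval_ms) * flush_interval_ms
    return [t for t in write_times_ms if last_flush < t <= crash_ms]

if __name__ == "__main__":
    writes = [10, 240, 260, 490, 510]
    print(lost_writes(writes, 250, 499))  # default interval
    print(lost_writes(writes, 100, 499))  # shortest interval
    print(lost_writes(writes, 250, 499, sync_each_write=True))
```

With a crash at 499ms, the 250ms interval loses the writes at 260ms and 490ms, while the 100ms interval loses only the write at 490ms: the worst-case loss window equals the flush interval, and syncing every write closes it at the cost of a disk write per command.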

If you have more questions - please don't hesitate to ask here :)

crrow commented 1 month ago

My bad, I didn't describe the question clearly in the first place.

Again, thanks for your answers.

crrow commented 1 month ago

RocksDB uses a WAL to prevent data loss, so what does the "RocksDB in-memory cache" refer to?

BTW, read-after-write consistency isn't a major concern at the moment, as cluster mode isn't yet supported. I just brought it up as a thought.

sabledb-io commented 1 month ago

Regarding cluster ("horizontal scaling"): it is definitely on my roadmap. I will probably complete the SET commands this week, and after that I will publish my cluster design. Redis / Valkey use something called the "cluster bus", where each primary in a cluster talks to the other primaries to get the cluster status. I believe this approach has its own problems (e.g. it does not scale well with the number of shards in the cluster).
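Rough arithmetic behind the scaling concern, assuming a full-mesh cluster bus in which every node keeps a connection to every other node (as the Redis Cluster specification describes). The helper names are hypothetical; the point is only that the number of links grows quadratically with the number of nodes.

```python
def cluster_nodes(shards: int, replicas_per_shard: int) -> int:
    """Total nodes: one primary plus its replicas per shard."""
    return shards * (1 + replicas_per_shard)

def mesh_links(num_nodes: int) -> int:
    """Links in a full mesh: every node connects to every other node."""
    return num_nodes * (num_nodes - 1) // 2

if __name__ == "__main__":
    for shards in (2, 10, 100):
        n = cluster_nodes(shards, replicas_per_shard=2)
        print(f"{shards} shards -> {n} nodes -> {mesh_links(n)} links")
```

For the 6-node example from earlier in the thread (2 shards, 2 replicas each) this gives 15 links; with 100 such shards (300 nodes) it is already 44850.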

I decided not to start with cluster support and chose to invest time in replication instead, because SableDb's design is much more robust: even with a single shard (1 primary with N replicas), the performance is much better than standard OSS Redis.

Also, a small correction to my comment about flushing RocksDb cache tables:

You can disable the manual flushing in SableDb (i.e. let RocksDb do that for you) by setting this configuration entry to false: https://github.com/sabledb-io/sabledb/blob/main/server.ini#L102

With this option set to false, each write is backed by a WAL (Write Ahead Log), so in case of a crash the memory tables can be restored from the WAL.

sabledb-io commented 1 month ago

RocksDb is an LSM-tree (Log-Structured Merge-tree) based storage engine. Each write is done like this:

write -> WAL entry "fwrite" -> write memory table -> return Success

RocksDb allows its users to skip writing the WAL to disk until it is needed; this is what we call a "manual WAL flush". I was using the term "cache tables" to simplify things, as I was not aware of your RocksDb knowledge ;) Now I am.
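A toy Python model of the write path above (WAL entry, then memory table, then success) and of what a manual WAL flush changes. ToyLSM and its methods are hypothetical, not RocksDB code; in real RocksDB the corresponding knobs are DBOptions::manual_wal_flush and DB::FlushWAL(). In this toy, "on disk" just means "outside process memory"; surviving a power loss would additionally require syncing the WAL.

```python
class ToyLSM:
    """Toy write path: WAL record -> memtable -> ack. Not RocksDB code."""

    def __init__(self, manual_wal_flush: bool) -> None:
        self.manual_wal_flush = manual_wal_flush
        self.wal_buffer = []   # WAL records still held in process memory
        self.wal_on_disk = []  # WAL records already handed off (the "fwrite")
        self.memtable = {}

    def put(self, key, value):
        record = (key, value)
        if self.manual_wal_flush:
            self.wal_buffer.append(record)   # deferred until flush_wal()
        else:
            self.wal_on_disk.append(record)  # written out on every put
        self.memtable[key] = value
        return "OK"

    def flush_wal(self):
        """Move buffered WAL records out of memory (models a manual WAL flush)."""
        self.wal_on_disk.extend(self.wal_buffer)
        self.wal_buffer.clear()

    def crash_and_recover(self):
        """Drop all in-memory state, then rebuild the memtable by replaying the WAL."""
        self.wal_buffer.clear()
        self.memtable = {}
        for key, value in self.wal_on_disk:
            self.memtable[key] = value
        return self.memtable

if __name__ == "__main__":
    db = ToyLSM(manual_wal_flush=True)
    db.put("a", "1")
    db.flush_wal()
    db.put("b", "2")            # still only in the WAL buffer
    print(db.crash_and_recover())
```

With manual_wal_flush=True, anything written after the last flush_wal() is lost on a crash; with it set to False, every acknowledged put can be replayed from the WAL.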

sabledb-io commented 1 month ago

You can read more here: DBOptions::manual_wal_flush

crrow commented 1 month ago

Thanks for your answers.