Open crrow opened 1 month ago
My misunderstanding, closed.
I thought it could be used as a distributed Redis.
I still don't understand the relationship between consistency and cluster. Cluster Redis distributes the slots; each node holds a different set of slots. Consistency itself is per "shard" (a primary instance plus a number of replicas).
Imagine a cluster of 6 nodes: 2 primaries, each primary with 2 replicas.
Primary node#1 manages slots 0-8000, primary node#2 manages slots 8001-16383 (Redis has 16384 slots, 0-16383).
So if key key_1
is translated into slot#1 -> it is kept on primary node#1
And if key key_2
is translated into slot#13000 -> it is kept on primary node#2
So, in the Redis world, in a cluster, the primary nodes are never consistent with each other, as each primary node holds a different set of keys (this is by design).
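The key-to-slot mapping described above can be sketched in Python. Redis Cluster hashes each key with CRC16 (the XMODEM variant) and takes the result modulo 16384; this sketch omits hash-tag (`{...}`) handling, and the `owner` helper and the 0-8000 / 8001-16383 split are just the hypothetical two-primary example from this thread:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM, the checksum Redis Cluster uses for key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Map a key to one of the 16384 cluster slots (hash tags omitted)."""
    return crc16_xmodem(key.encode()) % 16384

def owner(key: str) -> str:
    """Toy routing for the 2-primary example: node#1 owns 0-8000."""
    return "node#1" if hash_slot(key) <= 8000 else "node#2"
```

Every client computes the same slot for the same key, which is why each key deterministically lives on exactly one primary.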
I'm just wondering about the read-after-write case: https://redis.io/docs/latest/operate/oss_and_stack/management/scaling/
@crrow I updated my answer before you replied; I explained there how a Redis cluster works in a nutshell. I am sorry if you already know this - I don't know your level of Redis knowledge - so forgive me in advance if this is something you are already aware of :)
Unlike "normal" Redis (or its replacement, Valkey), each command written to SableDb is written to the underlying storage, RocksDb, which is a disk-based storage.
SableDb flushes the RocksDb in-memory cache to disk periodically (by default every 250ms) - this is configurable and can be reduced all the way down to 100ms. I.e., in case of a system crash, you might lose the data written in the last 100ms.
If this is important, we can improve it by forcing each write to go directly to disk and skip the memory; ofc, this comes with a performance penalty.
If you have more questions - please don't hesitate to ask here :)
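As a rough illustration of the trade-off described above (a toy model, not SableDb's actual code), interval-based flushing batches writes in memory and persists them at most once per interval, so a crash can lose the writes from the last unflushed window:

```python
import time

class BufferedStore:
    """Toy model of interval-based flushing (illustrative, not SableDb code)."""

    def __init__(self, flush_interval_ms: int = 250):
        self.flush_interval = flush_interval_ms / 1000.0
        self.memtable = {}               # recent writes, volatile
        self.disk = {}                   # flushed, durable data
        self._last_flush = time.monotonic()

    def put(self, key, value):
        self.memtable[key] = value       # fast path: memory only
        if time.monotonic() - self._last_flush >= self.flush_interval:
            self.flush()                 # flush at most once per interval

    def flush(self):
        self.disk.update(self.memtable)  # persist everything buffered so far
        self.memtable.clear()
        self._last_flush = time.monotonic()

    def crash_and_recover(self):
        self.memtable.clear()            # a crash loses the unflushed window
```

Forcing every `put` to call `flush` would close the durability window entirely, at the cost of one disk write per command - the performance penalty mentioned above.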
My bad, I didn't describe the question clearly in the first place.
Again, thanks for your answers.
RocksDB uses a WAL to prevent data loss - what does the "RocksDB in-memory cache" stand for?
BTW, read-after-write consistency isn't a major concern at the moment, as cluster mode isn't yet supported. I just brought it up as a thought.
Regarding cluster ("horizontal scaling"): it is definitely on my roadmap. I will probably complete the SET
commands this week, and after that I will publish my cluster design (Redis / Valkey use something called a "cluster bus", where each primary in a cluster "talks" to the other primaries to get the cluster status). I believe that this approach has its own problems (e.g. it does not scale well with the number of shards in the cluster).
The reason I decided not to start with cluster support, but rather to invest time in replication, is that SableDb's design is much more robust, and even with a single shard (1 primary with N replicas) the performance is much better than standard OSS Redis.
Also, a small correction to my comment about flushing the RocksDb cache tables:
You can disable the manual flushing in SableDb (i.e. let RocksDb do that for you) by setting this configuration entry to false:
https://github.com/sabledb-io/sabledb/blob/main/server.ini#L102
With this option set to false, each write is backed by a WAL
(Write-Ahead Log), so in case of a crash, the memory tables can be restored from the WAL.
RocksDb is an LSMT (Log-Structured Merge Tree) based storage. Each write is done like this:
write -> WAL entry ("fwrite") -> write memory table -> return Success
RocksDb allows its users to skip flushing the WAL to disk until it is needed - this is what we call a "manual WAL flush". I was using the term "cache tables" to simplify things, as I was not aware of your RocksDb knowledge ;) - now I do.
You can read more here: DBOptions::manual_wal_flush
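To make that write path concrete, here is a toy sketch of the WAL-then-memtable sequence (assumed names and file format for illustration only, not RocksDb's actual API):

```python
import json
import os

class MiniLSM:
    """Toy LSM write path (illustrative): WAL append -> memtable -> ack."""

    def __init__(self, wal_path: str):
        self.wal_path = wal_path
        self.memtable = {}
        # On startup, replay any existing WAL to rebuild the memtable -
        # this is how data survives a crash.
        if os.path.exists(wal_path):
            with open(wal_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.memtable[key] = value
        self.wal = open(wal_path, "a")

    def put(self, key, value):
        # 1. Append the write to the WAL (buffered, like "fwrite").
        self.wal.write(json.dumps([key, value]) + "\n")
        # 2. Apply it to the in-memory table.
        self.memtable[key] = value
        # 3. The caller gets Success here; durability still depends on a flush.

    def flush_wal(self):
        # The "manual WAL flush": push the buffered WAL bytes down to disk.
        self.wal.flush()
        os.fsync(self.wal.fileno())
```

The durability window discussed earlier in this thread is exactly the gap between step 3 and `flush_wal`: writes acknowledged but not yet flushed are the ones a crash can lose.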
Thanks for your answer.
I am not sure I understand the question.