The main consideration here is that stream write throughput must scale. With a single Raft group, throughput will degrade as more write-heavy streams are added. If each stream had its own Raft group, that group could be isolated to a few nodes, and other write-heavy streams could be placed on other nodes.
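As a rough illustration of the per-stream placement idea, here is a minimal sketch that puts each stream's Raft group on the least-loaded nodes. The `Node` type, the `write_load` metric, and the default of three replicas are all assumptions made for the example, not part of the design.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    # Hypothetical load metric, e.g. bytes/sec of stream writes on this node.
    write_load: float = 0.0


def place_stream(nodes: list[Node], stream_load: float, replicas: int = 3) -> list[str]:
    """Pick the `replicas` least-loaded nodes to host one stream's Raft group."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    # Sort by load, breaking ties by name so the result is deterministic.
    chosen = sorted(nodes, key=lambda n: (n.write_load, n.name))[:replicas]
    for node in chosen:
        # Account for the new stream so later placements avoid these nodes.
        node.write_load += stream_load
    return [node.name for node in chosen]
```

With five idle nodes, two equally heavy streams end up on mostly different nodes, which is the isolation effect described above.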
The primary difficulty with this design is that we will need to implement a distributed transaction system that works with pipelines and with the transaction system overall.
We will quite likely need a transaction Raft group that is authoritative on the state of transactions, and we can then follow CockroachDB's design for distributed transactions from there.
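To make "authoritative on the state of transactions" concrete, here is a minimal sketch of the state machine such a transaction Raft group might replicate. It is loosely modeled on the transaction-record idea in CockroachDB's design and heavily simplified (no timestamps, no intermediate staging state). `TxnRegistry` and its methods are hypothetical names.

```python
from __future__ import annotations

from enum import Enum


class TxnStatus(Enum):
    PENDING = "pending"
    COMMITTED = "committed"
    ABORTED = "aborted"


class TxnRegistry:
    """Stand-in for the state replicated by the transaction Raft group."""

    def __init__(self) -> None:
        self._status: dict[str, TxnStatus] = {}

    def begin(self, txn_id: str) -> None:
        if txn_id in self._status:
            raise ValueError(f"transaction {txn_id} already exists")
        self._status[txn_id] = TxnStatus.PENDING

    def finish(self, txn_id: str, commit: bool) -> TxnStatus:
        """Move a pending transaction to a final state; final states never change."""
        current = self._status[txn_id]
        if current is TxnStatus.PENDING:
            current = TxnStatus.COMMITTED if commit else TxnStatus.ABORTED
            self._status[txn_id] = current
        return current

    def is_visible(self, txn_id: str) -> bool:
        """Writes tagged with txn_id are readable only once it is committed."""
        return self._status.get(txn_id) is TxnStatus.COMMITTED
```

In this sketch, each stream's Raft group would tag tentative writes with a transaction ID and consult the registry to decide whether those writes are visible, so the commit decision is made in exactly one place.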
The placement driver (PD) should break up streams as they grow. It will use standard metrics to measure the size of a stream's head range and decide when that range needs to be partitioned.
The PD cluster includes every cluster member. PD state is replicated to the entire cluster.
Once PD decisions have been made, the PD will monitor the nodes of the cluster to observe their sharding status. Once it is safe for a sharding operation to proceed to the next step, a new log entry will be committed to the PD cluster to take the next step of moving data, which will eventually involve nodes removing ranges of data.
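The PD behavior described above can be sketched as two small functions: one deciding when a head range should be split, and one deciding when the next step of a sharding operation may be committed. The step names, the byte-size threshold, and the shape of the node reports are assumptions for illustration only.

```python
from __future__ import annotations

from enum import IntEnum


class SplitStep(IntEnum):
    # Hypothetical steps of one sharding operation, in order.
    PLANNED = 0
    MOVING_DATA = 1
    REMOVING_OLD_RANGES = 2
    DONE = 3


def needs_split(head_range_bytes: int, max_bytes: int) -> bool:
    """Decide whether a stream's head range has grown enough to be partitioned."""
    return head_range_bytes >= max_bytes


def next_step(current: SplitStep, node_reports: dict[str, SplitStep]) -> SplitStep:
    """Return the step the PD should commit next.

    `node_reports` maps each involved node to the last step it has completed.
    The PD advances only when every node has completed the current step.
    """
    if current is SplitStep.DONE:
        return current
    if node_reports and all(step >= current for step in node_reports.values()):
        return SplitStep(current + 1)
    return current
```

Because the PD only ever commits the step returned by `next_step`, a slow or unreachable node holds the operation at its current step rather than letting data be removed early.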
In GitLab by @doddzilla on Nov 28, 2020, 14:20