NOTE: all of the following data is outdated, and needs to be updated.
Stream data will always be replicated to all members of the cluster. When a stream is created, data sharding may be enabled or disabled; it is disabled by default. When sharding is disabled, all data for the stream will be replicated to all members of the cluster.
When sharding is enabled, the cluster Raft leader will carve up the stream into a stable series of chunks starting from the original record on the stream. Once the stream has reached a certain size, a shard will be designated, with its start and stop indices specified. The cluster Raft leader will then begin sending messages to the shard replica nodes to have them balance out the data shards. The most recent data on the stream will remain unsharded, and replicated on all stream replica nodes, until there is enough data to designate a new shard.
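The shard designation described above can be sketched roughly as follows. This is only an illustration, not the actual implementation: it assumes that "size" is measured in record count, that the original record has index 0, and that shard boundaries are inclusive. The names `Shard`, `designate_shards`, and `shard_size` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Shard:
    start: int  # index of the first record in the shard (inclusive, assumed)
    stop: int   # index of the last record in the shard (inclusive, assumed)


def designate_shards(record_count: int, shard_size: int) -> Tuple[List[Shard], range]:
    """Split records [0, record_count) into full shards plus an unsharded tail.

    Boundaries depend only on shard_size, so shards that were already
    designated keep the same indices as the stream grows.
    """
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    full = record_count // shard_size
    shards = [Shard(i * shard_size, (i + 1) * shard_size - 1) for i in range(full)]
    # The most recent records stay unsharded until a full shard is available.
    tail = range(full * shard_size, record_count)
    return shards, tail
```

With 10 records and a hypothetical shard size of 4, this yields two shards (records 0 to 3 and 4 to 7) and leaves records 8 and 9 unsharded. Growing the stream to 13 records adds a third shard without moving the first two.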
The cluster Raft leader is responsible for the placement of stream replicas and for nominating which nodes will replicate which streams. The algorithm used for making this decision will quite likely be based simply on stream size. Larger streams will be spread out evenly as a simple heuristic for placement.
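One possible reading of the size-based heuristic is a greedy assignment: place the largest streams first, each on the nodes that currently hold the least data. The sketch below is an assumption about how this could work, since the text does not specify the algorithm. The function `place_streams` and its parameters are hypothetical.

```python
from typing import Dict, List


def place_streams(
    stream_sizes: Dict[str, int], nodes: List[str], replicas: int
) -> Dict[str, List[str]]:
    """Greedily assign each stream to the `replicas` least-loaded nodes."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    load = {node: 0 for node in nodes}
    placement: Dict[str, List[str]] = {}
    # Largest streams first, so they are spread out before small ones fill gaps.
    ordered = sorted(stream_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for stream, size in ordered:
        # Ties on load are broken by node name to keep the result deterministic.
        chosen = sorted(nodes, key=lambda n: (load[n], n))[:replicas]
        for node in chosen:
            load[node] += size
        placement[stream] = chosen
    return placement
```

For example, with streams of size 100, 60, and 40, four nodes, and two replicas each, the largest stream lands on one pair of nodes and the two smaller streams share the other pair, so every node ends up holding 100 units.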
Replication groups always consist of three nodes. Nodes can be configured with a replication group tag, and nodes with the same tag will form a replication group. Operators should use these tags when provisioning a cluster so that replication groups span different availability zones (AZs) and the like.
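Grouping nodes by tag could look roughly like the sketch below. It assumes one replication group per tag and treats any tag that does not yet have exactly three nodes as pending. Neither assumption is stated in the text, and the names `form_replication_groups` and `node_tags` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

GROUP_SIZE = 3  # replication groups always consist of three nodes


def form_replication_groups(
    node_tags: Dict[str, str],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Group nodes by tag; return (complete groups, tags not yet at three nodes)."""
    by_tag: Dict[str, List[str]] = defaultdict(list)
    for node, tag in sorted(node_tags.items()):
        by_tag[tag].append(node)
    complete = {t: ns for t, ns in by_tag.items() if len(ns) == GROUP_SIZE}
    pending = {t: ns for t, ns in by_tag.items() if len(ns) != GROUP_SIZE}
    return complete, pending
```

An operator might give one node in each of three AZs the same tag, so that each complete group spans all three zones.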
In GitLab by @doddzilla on Sep 6, 2019, 11:06