This can be solved by providing each node with a list of root peers in its environment config (flag or env var).
```go
var peers = flag.String("p", `kv1-usc1-a-0,kv1-usc1-c-0,kv1-usc1-f-0`, "Peer nodes")
```
If the node detects during startup that its hostname is in the list of root peers then it starts itself as a member.
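As a rough illustration (not the exact implementation; the flag name, peer format, and the log-only branches are placeholders), that startup check could look something like this:

```go
package main

import (
	"flag"
	"log"
	"os"
	"strings"
)

var peers = flag.String("p", "kv1-usc1-a-0,kv1-usc1-c-0,kv1-usc1-f-0", "Peer nodes")

func main() {
	flag.Parse()

	hostname, err := os.Hostname()
	if err != nil {
		log.Fatal(err)
	}

	// Check whether this host is one of the configured root peers.
	isRoot := false
	for _, p := range strings.Split(*peers, ",") {
		if strings.TrimSpace(p) == hostname {
			isRoot = true
			break
		}
	}

	if isRoot {
		// Bootstrap path: start as an initial member of the shard.
		log.Printf("%s is a root peer: starting as initial member", hostname)
	} else {
		// Otherwise join an existing shard (for example as an observer first).
		log.Printf("%s is not a root peer: joining existing shard", hostname)
	}
}
```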
> Querying other members for membership info is not an option
If you have a root node list, you can absolutely use an out-of-band protocol (e.g. gRPC) on another port to query membership.
```proto
service Internal {
  // Members returns json marshaled map of member shard replicaID to hostID
  // Used by new hosts joining a bootstrapped shard
  rpc Members(MembersRequest) returns (MembersResponse) {}
}
```
https://github.com/logbn/zongzi/blob/main/grpc_server.go#L49-L54
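For illustration, a joining host could call that RPC roughly as follows. This assumes the generated stubs (`NewInternalClient`, `MembersRequest`) are available in the caller's package; the port and the `GetMembers` accessor on the response are assumptions for the sketch, not zongzi's actual API:

```go
package main

import (
	"context"
	"encoding/json"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// fetchMembers asks a root peer's out-of-band gRPC port for the current shard
// membership before this host decides how to start its own replica.
func fetchMembers(ctx context.Context, rootPeer string) (map[uint64]string, error) {
	conn, err := grpc.Dial(rootPeer+":9090",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		return nil, err
	}
	defer conn.Close()

	res, err := NewInternalClient(conn).Members(ctx, &MembersRequest{})
	if err != nil {
		return nil, err
	}

	// The payload is a JSON-marshaled map of replicaID -> hostID.
	members := map[uint64]string{}
	if err := json.Unmarshal([]byte(res.GetMembers()), &members); err != nil {
		return nil, err
	}
	return members, nil
}
```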
> - Promotes itself to a regular node using its local dragonboat node
If you want root peer membership to be dynamic and support cold restart (from 0 nodes), you'll need a service registry like Consul (or some other source of truth) to keep track of the present root nodes for the cluster. In that scenario, you would update Consul following promotion and retrieve the latest root peer list from the network prior to startup rather than using static environment variables.
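As a sketch of that idea (nothing zongzi ships; the service name, port, and the use of the official Consul client are assumptions made for illustration):

```go
package main

import (
	"fmt"
	"log"

	consul "github.com/hashicorp/consul/api"
)

// registerRootPeer would be called after this node promotes itself to a
// regular member, so it starts counting as a root peer for future bootstraps.
func registerRootPeer(c *consul.Client, hostname string) error {
	return c.Agent().ServiceRegister(&consul.AgentServiceRegistration{
		ID:   hostname,
		Name: "kv-root-peer", // placeholder service name
		Port: 9090,           // placeholder out-of-band port
	})
}

// rootPeers resolves the current root peer list before startup, replacing the
// static flag/env var approach.
func rootPeers(c *consul.Client) ([]string, error) {
	entries, _, err := c.Health().Service("kv-root-peer", "", true, nil)
	if err != nil {
		return nil, err
	}
	peers := make([]string, 0, len(entries))
	for _, e := range entries {
		peers = append(peers, e.Node.Node)
	}
	return peers, nil
}

func main() {
	c, err := consul.NewClient(consul.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	peers, err := rootPeers(c)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("current root peers:", peers)
}
```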
Does that answer the question?
Thanks for the reply
> If you want root peer membership to be dynamic and support cold restart (from 0 nodes)
Yes, this is what I'm trying to implement. I expect the membership to be fairly stable, but cannot guarantee it, so there may be zero of the initial members left in the cluster at some point.
> you'll need a service registry like Consul (or some other source of truth) to keep track of the present root nodes for the cluster
We actually have such a service, but I'd like the cluster startup not to depend on it. Shouldn't local membership info from the snapshot be enough to start the shard?
I also did some digging around the second issue I described. It happens if the latest index in the snapshot is less than the index of the "ADD OBSERVER" config change entry. So if the new replica recovers from the snapshot and doesn't read log entries directly, it should be fine.
I am trying to implement the following scenario: a new node joins the shard as an observer, catches up using the ReadIndex protocol, and then promotes itself to a regular node.

This approach works well, however I am having trouble handling restarts of a node while it's still joining the cluster, and with starting such nodes in general.
Dragonboat requires the node to specify its role at startup. For observer and regular nodes, it seems like the only reliable way to identify a node's role is to access the logdb.
However, it appears that I cannot access the logdb entries (tried via `GetLogReader`) before the shard starts. `ShardInfoList` in `NodeHostInfo` is also not available. Is there a workaround for this, or an easier way to achieve what I'm trying to do? Thanks in advance
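For context, the decision being described boils down to something like the following sketch (dragonboat v4-style names; `replicaIsNonVoting` is a hypothetical placeholder for exactly the information that isn't available before the shard starts):

```go
package main

import (
	"github.com/lni/dragonboat/v4"
	"github.com/lni/dragonboat/v4/config"
	sm "github.com/lni/dragonboat/v4/statemachine"
)

// startReplica sketches the startup decision described above: the role has to
// be chosen before the replica is started.
func startReplica(nh *dragonboat.NodeHost, shardID, replicaID uint64,
	create sm.CreateStateMachineFunc) error {

	cfg := config.Config{
		ShardID:            shardID,
		ReplicaID:          replicaID,
		ElectionRTT:        10,
		HeartbeatRTT:       1,
		CheckQuorum:        true,
		SnapshotEntries:    1000,
		CompactionOverhead: 100,
	}

	// The role must be known here, before StartReplica is called.
	cfg.IsNonVoting = replicaIsNonVoting() // hypothetical; needs logdb/snapshot state

	// join=true because the replica was added to an existing shard through a
	// membership change; initialMembers stays empty in that case.
	return nh.StartReplica(nil, true, create, cfg)
}

// replicaIsNonVoting stands in for the missing piece: some way to learn from
// local state whether this replica was added as an observer (non-voting) or
// as a regular member.
func replicaIsNonVoting() bool { return true }
```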