rabbitmq / ra

A Raft implementation for Erlang and Elixir that strives to be efficient and make it easier to use multiple Raft clusters in a single system.

Inform queue leaders of cluster node status changes #356

Open SimonUnge opened 1 year ago

SimonUnge commented 1 year ago

Add a new optional callback, or similar, that gets called when a node joins or leaves the Erlang cluster. The callback can make decisions based on this information, such as adding or removing the node as one of the cluster's members.

Suggestion: update the ra_server_proc:leader state, which already handles nodeup/nodedown, to call a new optional callback. To introduce a randomized delay, perhaps add an erlang:send_after with a new info message, something like (erlang:send_after(SOMERANDOMNUMBER, self(), {delayed_node_status_update, Node, Status}))
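As a rough sketch, the scheduling side could look like this (assuming the existing nodeup/nodedown handling in ra_server_proc:leader/3 is extended; the 5000 ms jitter bound is an arbitrary example value):

leader(info, {nodeup, Node}, State) ->
    %% delay the callback by a random amount so that multiple leaders do
    %% not all react to the node event at the same instant
    Delay = rand:uniform(5000),
    _ = erlang:send_after(Delay, self(), {delayed_node_status_update, Node, up}),
    {keep_state, State};
leader(info, {nodedown, Node}, State) ->
    Delay = rand:uniform(5000),
    _ = erlang:send_after(Delay, self(), {delayed_node_status_update, Node, down}),
    {keep_state, State};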

and a new clause to leader, something like

%% NEW_OPT_CALLBACK is a placeholder for the proposed optional callback;
%% ?HANDLE_EFFECTS is the existing effect-handling macro in ra_server_proc
leader(info, {delayed_node_status_update, Node, Status}, State0) ->
    Effects = ra_server:NEW_OPT_CALLBACK(State0#state.server_state, Node, Status),
    {State, Actions} = ?HANDLE_EFFECTS(Effects, cast, State0),
    {keep_state, State, Actions};

It would perhaps also be good to send along the members of the Raft cluster, so that the user code does not have to call ra:members().
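For illustration, the callback could have a shape along these lines (the name and arity are invented here, not part of ra):

%% Hypothetical optional ra_machine callback; name and shape are invented
%% for illustration. Members are passed in so that the callback does not
%% need to call ra:members() itself.
-callback node_status_change(Node :: node(),
                             Status :: up | down,
                             Members :: [{atom(), node()}],
                             MacState :: term()) ->
    ra_machine:effects().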

kjnilsson commented 1 year ago

Yes, something like that. I think, however, we could be somewhat more ambitious with the API.

You are right that we should pass the current member configuration along with the call. In fact, we may even want to pass the replication state (i.e. the last confirmed index per member) as well, so that we have some kind of "freshness" indicator. For example, we may not want to auto-grow if one of the members is substantially behind the others.
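A sketch of the kind of freshness check this would enable, assuming the callback receives each member's last confirmed index together with the leader's commit index (the names and the threshold are illustrative):

%% Illustrative only: decide whether all members are close enough to the
%% leader's commit index for the cluster to safely auto-grow.
%% ReplState is assumed to be [{ServerId, LastConfirmedIdx}].
-define(MAX_LAG, 1000).  %% arbitrary example threshold

should_auto_grow(ReplState, CommitIndex) ->
    lists:all(fun({_ServerId, LastConfirmed}) ->
                      CommitIndex - LastConfirmed =< ?MAX_LAG
              end, ReplState).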

The call should return a list of modifications, and the Ra leader will spawn a transient process to perform these changes in turn (for example, start a new Ra server, join it to the cluster, then wait for replication to catch up before continuing). Whilst it is performing the modifications this callback will not be called, unless another node change is detected.
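A rough sketch of such a transient worker, assuming a list of {add, ServerId} / {remove, ServerId} modifications (the tuple shapes are invented here; ra:add_member/2, ra:remove_member/2 and ra:start_server/5 are existing public ra API):

%% Sketch of a transient worker applying modifications one at a time.
apply_modifications(_System, _ClusterName, _Machine, _Leader, []) ->
    ok;
apply_modifications(System, ClusterName, Machine, Leader,
                    [{add, ServerId} | Rest]) ->
    %% register the new member with the cluster, then start it
    {ok, _, _} = ra:add_member(Leader, ServerId),
    ok = ra:start_server(System, ClusterName, ServerId, Machine, [Leader]),
    %% a real implementation would wait here for the new member's
    %% replication to catch up before moving on to the next change
    apply_modifications(System, ClusterName, Machine, Leader, Rest);
apply_modifications(System, ClusterName, Machine, Leader,
                    [{remove, ServerId} | Rest]) ->
    {ok, _, _} = ra:remove_member(Leader, ServerId),
    apply_modifications(System, ClusterName, Machine, Leader, Rest).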

The Ra leader can then ensure that any shared configuration is properly consistent across members (something we have to ensure manually ourselves at the moment).

SimonUnge commented 1 year ago

Got it. So, similar to handle_aux, but more specific. We could perhaps also re-use the PID monitoring logic to run the changes one at a time, in order.

So, perhaps add something like init_nodes_status/1 and handle_node_status/N, along with a new monitor effect, [{monitor, process, node_status, Pid}]. (I realize there are already handle_node_status functions, so it would need a similar but distinct name.)
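For illustration, the flow could mirror how aux monitors work today (all names here are the proposal's invented ones; do_membership_change/1 is a hypothetical helper):

%% Hypothetical: run one membership change at a time in a monitored
%% worker process; when ra notifies that the worker has exited, the
%% next queued change could be started.
handle_node_status(leader, {nodeup, Node}, State) ->
    Pid = spawn(fun() -> do_membership_change(Node) end),
    {State, [{monitor, process, node_status, Pid}]};
handle_node_status(leader, {node_status, {down, _Pid, _Reason}}, State) ->
    %% worker finished; start the next change here, if any are queued
    {State, []}.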

SimonUnge commented 1 year ago

@kjnilsson I have a simple prototype working, using a gen_statem timeout to trigger the handle_status actions. But I am wondering what a good way is to make sure timeouts happen on the right node, i.e. if a leader triggers a delayed timeout and then, for some reason, becomes a follower before the timeout fires...
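One option might be a state_timeout, which gen_statem cancels automatically when the state changes, so a timeout armed in the leader state cannot fire after a transition to follower. A sketch with invented names (note that only one state_timeout can be pending per server at a time):

leader(info, {nodeup, Node}, State) ->
    Delay = rand:uniform(5000),  %% arbitrary jitter
    {keep_state, State,
     [{state_timeout, Delay, {delayed_node_status_update, Node, up}}]};
leader(state_timeout, {delayed_node_status_update, Node, Status}, State) ->
    %% still the leader here by construction: gen_statem would have
    %% cancelled this timeout on any state transition
    %% (handle_node_status_update/3 is a hypothetical helper)
    handle_node_status_update(Node, Status, State);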

SimonUnge commented 1 year ago

I am currently handling this timeout trigger in all states, and moving the headache to the callback implementation.
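That is, something along these lines, where the callback also receives the current Raft state and can ignore non-leader firings (invented names again):

%% Hypothetical: one handler for the trigger, called from every
%% gen_statem state; the callback is given the Raft state
%% (leader | follower | ...) and decides for itself whether to act.
handle_node_status_timeout(RaftState, Node, Status, State0) ->
    Effects = ra_server:NEW_OPT_CALLBACK(RaftState,
                                         State0#state.server_state,
                                         Node, Status),
    {State, Actions} = ?HANDLE_EFFECTS(Effects, cast, State0),
    {keep_state, State, Actions}.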