rabbitmq / ra

A Raft implementation for Erlang and Elixir that strives to be efficient and make it easier to use multiple Raft clusters in a single system.

Sunge/handle status callback #361

Closed SimonUnge closed 1 year ago

SimonUnge commented 1 year ago

Very much a DRAFT. @kjnilsson

SimonUnge commented 1 year ago

Example implementation:

handle_status(leader, {ClusterName, _}, Cluster, _State, Node, nodeup) ->
    %% Figure out if we should add or do nothing
    Conf = make_conf(ClusterName, {ClusterName, Node}, Cluster),
    [{add_member, Conf, {ClusterName, Node}, Cluster}];
handle_status(leader, {ClusterName, _}, Cluster, _State, Node, nodedown) ->
    %% Figure out if we should remove or do nothing
    [{remove_member, {ClusterName, Node}, Cluster}];
handle_status(leader, _Leader, _Cluster, _State, _ServerId, {What, _Result})
  when What == add_member_result;
       What == remove_member_result ->
    %% do something with the result...
    [].

kjnilsson commented 1 year ago

Ok so I have a few thoughts.

I think the ra_machine API should be specific to membership evaluation, so something like ra_machine:eval_members or similar. I think it should be called by a timer which by default has a very long interval, say 1hr+. Then we use the nodeup/nodedown handler to shorten the timer interval to some randomised value (say less than 1 minute) to ensure more timely handling of membership changes when nodes join or leave.

We need to ensure that we don't trigger concurrent membership change tasks (which I believe is possible in the current PR), so we need to monitor the process that is spawned and not evaluate members whilst the task is running. Once the task finishes we can reissue the short timer until eval_members returns no changes, then we'll set the long timer again.
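
To make the timer and monitoring rules above concrete, here is a rough sketch of the decision logic as a pure function. It is not Ra code: the module name, the event names (timeout, eval_result, task_done) and the returned action tuples are all made up for illustration, and the caller is assumed to turn the actions into real timers, the eval_members call and a monitored task process.

```erlang
-module(eval_members_timer).
-export([init/0, handle/2]).

%% Both intervals are illustrative values taken from the discussion.
-define(LONG_INTERVAL_MS, 3600000). %% default: 1 hour
-define(SHORT_MAX_MS, 60000).       %% short timers fire within 1 minute

%% State only tracks whether a membership change task is running.
%% Every function returns {NewState, Actions} for the caller to execute.
init() ->
    {#{task => idle}, [{set_timer, ?LONG_INTERVAL_MS}]}.

%% A node joined or left: re-evaluate soon, after a randomised delay.
handle(Event, State) when Event =:= nodeup; Event =:= nodedown ->
    {State, [{set_timer, short_interval()}]};
%% Timer fired while a task is running: do not evaluate members.
%% The task_done clause re-arms the short timer later.
handle(timeout, #{task := running} = State) ->
    {State, []};
handle(timeout, #{task := idle} = State) ->
    {State, [eval_members]};
%% No changes needed: fall back to the long timer.
handle({eval_result, []}, State) ->
    {State, [{set_timer, ?LONG_INTERVAL_MS}]};
%% Changes needed: start exactly one monitored task.
handle({eval_result, Changes}, #{task := idle} = State) when is_list(Changes) ->
    {State#{task := running}, [{spawn_monitored_task, Changes}]};
%% Defensive: never start a second task while one is running.
handle({eval_result, _Changes}, #{task := running} = State) ->
    {State, []};
%% The monitored task finished (for example a 'DOWN' message arrived):
%% reissue the short timer so members are evaluated again.
handle(task_done, State) ->
    {State#{task := idle}, [{set_timer, short_interval()}]}.

%% rand:uniform/1 returns an integer between 1 and its argument.
short_interval() ->
    rand:uniform(?SHORT_MAX_MS).
```

The short timer keeps being reissued after each finished task, and the long timer is only restored once an evaluation comes back with an empty change list.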

We may not want to call ra_machine:eval_members inside the Ra leader process because the code for discovering up nodes in RabbitMQ may be blocking and/or a bit slow. That said, I think for now we can leave it in-process.