Closed sile closed 6 months ago
The OCI build failures are due to the fact that this is an external contribution, so Docker Hub secrets are not injected by Actions.
@illotum - would you mind casting an eye at this?
@sile would you be able to provide reproduction steps that lead to self-election? I'd prefer to try fix it on the election side, right now self-vote is acceptable if it doesn't lead to promotion.
More context:
force_membership_change
at al are not part of the log, so they may arrive before follower reaches its current state (during log replay). Additionally these out-of-band actions can be used by ops. For example we may want to promote witness to leader via manual force_shrink
.
Overall, during the initial implementation, I tried and abandoned approach that relies on followers' knowledge of their state, for this and similar reasons.
@illotum okay, I will share reproduction steps. But, would it be possible for you to review https://github.com/rabbitmq/ra/pull/427 beforehand?
I think that using non_voter
is much easier than promotable
to reproduce the self-voting problem. However, I cannot use non_voter
for reproduction as it seems having another issue subtly relating to this problem.
If I can use non_voter
as expected in #427, the reproduction steps would be as follows:
non_voter
member (named A
) to the clusterra:tansfer_leadership(A, A)
A
becomes the leader BTW, I'm bit confused about the phrase "right now self-vote is acceptable" because the following code in the ra main branch seems to prohibit that: https://github.com/rabbitmq/ra/blob/main/src/ra_server.erl#L1304-L1309
handle_follower(election_timeout,
#{cfg := #cfg{log_id = LogId},
membership := Membership} = State) when Membership =/= voter ->
?DEBUG("~ts: follower ignored election_timeout, non-voter: ~p0",
[LogId, Membership]),
{follower, State, []};
My bad! We don't want non voters to call for election when they are up to date on state, you're correct.
What I meant by acceptable, is that it still happens when the member is not up to date. As I mentioned above, a voter may be demoted later in the log, but won't know it if cluster gets into election mid-sync. We rely on general Raft election mechanism to not accept those candidates.
Regarding your reproduction. In what situation you expect ra:tansfer_leadership(A, A)
to happen without human intent? Membership status was deliberately left malleable to operators and developers, to allow its use in various scenarios.
For example non_voter
is designed for permanent "witnesses" and those may be used as a passive backup. Backups need to be promoted back occasionally.
Or in other words, do you have a repro where system gets into undesirable state on its own?
I will have a look at #427 :)
@illotum I see. Thank you for your explanation.
In what situation you expect
ra:tansfer_leadership(A, A)
to happen without human intent? ... Or in other words, do you have a repro where system gets into undesirable state on its own?
I think that the function will always be called with human intent. So, if it is intended behaviour (by ra developpers) that non_voter
members could be the cluster leader, it's okay to me (the diff for try_become_leader
and force_member_change
should be removed).
If so, however, as I wrote in this PR's description, the self-vote by non-voters should be fixed instead I think. By this self-vote, non-voters always get +1 vote (by themselves) while they are excluded in the quorum. Thus, they can become the leader with fewer votes than what Raft algorithm requires. The following reproduction steps show this scenario (note that we need to use #427).
%% Run two Erlang nodes in separate terminals (foo@localhost is voter, and bar@localhost is non_voter)
$ rebar3 shell --sname foo@localhost
$ rebar3 shell --sname bar@localhost
%% Start ra
foo@localhost> ra:start().
bar@localhost> ra:start().
%% Start a single-member cluster.
foo@localhost> ClusterName = dyn_members.
foo@localhost> Machine = {simple, fun erlang:'+'/2, 0}.
foo@localhost> {ok, _, _} = ra:start_cluster(default, ClusterName, Machine, [{dyn_members, 'foo@localhost'}]).
%% Add a non_voter member
foo@localhost> NonVoterServer = #{id => {dyn_members, 'bar@localhost'}, membership => non_voter}.
foo@localhost> ra:add_member({dyn_members, 'foo@localhost'}, NonVoterServer).
foo@localhost> ra:start_server(default, ClusterName, NonVoterServer, Machine, [{dyn_members,'foo@localhost'}]).
%% Stop foo@localhost node (=> only a non_voter member is running in the cluster)
foo@localhost> q().
%% At this time being, 'bar@localhost' considers 'foo@localhost' is the leader.
bar@localhost> maps:get(leader_id, element(2,ra:member_overview(dyn_members))).
{dyn_members,foo@localhost}
%% Call `gen_statem:cast({dyn_members,bar@localhost},try_become_leader).`
%%
%% [NOTE]
%% This is a bit tricky.
%% But the same scenario could happen in the real world
%% if `ra:transfer_leadership({dyn_members,bar@localhost}, {dyn_members,bar@localhost})` is called
%% when 'foo@localhost' node is alive,
%% and the node aborted just after
%% the leader ra server process called `gen_statem:cast({dyn_members,bar@localhost},try_become_leader)`.
bar@localhost> gen_statem:cast({dyn_members,bar@localhost},try_become_leader).
%% Now, 'bar@localhost' is elected as the leader even if there are no alive voters.
bar@localhost> maps:get(leader_id, element(2,ra:member_overview(dyn_members))).
{dyn_members,bar@localhost}
It was never the intent to make non_voter
leaders. If a replica is not allowed to participate in election, how much sense does it make to allow it to become a leader?
This is definitely not a scenario that this was introduced for in RabbitMQ :)
I can see the argument that a library should prohibit actions leading to bad states. In that case I'd recommend adding a "become_voter" API. My original idea was to complement non_voter
with the ability to both promote and demote voters, to allow runtime control over membership state.
Regarding the repro, this seems to stem from the way we count required quorum. When you run one voter and one non_voter, you effectively run a Raft cluster of one. Raft cannot guarantee anything when you lose that one (voter) member. If there was no witness, you'd already have no data left.
This is a degraded state and by any means is not how Raft should be operated. As soon as you add another voter to your repro, non_voter loses its ability to self elect.
Edit. Just to reiterate. The only reason I can rely on follower's knowledge of cluster state in election_timeout
is that election will not succeed unless follower is up to date on the log. In all other cases, you must also consider situations when a follower is behind and doesn't know its current state.
@michaelklishin, @illotum
Thank you for your comments. However, I am a bit confused about what I should do next to proceed with this PR.
I feel that there is some controversy among reviewers about the best way to address this issue. So it may be better that ra team decides the approach to proceed this PR. (it is okay to close this PR in order to hand it over to you if it is desirable.)
Raft cannot guarantee anything when you lose that one (voter) member. If there was no witness, you'd already have no data left.
I think it is normal that the single voter member will restart without data loss.
As soon as you add another voter to your repro, non_voter loses its ability to self elect.
I might misunderstand something, but even if there is another voter member in the repro (unlike foo@localhost
, this member does not have to be stopped), the non-voter member will still be able to become the leader.
(In this case, bar@localhost
will gain 2 votes (one from self-voting and another from the additional voter), and ra_server:required_quorum()
will return 2.)
@sile leave the change that is commented on as a legitimate improvement, back out the rest. Then I suspect this PR would have a chance of being accepted. Our team's focus is on chaos testing of Ra, so contributions with less-than-obvious value proposition do not receive much attention.
@michaelklishin Thank you for your suggestion (fixed at https://github.com/rabbitmq/ra/pull/435/commits/01afd9b89182b96dcc45aa98d33a830255f20cf5).
Our team's focus is on chaos testing of Ra
Sounds interesting 👍
The RabbitMQ OCI failures are not related to this change: for external contributions, Actions does not inject the necessary secrets.
Merging per discussion with @kjnilsson.
@sile thank you for your ongoing contributions to Ra!
Thank you!
Proposed Changes
In my understanding,
ra
doesn't assume that non-voters become the cluster leader. (If the understanding is not correct, self vote by non-voters should be fixed instead to prevent potential quorum violation.)However, the current implementation sometimes allows non-voters to transition into
pre_vote
state. For instance, a non-voter can become the leader by callingra:transfer_leadership/2
. This PR addresses this problem by adding checks that prevent non-voters from transitioning intopre_vote
state.Types of Changes
What types of changes does your code introduce to this project? Put an
x
in the boxes that applyChecklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask on the mailing list. We're here to help! This is simply a reminder of what we are going to look for before merging your code.CONTRIBUTING.md
documentFurther Comments
If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc.