rabbitmq / ra

A Raft implementation for Erlang and Elixir that strives to be efficient and make it easier to use multiple Raft clusters in a single system.

node stuck in pre_vote state #418

Closed RoadRunnr closed 4 months ago

RoadRunnr commented 4 months ago

Describe the bug

Versions:

I have a ra system where:

The state of node 0 is not changing at all.

When I dump the state of the nodes, I see this:

Node 0:

#{membership => voter,last_applied => 13,commit_index => 13,
  commit_latency => 24,
  cluster =>
      #{{ergw_global,'ergw-c-node@smc-pgw-c-4g-pgw-0.smc-pgw-c-4g-pgw.4g-pgw.svc.cluster.local'} =>
            #{status => normal,next_index => 1,voter_status => #{},
              query_index => 0,match_index => 0,commit_index_sent => 0},
        {ergw_global,'ergw-c-node@smc-pgw-c-4g-pgw-1.smc-pgw-c-4g-pgw.4g-pgw.svc.cluster.local'} =>
            #{status => normal,next_index => 1,query_index => 0,
              match_index => 0,commit_index_sent => 0}},
  current_term => 1,
  voted_for =>
      {ergw_global,'ergw-c-node@smc-pgw-c-4g-pgw-0.smc-pgw-c-4g-pgw.4g-pgw.svc.cluster.local'},
  leader_id => undefined,aux_state => undefined,
  cluster_change_permitted => true,
  cluster_index_term => {9,1},
  pending_consistent_queries => [],
  persisted_last_applied => 13,
  queries_waiting_heartbeats => {[],[]},
  query_index => 1,votes => 1,
  pre_vote_token => #Ref<0.364221713.3290693633.220725>}

Node 1:

#{membership => voter,last_applied => 45,commit_index => 45,
  commit_latency => 25,
  cluster =>
      #{{ergw_global,'ergw-c-node@smc-pgw-c-4g-pgw-1.smc-pgw-c-4g-pgw.4g-pgw.svc.cluster.local'} =>
            #{status => normal,next_index => 1,query_index => 0,
              match_index => 0,commit_index_sent => 0},
        {ergw_global,'ergw-c-node@smc-pgw-c-4g-pgw-2.smc-pgw-c-4g-pgw.4g-pgw.svc.cluster.local'} =>
            #{status => normal,next_index => 46,voter_status => #{},
              query_index => 41097,match_index => 45,
              commit_index_sent => 45}},
  current_term => 1,
  voted_for =>
      {ergw_global,'ergw-c-node@smc-pgw-c-4g-pgw-1.smc-pgw-c-4g-pgw.4g-pgw.svc.cluster.local'},
  leader_id =>
      {ergw_global,'ergw-c-node@smc-pgw-c-4g-pgw-1.smc-pgw-c-4g-pgw.4g-pgw.svc.cluster.local'},
  aux_state => undefined,cluster_change_permitted => true,
  cluster_index_term => {9,1},
  pending_consistent_queries => [],
  persisted_last_applied => 45,
  queries_waiting_heartbeats => {[],[]},
  query_index => 41097,
  pre_vote_token => #Ref<0.3712892406.1947992065.59241>,
  previous_cluster =>
      {0,0,
       #{{ergw_global,'ergw-c-node@smc-pgw-c-4g-pgw-1.smc-pgw-c-4g-pgw.4g-pgw.svc.cluster.local'} =>
             #{status => normal,next_index => 1,query_index => 0,
               match_index => 0,commit_index_sent => 0}}}}

Node 2:

#{membership => voter,last_applied => 45,commit_index => 45,
  commit_latency => 25,
  cluster =>
      #{{ergw_global,'ergw-c-node@smc-pgw-c-4g-pgw-1.smc-pgw-c-4g-pgw.4g-pgw.svc.cluster.local'} =>
            #{status => normal,next_index => 1,query_index => 0,
              match_index => 0,commit_index_sent => 0},
        {ergw_global,'ergw-c-node@smc-pgw-c-4g-pgw-2.smc-pgw-c-4g-pgw.4g-pgw.svc.cluster.local'} =>
            #{status => normal,next_index => 1,voter_status => #{},
              query_index => 0,match_index => 0,commit_index_sent => 0}},
  current_term => 1,voted_for => undefined,
  leader_id =>
      {ergw_global,'ergw-c-node@smc-pgw-c-4g-pgw-1.smc-pgw-c-4g-pgw.4g-pgw.svc.cluster.local'},
  aux_state => undefined,cluster_change_permitted => true,
  cluster_index_term => {9,1},
  pending_consistent_queries => [],
  persisted_last_applied => 45,
  queries_waiting_heartbeats => {[],[]},
  query_index => 40977}
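
The `cluster` maps in the three dumps do not list the same members, and that comparison can be done mechanically. The following is a minimal sketch in plain Erlang, not part of the ra API: the module name `cluster_view`, its function names, and the shortened node names `n0`, `n1`, `n2` are made up for illustration, and only the `cluster` keys from the dumps above are copied in.

```erlang
-module(cluster_view).
-export([members/1, outsiders/1, demo/0]).

%% Sorted server ids that a single state dump lists under `cluster`.
members(#{cluster := Cluster}) ->
    lists:sort(maps:keys(Cluster)).

%% Takes [{ServerId, Dump}] and returns every ServerId that is absent from
%% the `cluster` map of at least one other server's dump.
outsiders(Dumps) ->
    [Id || {Id, _} <- Dumps, not listed_everywhere(Id, Dumps)].

%% True if every dump other than Id's own lists Id as a member.
listed_everywhere(Id, Dumps) ->
    lists:all(fun({Owner, Dump}) ->
                      Owner =:= Id orelse lists:member(Id, members(Dump))
              end, Dumps).

demo() ->
    %% Hypothetical short stand-ins for the three long node names.
    N0 = {ergw_global, n0},
    N1 = {ergw_global, n1},
    N2 = {ergw_global, n2},
    %% Only the keys of each `cluster` map matter here; values are left empty.
    Dumps = [{N0, #{cluster => #{N0 => #{}, N1 => #{}}}},
             {N1, #{cluster => #{N1 => #{}, N2 => #{}}}},
             {N2, #{cluster => #{N1 => #{}, N2 => #{}}}}],
    outsiders(Dumps).
```

With these inputs, `cluster_view:demo()` returns the ids for node 0 and node 2: node 0 is not in the membership reported by nodes 1 and 2, and node 2 is not in the membership reported by node 0. This only restates what the dumps show; whether it explains the stuck pre_vote state is not established here.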

All 3 nodes have a consistent view of the Erlang node communication.

Reproduction steps

I have no idea how to reproduce this.

Expected behavior

The node should not be stuck in pre_vote.

Additional context

No response