hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Consul leadership selection in cluster when consul servers are down #7486

Open yevgenyfst opened 4 years ago

yevgenyfst commented 4 years ago

Overview of the Issue

When 2 servers in a cluster of 3 Consul servers go down and then 1 of them comes back up (so 2 servers are up), they are not able to elect a cluster leader. This worked in 1.4.4, but when we tried to verify the same scenario in 1.7.1, it failed.

Reproduction Steps

A cluster of 3 Consul servers (started with -bootstrap-expect=3).

Let's say we have servers A (leader), B and C.

  1. Stop A --> servers up: B (leader), C
  2. Stop B --> servers up: C
  3. Start A --> servers up: A, C

These 2 servers do not manage to elect a cluster leader again, since B was the last leader and it is down.

If step 3 is instead "Start B" (servers up: B, C), B remains the leader.

Operating system and Environment details

Windows. Consul 1.7.1

Is this behavior expected? Shouldn't the servers be able to elect a cluster leader when 2 of the 3 servers are up?
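Consul servers use Raft, which needs a strict majority of the configured voters to elect a leader. The sketch below is plain arithmetic in Python, not Consul code, and the helper names are hypothetical; it replays the steps above and shows that 2 live servers out of 3 are enough for a majority. A majority being available is necessary but not sufficient: the candidate must also win those votes.

```python
from __future__ import annotations


def quorum(voters: int) -> int:
    """Votes a Raft candidate needs: a strict majority of the voter set."""
    return voters // 2 + 1


def has_quorum(up: set[str], voters: set[str]) -> bool:
    """True if the live members of the voter set can still form a majority."""
    return len(up & voters) >= quorum(len(voters))


# Replay the reproduction steps from the report (servers A, B, C).
voters = {"A", "B", "C"}
up = set(voters)
steps = [("stop", "A"), ("stop", "B"), ("start", "A")]
for action, server in steps:
    if action == "stop":
        up.discard(server)
    else:
        up.add(server)
    print(f"{action} {server}: up={sorted(up)} majority_possible={has_quorum(up, voters)}")
```

With 3 voters, `quorum(3)` is 2, which matches the `votes: needed=2` lines in the logs later in this thread; after "Start A" a majority is possible again.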

mkeeler commented 4 years ago

@yevgenyfst At least from your description, your cluster should be able to elect a leader. I am going to try reproducing this on my end, but any logs you can provide from those servers would help track down the cause.

mkeeler commented 4 years ago

I just went through what I believe to be the exact same scenario.

  1. Start 3 servers with -bootstrap-expect=3. I also added --retry-join arguments for the other two servers. I will call them A, B and C like you did.
  2. Wait for leader election. It just so happens that A was elected leader.
  3. Killed leader - A.
  4. The remaining two elected B as the new leader.
  5. Killed leader - B
  6. Now no leader could be elected
  7. Restarted A
  8. Now A and C elected C as the leader

In every case I confirmed the newly elected leader both in the server logs and by curling the v1/status/leader HTTP API.
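The same leader check can be scripted. A minimal sketch in Python using only the standard library, assuming the agent's HTTP API listens on the default `127.0.0.1:8500` (the address shown in the reporter's logs). As far as I know, `/v1/status/leader` returns the leader's Raft address as a JSON string and an empty string when there is no leader; `parse_leader` and `current_leader` are hypothetical helper names.

```python
from __future__ import annotations

import json
import urllib.request


def parse_leader(body: bytes) -> str | None:
    """Decode a /v1/status/leader response body.

    The body is a JSON string such as "10.98.0.9:8300"; an empty string
    means the cluster currently has no leader, reported here as None.
    """
    leader = json.loads(body)
    return leader or None


def current_leader(base_url: str = "http://127.0.0.1:8500") -> str | None:
    """Query a local agent for the current Raft leader (assumed default address)."""
    with urllib.request.urlopen(f"{base_url}/v1/status/leader", timeout=5) as resp:
        return parse_leader(resp.read())
```

Calling `current_leader()` on each server after every step gives the same information as the curl check, and a `None` result corresponds to the "No cluster leader" errors in the logs.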

At least for me things appear to be operating correctly. Those logs would definitely be helpful in seeing what is going on with your servers.

yevgenyfst commented 4 years ago

Hi, I reproduced the scenario again. We have 3 servers, ConsulTest-1, ConsulTest-2 and ConsulTest-3, plus 2 Consul agents on two separate additional machines. Each of the 5 machines has a JSON configuration that defines:

"retry_join": [
      "provider=aws tag_key=ConsulServerClusterId tag_value=vpc-...."
  ],
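Only the `retry_join` part of the configuration is shown above. On a server, that entry normally sits alongside `server` and `bootstrap_expect`; the sketch below builds such a config in Python. It is an assumption about the rest of the reporter's file, not a copy of it: `build_server_config`, the `data_dir` value and the `vpc-EXAMPLE` tag value are hypothetical placeholders, and the two non-server agents would leave out `server` and `bootstrap_expect`.

```python
import json


def build_server_config(tag_value: str, data_dir: str = "consul-data") -> dict:
    """Hypothetical server agent config; only retry_join mirrors the report."""
    return {
        "server": True,
        "bootstrap_expect": 3,  # matches -bootstrap-expect=3 from the report
        "data_dir": data_dir,   # placeholder path
        "retry_join": [
            # go-discover string: space-separated key=value pairs
            f"provider=aws tag_key=ConsulServerClusterId tag_value={tag_value}"
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_server_config("vpc-EXAMPLE"), indent=2))
```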

Start: ConsulTest-1 (leader), ConsulTest-2, ConsulTest-3

  1. Stop ConsulTest-1.
  2. ConsulTest-2 is elected leader; ConsulTest-2 and ConsulTest-3 are up.
  3. Stop ConsulTest-2.
  4. Only ConsulTest-3 is up.
  5. Start ConsulTest-1. Now ConsulTest-1 and ConsulTest-3 are up, but no leader is elected.

Logs:

Log from ConsulTest-1 (started in step 5). One line looks strange:

2020-03-24T17:42:23.070Z [INFO]  agent.server: New leader elected: payload=CONSUL-TEST-2

But ConsulTest-2 is down!
2020-03-24T17:42:22.994Z [DEBUG] agent.tlsutil: Update: version=1
2020-03-24T17:42:23.002Z [DEBUG] agent.tlsutil: OutgoingRPCWrapper: version=1
2020-03-24T17:42:23.066Z [INFO]  agent.server.raft: initial configuration: index=10809 servers="[{Suffrage:Voter ID:d35b096d-d361-e4d3-5b5e-8c0ab8ba44db Address:10.98.2.16:8300} {Suffrage:Voter ID:69062c45-1382-44a3-290b-0d821f502181 Address:10.98.1.245:8300} {Suffrage:Voter ID:e6301768-c541-2f03-8eaf-2e25e1e5af53 Address:10.98.0.9:8300}]"
2020-03-24T17:42:23.066Z [INFO]  agent.server.raft: entering follower state: follower="Node at 10.98.0.9:8300 [Follower]" leader=
2020-03-24T17:42:23.066Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: CONSUL-TEST-1.dc1 10.98.0.9
2020-03-24T17:42:23.067Z [INFO]  agent.server.serf.wan: serf: Attempting re-join to previously known node: CONSUL-TEST-3.dc1: 10.98.2.16:8302
2020-03-24T17:42:23.067Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: CONSUL-TEST-1 10.98.0.9
2020-03-24T17:42:23.067Z [INFO]  agent.server.serf.lan: serf: Attempting re-join to previously known node: SCM-HATEST-PASSIVE: 10.98.1.156:8301
2020-03-24T17:42:23.067Z [INFO]  agent.server: Adding LAN server: server="CONSUL-TEST-1 (Addr: tcp/10.98.0.9:8300) (DC: dc1)"
2020-03-24T17:42:23.067Z [INFO]  agent.server: Raft data found, disabling bootstrap mode
2020-03-24T17:42:23.067Z [INFO]  agent.server: Handled event for server in area: event=member-join server=CONSUL-TEST-1.dc1 area=wan
2020-03-24T17:42:23.067Z [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=udp
2020-03-24T17:42:23.067Z [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=tcp
2020-03-24T17:42:23.068Z [DEBUG] agent.server.memberlist.wan: memberlist: Initiating push/pull sync with: 10.98.2.16:8302
2020-03-24T17:42:23.068Z [DEBUG] agent.server.memberlist.lan: memberlist: Initiating push/pull sync with: 10.98.1.156:8301
2020-03-24T17:42:23.068Z [INFO]  agent: Started HTTP server: address=127.0.0.1:8500 network=tcp
2020-03-24T17:42:23.068Z [INFO]  agent: started state syncer
2020-03-24T17:42:23.069Z [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
2020-03-24T17:42:23.069Z [INFO]  agent: Joining cluster...: cluster=LAN
2020-03-24T17:42:23.069Z [DEBUG] agent: discover: Using provider "aws": cluster=LAN
2020-03-24T17:42:23.069Z [INFO]  agent: discover-aws: Address type  is not supported. Valid values are {private_v4,public_v4,public_v6}. Falling back to 'private_v4': cluster=LAN
2020-03-24T17:42:23.069Z [INFO]  agent: discover-aws: Region not provided. Looking up region in metadata...: cluster=LAN
2020-03-24T17:42:23.070Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: SCM-HATEST-PASSIVE 10.98.1.156
2020-03-24T17:42:23.070Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: CONSUL-TEST-3 10.98.2.16
2020-03-24T17:42:23.070Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: SCM-HATEST-MAIN 10.98.0.195
2020-03-24T17:42:23.070Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: CONSUL-TEST-3.dc1 10.98.2.16
2020-03-24T17:42:23.070Z [DEBUG] agent.server.serf.wan: serf: Refuting an older leave intent
2020-03-24T17:42:23.070Z [DEBUG] agent.server.serf.lan: serf: Refuting an older leave intent
2020-03-24T17:42:23.070Z [INFO]  agent.server.serf.wan: serf: Re-joined to previously known node: CONSUL-TEST-3.dc1: 10.98.2.16:8302
2020-03-24T17:42:23.070Z [INFO]  agent.server.serf.lan: serf: Re-joined to previously known node: SCM-HATEST-PASSIVE: 10.98.1.156:8301
2020-03-24T17:42:23.070Z [INFO]  agent.server: Adding LAN server: server="CONSUL-TEST-3 (Addr: tcp/10.98.2.16:8300) (DC: dc1)"
2020-03-24T17:42:23.070Z [INFO]  agent.server: New leader elected: payload=CONSUL-TEST-2
2020-03-24T17:42:23.070Z [INFO]  agent.server: Handled event for server in area: event=member-join server=CONSUL-TEST-3.dc1 area=wan
2020-03-24T17:42:23.072Z [INFO]  agent: discover-aws: Region is eu-west-1: cluster=LAN
2020-03-24T17:42:23.072Z [DEBUG] agent: discover-aws: Creating session...: cluster=LAN
2020-03-24T17:42:23.073Z [INFO]  agent: discover-aws: Filter instances with ConsulServerClusterId=vpc-0245170aaae5b3a96: cluster=LAN
2020-03-24T17:42:23.219Z [DEBUG] agent: discover-aws: Found 3 reservations: cluster=LAN
2020-03-24T17:42:23.219Z [DEBUG] agent: discover-aws: Reservation r-02581cbda62248c07 has 1 instances: cluster=LAN
2020-03-24T17:42:23.219Z [DEBUG] agent: discover-aws: Found instance i-054fd3e182e6408c9: cluster=LAN
2020-03-24T17:42:23.219Z [INFO]  agent: discover-aws: Instance i-054fd3e182e6408c9 has private ip 10.98.0.9: cluster=LAN
2020-03-24T17:42:23.219Z [DEBUG] agent: discover-aws: Reservation r-0aec36c7428a3ec90 has 1 instances: cluster=LAN
2020-03-24T17:42:23.219Z [DEBUG] agent: discover-aws: Found instance i-00be681e238bfcaad: cluster=LAN
2020-03-24T17:42:23.219Z [INFO]  agent: discover-aws: Instance i-00be681e238bfcaad has private ip 10.98.1.245: cluster=LAN
2020-03-24T17:42:23.219Z [DEBUG] agent: discover-aws: Reservation r-099508f690e9659ef has 1 instances: cluster=LAN
2020-03-24T17:42:23.219Z [DEBUG] agent: discover-aws: Found instance i-0f00e66a72159af9a: cluster=LAN
2020-03-24T17:42:23.219Z [INFO]  agent: discover-aws: Instance i-0f00e66a72159af9a has private ip 10.98.2.16: cluster=LAN
2020-03-24T17:42:23.219Z [DEBUG] agent: discover-aws: Found ip addresses: [10.98.0.9 10.98.1.245 10.98.2.16]: cluster=LAN
2020-03-24T17:42:23.219Z [INFO]  agent: Discovered servers: cluster=LAN cluster=LAN servers="10.98.0.9 10.98.1.245 10.98.2.16"
2020-03-24T17:42:23.219Z [INFO]  agent: (LAN) joining: lan_addresses=[10.98.0.9, 10.98.1.245, 10.98.2.16]
2020-03-24T17:42:23.220Z [DEBUG] agent.server.memberlist.lan: memberlist: Initiating push/pull sync with: 10.98.0.9:8301
2020-03-24T17:42:23.220Z [DEBUG] agent.server.memberlist.lan: memberlist: Stream connection from=10.98.0.9:61844
2020-03-24T17:42:23.356Z [DEBUG] agent.server.serf.lan: serf: messageJoinType: CONSUL-TEST-1
2020-03-24T17:42:23.384Z [DEBUG] agent.server.serf.lan: serf: messageJoinType: CONSUL-TEST-1
2020-03-24T17:42:23.414Z [DEBUG] agent.server.serf.lan: serf: messageJoinType: CONSUL-TEST-1
2020-03-24T17:42:23.556Z [DEBUG] agent.server.serf.lan: serf: messageJoinType: CONSUL-TEST-1
2020-03-24T17:42:23.621Z [DEBUG] agent.server.serf.lan: serf: messageJoinType: CONSUL-TEST-1
2020-03-24T17:42:23.693Z [DEBUG] agent.server.serf.wan: serf: messageJoinType: CONSUL-TEST-1.dc1
2020-03-24T17:42:24.181Z [DEBUG] agent.server.serf.wan: serf: messageJoinType: CONSUL-TEST-1.dc1
2020-03-24T17:42:24.227Z [DEBUG] agent.server.memberlist.lan: memberlist: Failed to join 10.98.1.245: dial tcp 10.98.1.245:8301: connectex: No connection could be made because the target machine actively refused it.
2020-03-24T17:42:24.227Z [DEBUG] agent.server.memberlist.lan: memberlist: Initiating push/pull sync with: 10.98.2.16:8301
2020-03-24T17:42:24.228Z [INFO]  agent: (LAN) joined: number_of_nodes=2
2020-03-24T17:42:24.228Z [DEBUG] agent: systemd notify failed: error="No socket"
2020-03-24T17:42:24.228Z [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=2
2020-03-24T17:42:24.352Z [DEBUG] agent.server.serf.lan: serf: messageJoinType: CONSUL-TEST-1
2020-03-24T17:42:24.420Z [DEBUG] agent.server.serf.lan: serf: messageJoinType: CONSUL-TEST-1
2020-03-24T17:42:24.422Z [DEBUG] agent.server.serf.lan: serf: messageJoinType: CONSUL-TEST-1
2020-03-24T17:42:24.556Z [DEBUG] agent.server.serf.lan: serf: messageJoinType: CONSUL-TEST-1
2020-03-24T17:42:27.008Z [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader=
2020-03-24T17:42:27.008Z [INFO]  agent.server.raft: entering candidate state: node="Node at 10.98.0.9:8300 [Candidate]" term=300
2020-03-24T17:42:27.011Z [WARN]  agent.server.raft: unable to get address for sever, using fallback address: id=69062c45-1382-44a3-290b-0d821f502181 fallback=10.98.1.245:8300 error="Could not find address for server id 69062c45-1382-44a3-290b-0d821f502181"
2020-03-24T17:42:27.016Z [DEBUG] agent.server.raft: votes: needed=2
2020-03-24T17:42:27.016Z [DEBUG] agent.server.raft: newer term discovered, fallback to follower
2020-03-24T17:42:27.018Z [INFO]  agent.server.raft: entering follower state: follower="Node at 10.98.0.9:8300 [Follower]" leader=
2020-03-24T17:42:28.024Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 69062c45-1382-44a3-290b-0d821f502181 10.98.1.245:8300}" error="dial tcp 10.98.0.9:0->10.98.1.245:8300: connectex: No connection could be made because the target machine actively refused it."
2020-03-24T17:42:29.722Z [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader=
2020-03-24T17:42:29.722Z [INFO]  agent.server.raft: entering candidate state: node="Node at 10.98.0.9:8300 [Candidate]" term=323
2020-03-24T17:42:29.724Z [WARN]  agent.server.raft: unable to get address for sever, using fallback address: id=69062c45-1382-44a3-290b-0d821f502181 fallback=10.98.1.245:8300 error="Could not find address for server id 69062c45-1382-44a3-290b-0d821f502181"
2020-03-24T17:42:29.729Z [DEBUG] agent.server.raft: votes: needed=2
2020-03-24T17:42:29.729Z [DEBUG] agent.server.raft: vote granted: from=e6301768-c541-2f03-8eaf-2e25e1e5af53 term=323 tally=1
2020-03-24T17:42:30.293Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
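Each election in this log follows the same pattern: `votes: needed=2`, then only one `vote granted` with `tally=1`. The `from=` ID (e6301768-...) is ConsulTest-1's own entry (10.98.0.9) in the initial configuration, so the candidate only ever counts its own vote. A small Python helper to pull those numbers out of a log; the regular expressions are written against the lines shown here and are an assumption about the log format, not a stable interface, and `election_outcomes` is a hypothetical name.

```python
import re

# Patterns for the two raft debug lines seen in these logs.
NEEDED = re.compile(r"votes: needed=(\d+)")
GRANTED = re.compile(r"vote granted: from=(\S+) term=(\d+) tally=(\d+)")


def election_outcomes(lines):
    """Return (term, best_tally, needed) for each election term found in the lines."""
    results = {}
    needed = None
    for line in lines:
        m = NEEDED.search(line)
        if m:
            needed = int(m.group(1))  # applies to the election that follows
            continue
        m = GRANTED.search(line)
        if m:
            term, tally = int(m.group(2)), int(m.group(3))
            best = results.get(term, (0, needed))[0]
            results[term] = (max(best, tally), needed)
    return [(term, tally, need) for term, (tally, need) in sorted(results.items())]
```

Run over either server's log, every term comes back with a tally below the number needed, which is what "No cluster leader" reflects.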

Log from ConsulTest-3:

2020-03-24T17:42:04.815Z [WARN]  agent.server.raft: Election timeout reached, restarting election
2020-03-24T17:42:04.815Z [INFO]  agent.server.raft: entering candidate state: node="Node at 10.98.2.16:8300 [Candidate]" term=315
2020-03-24T17:42:04.823Z [DEBUG] agent.server.raft: votes: needed=2
2020-03-24T17:42:04.823Z [DEBUG] agent.server.raft: vote granted: from=d35b096d-d361-e4d3-5b5e-8c0ab8ba44db term=315 tally=1
2020-03-24T17:42:04.823Z [WARN]  agent.server.raft: unable to get address for sever, using fallback address: id=69062c45-1382-44a3-290b-0d821f502181 fallback=10.98.1.245:8300 error="Could not find address for server id 69062c45-1382-44a3-290b-0d821f502181"
2020-03-24T17:42:05.821Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 69062c45-1382-44a3-290b-0d821f502181 10.98.1.245:8300}" error="dial tcp 10.98.2.16:0->10.98.1.245:8300: connectex: No connection could be made because the target machine actively refused it."
2020-03-24T17:42:06.290Z [ERROR] agent: Coordinate update error: error="No cluster leader"
2020-03-24T17:42:08.811Z [WARN]  agent.server.raft: Election timeout reached, restarting election
2020-03-24T17:42:08.811Z [INFO]  agent.server.raft: entering candidate state: node="Node at 10.98.2.16:8300 [Candidate]" term=316
2020-03-24T17:42:08.818Z [DEBUG] agent.server.raft: votes: needed=2
2020-03-24T17:42:08.819Z [DEBUG] agent.server.raft: vote granted: from=d35b096d-d361-e4d3-5b5e-8c0ab8ba44db term=316 tally=1
2020-03-24T17:42:08.819Z [WARN]  agent.server.raft: unable to get address for sever, using fallback address: id=69062c45-1382-44a3-290b-0d821f502181 fallback=10.98.1.245:8300 error="Could not find address for server id 69062c45-1382-44a3-290b-0d821f502181"
2020-03-24T17:42:09.817Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 69062c45-1382-44a3-290b-0d821f502181 10.98.1.245:8300}" error="dial tcp 10.98.2.16:0->10.98.1.245:8300: connectex: No connection could be made because the target machine actively refused it."
2020-03-24T17:42:12.237Z [WARN]  agent.server.raft: Election timeout reached, restarting election
2020-03-24T17:42:12.237Z [INFO]  agent.server.raft: entering candidate state: node="Node at 10.98.2.16:8300 [Candidate]" term=317
2020-03-24T17:42:12.244Z [DEBUG] agent.server.raft: votes: needed=2
2020-03-24T17:42:12.244Z [WARN]  agent.server.raft: unable to get address for sever, using fallback address: id=69062c45-1382-44a3-290b-0d821f502181 fallback=10.98.1.245:8300 error="Could not find address for server id 69062c45-1382-44a3-290b-0d821f502181"
2020-03-24T17:42:12.244Z [DEBUG] agent.server.raft: vote granted: from=d35b096d-d361-e4d3-5b5e-8c0ab8ba44db term=317 tally=1
2020-03-24T17:42:13.274Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 69062c45-1382-44a3-290b-0d821f502181 10.98.1.245:8300}" error="dial tcp 10.98.2.16:0->10.98.1.245:8300: connectex: No connection could be made because the target machine actively refused it."
2020-03-24T17:42:14.321Z [WARN]  agent.server.raft: Election timeout reached, restarting election
2020-03-24T17:42:14.321Z [INFO]  agent.server.raft: entering candidate state: node="Node at 10.98.2.16:8300 [Candidate]" term=318
2020-03-24T17:42:14.328Z [DEBUG] agent.server.raft: votes: needed=2
2020-03-24T17:42:14.328Z [WARN]  agent.server.raft: unable to get address for sever, using fallback address: id=69062c45-1382-44a3-290b-0d821f502181 fallback=10.98.1.245:8300 error="Could not find address for server id 69062c45-1382-44a3-290b-0d821f502181"
2020-03-24T17:42:14.328Z [DEBUG] agent.server.raft: vote granted: from=d35b096d-d361-e4d3-5b5e-8c0ab8ba44db term=318 tally=1
2020-03-24T17:42:15.352Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 69062c45-1382-44a3-290b-0d821f502181 10.98.1.245:8300}" error="dial tcp 10.98.2.16:0->10.98.1.245:8300: connectex: No connection could be made because the target machine actively refused it."
2020-03-24T17:42:17.228Z [WARN]  agent.server.raft: Election timeout reached, restarting election
2020-03-24T17:42:17.228Z [INFO]  agent.server.raft: entering candidate state: node="Node at 10.98.2.16:8300 [Candidate]" term=319
2020-03-24T17:42:17.235Z [DEBUG] agent.server.raft: votes: needed=2
2020-03-24T17:42:17.235Z [WARN]  agent.server.raft: unable to get address for sever, using fallback address: id=69062c45-1382-44a3-290b-0d821f502181 fallback=10.98.1.245:8300 error="Could not find address for server id 69062c45-1382-44a3-290b-0d821f502181"
2020-03-24T17:42:17.235Z [DEBUG] agent.server.raft: vote granted: from=d35b096d-d361-e4d3-5b5e-8c0ab8ba44db term=319 tally=1
2020-03-24T17:42:18.208Z [INFO]  agent.server.serf.wan: serf: attempting reconnect to CONSUL-TEST-2.dc1 10.98.1.245:8302
2020-03-24T17:42:18.210Z [DEBUG] agent.server.serf.lan: serf: forgoing reconnect for random throttling
2020-03-24T17:42:18.243Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 69062c45-1382-44a3-290b-0d821f502181 10.98.1.245:8300}" error="dial tcp 10.98.2.16:0->10.98.1.245:8300: connectex: No connection could be made because the target machine actively refused it."
2020-03-24T17:42:19.208Z [DEBUG] agent.server.memberlist.wan: memberlist: Failed to join 10.98.1.245: dial tcp 10.98.1.245:8302: connectex: No connection could be made because the target machine actively refused it.
2020-03-24T17:42:19.524Z [WARN]  agent.server.raft: Election timeout reached, restarting election
2020-03-24T17:42:19.524Z [INFO]  agent.server.raft: entering candidate state: node="Node at 10.98.2.16:8300 [Candidate]" term=320
2020-03-24T17:42:19.532Z [DEBUG] agent.server.raft: votes: needed=2
2020-03-24T17:42:19.532Z [WARN]  agent.server.raft: unable to get address for sever, using fallback address: id=69062c45-1382-44a3-290b-0d821f502181 fallback=10.98.1.245:8300 error="Could not find address for server id 69062c45-1382-44a3-290b-0d821f502181"
2020-03-24T17:42:19.532Z [DEBUG] agent.server.raft: vote granted: from=d35b096d-d361-e4d3-5b5e-8c0ab8ba44db term=320 tally=1
2020-03-24T17:42:20.539Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 69062c45-1382-44a3-290b-0d821f502181 10.98.1.245:8300}" error="dial tcp 10.98.2.16:0->10.98.1.245:8300: connectex: No connection could be made because the target machine actively refused it."
2020-03-24T17:42:20.676Z [DEBUG] agent.server.memberlist.lan: memberlist: Stream connection from=10.98.0.195:59635
2020-03-24T17:42:23.071Z [DEBUG] agent.server.memberlist.wan: memberlist: Stream connection from=10.98.0.9:61836
2020-03-24T17:42:23.072Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: CONSUL-TEST-1.dc1 10.98.0.9
2020-03-24T17:42:23.072Z [INFO]  agent.server: Handled event for server in area: event=member-join server=CONSUL-TEST-1.dc1 area=wan
2020-03-24T17:42:23.196Z [WARN]  agent.server.raft: Election timeout reached, restarting election
2020-03-24T17:42:23.196Z [INFO]  agent.server.raft: entering candidate state: node="Node at 10.98.2.16:8300 [Candidate]" term=321
2020-03-24T17:42:23.197Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: CONSUL-TEST-1 10.98.0.9
2020-03-24T17:42:23.197Z [INFO]  agent.server: Adding LAN server: server="CONSUL-TEST-1 (Addr: tcp/10.98.0.9:8300) (DC: dc1)"
2020-03-24T17:42:23.203Z [DEBUG] agent.server.raft: votes: needed=2
2020-03-24T17:42:23.203Z [WARN]  agent.server.raft: unable to get address for sever, using fallback address: id=69062c45-1382-44a3-290b-0d821f502181 fallback=10.98.1.245:8300 error="Could not find address for server id 69062c45-1382-44a3-290b-0d821f502181"
2020-03-24T17:42:23.203Z [DEBUG] agent.server.raft: vote granted: from=d35b096d-d361-e4d3-5b5e-8c0ab8ba44db term=321 tally=1
2020-03-24T17:42:23.271Z [DEBUG] agent.server.serf.lan: serf: messageJoinType: CONSUL-TEST-1
2020-03-24T17:42:23.359Z [DEBUG] agent.server.serf.lan: serf: messageJoinType: CONSUL-TEST-1
2020-03-24T17:42:23.423Z [DEBUG] agent.server.serf.lan: serf: messageJoinType: CONSUL-TEST-1
2020-03-24T17:42:23.572Z [DEBUG] agent.server.serf.wan: serf: messageJoinType: CONSUL-TEST-1.dc1
2020-03-24T17:42:24.071Z [DEBUG] agent.server.serf.wan: serf: messageJoinType: CONSUL-TEST-1.dc1
2020-03-24T17:42:24.206Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 69062c45-1382-44a3-290b-0d821f502181 10.98.1.245:8300}" error="dial tcp 10.98.2.16:0->10.98.1.245:8300: connectex: No connection could be made because the target machine actively refused it."
2020-03-24T17:42:24.231Z [DEBUG] agent.server.memberlist.lan: memberlist: Stream connection from=10.98.0.9:61846
2020-03-24T17:42:24.271Z [DEBUG] agent.server.serf.lan: serf: messageJoinType: CONSUL-TEST-1
2020-03-24T17:42:24.423Z [DEBUG] agent.server.serf.lan: serf: messageJoinType: CONSUL-TEST-1
2020-03-24T17:42:24.560Z [DEBUG] agent.server.serf.lan: serf: messageJoinType: CONSUL-TEST-1
2020-03-24T17:42:24.572Z [DEBUG] agent.server.serf.wan: serf: messageJoinType: CONSUL-TEST-1.dc1
2020-03-24T17:42:25.072Z [DEBUG] agent.server.serf.wan: serf: messageJoinType: CONSUL-TEST-1.dc1
2020-03-24T17:42:26.680Z [WARN]  agent.server.raft: Election timeout reached, restarting election
2020-03-24T17:42:26.680Z [INFO]  agent.server.raft: entering candidate state: node="Node at 10.98.2.16:8300 [Candidate]" term=322
2020-03-24T17:42:26.687Z [DEBUG] agent.server.raft: votes: needed=2
2020-03-24T17:42:26.687Z [WARN]  agent.server.raft: unable to get address for sever, using fallback address: id=69062c45-1382-44a3-290b-0d821f502181 fallback=10.98.1.245:8300 error="Could not find address for server id 69062c45-1382-44a3-290b-0d821f502181"
2020-03-24T17:42:26.687Z [DEBUG] agent.server.raft: vote granted: from=d35b096d-d361-e4d3-5b5e-8c0ab8ba44db term=322 tally=1
2020-03-24T17:42:27.016Z [DEBUG] agent.server.raft: accepted connection: local-address=10.98.2.16:8300 remote-address=10.98.0.9:61848
2020-03-24T17:42:27.697Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 69062c45-1382-44a3-290b-0d821f502181 10.98.1.245:8300}" error="dial tcp 10.98.2.16:0->10.98.1.245:8300: connectex: No connection could be made because the target machine actively refused it."
2020-03-24T17:42:29.727Z [DEBUG] agent.server.raft: lost leadership because received a requestVote with a newer term
2020-03-24T17:42:29.730Z [WARN]  agent.server.raft: rejecting vote request since our last term is greater: candidate=10.98.0.9:8300 last-term=301 last-candidate-term=299
2020-03-24T17:42:29.730Z [INFO]  agent.server.raft: entering follower state: follower="Node at 10.98.2.16:8300 [Follower]" leader=
2020-03-24T17:42:31.795Z [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader=
2020-03-24T17:42:31.795Z [INFO]  agent.server.raft: entering candidate state: node="Node at 10.98.2.16:8300 [Candidate]" term=324
2020-03-24T17:42:31.803Z [DEBUG] agent.server.raft: votes: needed=2
2020-03-24T17:42:31.803Z [WARN]  agent.server.raft: unable to get address for sever, using fallback address: id=69062c45-1382-44a3-290b-0d821f502181 fallback=10.98.1.245:8300 error="Could not find address for server id 69062c45-1382-44a3-290b-0d821f502181"
2020-03-24T17:42:31.803Z [DEBUG] agent.server.raft: vote granted: from=d35b096d-d361-e4d3-5b5e-8c0ab8ba44db term=324 tally=1
2020-03-24T17:42:32.774Z [INFO]  agent.server.raft: duplicate requestVote for same term: term=324
2020-03-24T17:42:32.805Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 69062c45-1382-44a3-290b-0d821f502181 10.98.1.245:8300}" error="dial tcp 10.98.2.16:0->10.98.1.245:8300: connectex: No connection could be made because the target machine actively refused it."
2020-03-24T17:42:34.436Z [DEBUG] agent.server.memberlist.lan: memberlist: Initiating push/pull sync with: 10.98.1.156:8301
2020-03-24T17:42:34.998Z [DEBUG] agent.server.raft: lost leadership because received a requestVote with a newer term
2020-03-24T17:42:35.001Z [WARN]  agent.server.raft: rejecting vote request since our last term is greater: candidate=10.98.0.9:8300 last-term=301 last-candidate-term=299
2020-03-24T17:42:35.001Z [INFO]  agent.server.raft: entering follower state: follower="Node at 10.98.2.16:8300 [Follower]" leader=
2020-03-24T17:42:35.993Z [DEBUG] agent.server.memberlist.wan: memberlist: Initiating push/pull sync with: 10.98.0.9:8302
2020-03-24T17:42:36.278Z [DEBUG] agent.server.memberlist.lan: memberlist: Stream connection from=10.98.1.156:51157
2020-03-24T17:42:37.413Z [DEBUG] agent.server.raft: lost leadership because received a requestVote with a newer term
2020-03-24T17:42:37.416Z [WARN]  agent.server.raft: rejecting vote request since our last term is greater: candidate=10.98.0.9:8300 last-term=301 last-candidate-term=299
2020-03-24T17:42:37.435Z [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader=
2020-03-24T17:42:37.435Z [INFO]  agent.server.raft: entering candidate state: node="Node at 10.98.2.16:8300 [Candidate]" term=327
2020-03-24T17:42:37.441Z [DEBUG] agent.server.raft: votes: needed=2
2020-03-24T17:42:37.441Z [WARN]  agent.server.raft: unable to get address for sever, using fallback address: id=69062c45-1382-44a3-290b-0d821f502181 fallback=10.98.1.245:8300 error="Could not find address for server id 69062c45-1382-44a3-290b-0d821f502181"
2020-03-24T17:42:37.441Z [DEBUG] agent.server.raft: vote granted: from=d35b096d-d361-e4d3-5b5e-8c0ab8ba44db term=327 tally=1
2020-03-24T17:42:38.442Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 69062c45-1382-44a3-290b-0d821f502181 10.98.1.245:8300}" error="dial tcp 10.98.2.16:0->10.98.1.245:8300: connectex: No connection could be made because the target machine actively refused it."
2020-03-24T17:42:41.083Z [ERROR] agent: Coordinate update error: error="No cluster leader"
2020-03-24T17:42:41.188Z [WARN]  agent.server.raft: Election timeout reached, restarting election
2020-03-24T17:42:41.188Z [INFO]  agent.server.raft: entering candidate state: node="Node at 10.98.2.16:8300 [Candidate]" term=328
2020-03-24T17:42:41.196Z [DEBUG] agent.server.raft: votes: needed=2
2020-03-24T17:42:41.196Z [DEBUG] agent.server.raft: vote granted: from=d35b096d-d361-e4d3-5b5e-8c0ab8ba44db term=328 tally=1
2020-03-24T17:42:41.196Z [WARN]  agent.server.raft: unable to get address for sever, using fallback address: id=69062c45-1382-44a3-290b-0d821f502181 fallback=10.98.1.245:8300 error="Could not find address for server id 69062c45-1382-44a3-290b-0d821f502181"
2020-03-24T17:42:42.211Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 69062c45-1382-44a3-290b-0d821f502181 10.98.1.245:8300}" error="dial tcp 10.98.2.16:0->10.98.1.245:8300: connectex: No connection could be made because the target machine actively refused it."
2020-03-24T17:42:43.899Z [DEBUG] agent.server.raft: lost leadership because received a requestVote with a newer term
2020-03-24T17:42:43.916Z [WARN]  agent.server.raft: rejecting vote request since our last term is greater: candidate=10.98.0.9:8300 last-term=301 last-candidate-term=299
2020-03-24T17:42:43.916Z [INFO]  agent.server.raft: entering follower state: follower="Node at 10.98.2.16:8300 [Follower]" leader=
2020-03-24T17:42:46.027Z [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader=
2020-03-24T17:42:46.027Z [INFO]  agent.server.raft: entering candidate state: node="Node at 10.98.2.16:8300 [Candidate]" term=330
2020-03-24T17:42:46.033Z [DEBUG] agent.server.raft: votes: needed=2
2020-03-24T17:42:46.033Z [WARN]  agent.server.raft: unable to get address for sever, using fallback address: id=69062c45-1382-44a3-290b-0d821f502181 fallback=10.98.1.245:8300 error="Could not find address for server id 69062c45-1382-44a3-290b-0d821f502181"
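The repeated `rejecting vote request since our last term is greater` lines on ConsulTest-3 match Raft's election restriction: a server only votes for a candidate whose log is at least as up to date as its own, comparing the term of the last log entry first and the index only on a tie. ConsulTest-3's last entry is from term 301 while ConsulTest-1's is from term 299 (presumably because it missed the entries written while ConsulTest-2 was leader), so ConsulTest-3 refuses it. The sketch below is the rule as described in the Raft paper (section 5.4.1), in Python with hypothetical names; it is not hashicorp/raft's code, and on its own it does not explain why ConsulTest-3's elections never collect ConsulTest-1's vote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogTail:
    last_term: int   # term of the last entry in the server's Raft log
    last_index: int  # index of that entry


def candidate_up_to_date(candidate: LogTail, voter: LogTail) -> bool:
    """Raft election restriction: may the voter grant its vote to this candidate?"""
    if candidate.last_term != voter.last_term:
        return candidate.last_term > voter.last_term
    return candidate.last_index >= voter.last_index


# Terms from the "rejecting vote request" lines above. The last log indexes
# are not logged; since the terms differ they do not affect the result, so 0
# is used as a placeholder.
voter = LogTail(last_term=301, last_index=0)      # ConsulTest-3
candidate = LogTail(last_term=299, last_index=0)  # ConsulTest-1 (10.98.0.9)
print(candidate_up_to_date(candidate, voter))     # False
```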