hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Raft election using consul leave #9755

Open ltagliamonte-dd opened 3 years ago

ltagliamonte-dd commented 3 years ago

Hello HashiCorp folks! I'm using Consul v1.7.2, and my setup has 5 Consul servers. If I gracefully stop one of the servers (not the current leader) using consul leave && systemctl stop consul and then start the process back up, a new leadership election starts.

If I just stop the process with systemctl stop consul, a new election isn't triggered.

Is this the correct behavior? It seems wrong to me to trigger an election when a non-leader server is gracefully stopped.
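
A quick way to confirm whether an election really happened is to compare the Raft term before and after the restart, since every election increments the term. A minimal sketch, run from one of the servers that stays up:

# The "term" field in the raft section of `consul info` increments on
# every election; if it is unchanged afterwards, no election happened.
consul info | grep -E 'leader_addr|term'

# on the follower under test: graceful leave, stop, then restart
consul leave && systemctl stop consul
systemctl start consul

consul info | grep -E 'leader_addr|term'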

ltagliamonte-dd commented 3 years ago

Tested today with Consul 1.8.8: same behavior. This is what I'm doing on a node that is not the leader:

core@ip-10-4-73-219 ~ $  docker exec -it consul-server consul info | grep leader_add
    leader_addr = 10.4.73.198:8300
core@ip-10-4-73-219 ~ $ docker exec consul-server /bin/sh -c 'CONSUL_HTTP_TOKEN=<token> consul leave' && sudo systemctl stop consul-server
Graceful leave complete
core@ip-10-4-73-219 ~ $ sudo systemctl start consul-server
core@ip-10-4-73-219 ~ $  docker exec -it consul-server consul info | grep leader_add
    leader_addr = 10.4.73.190:8300

These are the logs from the leader:

Mar 02 00:19:15 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:15.569Z [INFO]  agent.server.serf.wan: serf: EventMemberLeave: ip-10-4-73-219.us-west-2.compute.internal.us-west-2 10.4.73.219
Mar 02 00:19:15 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:15.570Z [INFO]  agent.server: Handled event for server in area: event=member-leave server=ip-10-4-73-219.us-west-2.compute.internal.us-west-2 area=wan
Mar 02 00:19:19 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:19.453Z [INFO]  agent.server.serf.lan: serf: EventMemberLeave: ip-10-4-73-219.us-west-2.compute.internal 10.4.73.219
Mar 02 00:19:19 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:19.454Z [INFO]  agent.server: Removing LAN server: server="ip-10-4-73-219.us-west-2.compute.internal (Addr: tcp/10.4.73.219:8300) (DC: us-west-2)"
Mar 02 00:19:19 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:19.454Z [INFO]  agent.server: removing server by ID: id=aec26945-b142-2890-3e1e-ce00b18322b8
Mar 02 00:19:19 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:19.454Z [INFO]  agent.server.raft: updating configuration: command=RemoveServer server-id=aec26945-b142-2890-3e1e-ce00b18322b8 server-addr= servers="[{Suffrage:Voter ID:720af5a7-eb42-5a28-c8ac-aeda03a848cd Address:10.4.73.222:8300} {Suffrage:Voter ID:076cfb09-d23f-533e-6dc2-26ae3df40d09 Address:10.4.73.185:8300} {Suffrage:Voter ID:22d2bfef-d7ce-9d11-77fa-cb83b3875ac7 Address:10.4.73.190:8300} {Suffrage:Voter ID:9bda6029-1a4d-f06c-2c48-d97dd85a01d4 Address:10.4.73.198:8300}]"
Mar 02 00:19:19 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:19.455Z [INFO]  agent.server.raft: removed peer, stopping replication: peer=aec26945-b142-2890-3e1e-ce00b18322b8 last-index=398166241
Mar 02 00:19:19 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:19.455Z [INFO]  agent.server.raft: aborting pipeline replication: peer="{Nonvoter aec26945-b142-2890-3e1e-ce00b18322b8 10.4.73.219:8300}"
Mar 02 00:19:19 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:19.458Z [INFO]  agent.server: deregistering member: member=ip-10-4-73-219.us-west-2.compute.internal reason=left
Mar 02 00:19:23 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:23.650Z [WARN]  agent.server.raft: rejecting vote request since we have a leader: from=10.4.73.219:8300 leader=10.4.73.198:8300
Mar 02 00:19:27 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:27.249Z [WARN]  agent.server.raft: rejecting vote request since we have a leader: from=10.4.73.219:8300 leader=10.4.73.198:8300
Mar 02 00:19:51 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:51.752Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: ip-10-4-73-219.us-west-2.compute.internal 10.4.73.219
Mar 02 00:19:51 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:51.752Z [INFO]  agent.server: Adding LAN server: server="ip-10-4-73-219.us-west-2.compute.internal (Addr: tcp/10.4.73.219:8300) (DC: us-west-2)"
Mar 02 00:19:51 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:51.752Z [INFO]  agent.server.raft: updating configuration: command=AddNonvoter server-id=aec26945-b142-2890-3e1e-ce00b18322b8 server-addr=10.4.73.219:8300 servers="[{Suffrage:Voter ID:720af5a7-eb42-5a28-c8ac-aeda03a848cd Address:10.4.73.222:8300} {Suffrage:Voter ID:076cfb09-d23f-533e-6dc2-26ae3df40d09 Address:10.4.73.185:8300} {Suffrage:Voter ID:22d2bfef-d7ce-9d11-77fa-cb83b3875ac7 Address:10.4.73.190:8300} {Suffrage:Voter ID:9bda6029-1a4d-f06c-2c48-d97dd85a01d4 Address:10.4.73.198:8300} {Suffrage:Nonvoter ID:aec26945-b142-2890-3e1e-ce00b18322b8 Address:10.4.73.219:8300}]"
Mar 02 00:19:51 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:51.754Z [INFO]  agent.server.raft: added peer, starting replication: peer=aec26945-b142-2890-3e1e-ce00b18322b8
Mar 02 00:19:51 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:51.754Z [ERROR] agent.server.raft: failed to appendEntries to: peer="{Nonvoter aec26945-b142-2890-3e1e-ce00b18322b8 10.4.73.219:8300}" error=EOF
Mar 02 00:19:51 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:51.756Z [INFO]  agent.server: member joined, marking health alive: member=ip-10-4-73-219.us-west-2.compute.internal
Mar 02 00:19:51 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:51.769Z [ERROR] agent.server.raft: peer has newer term, stopping replication: peer="{Nonvoter aec26945-b142-2890-3e1e-ce00b18322b8 10.4.73.219:8300}"
Mar 02 00:19:51 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:51.769Z [INFO]  agent.server.raft: entering follower state: follower="Node at 10.4.73.198:8300 [Follower]" leader=
Mar 02 00:19:51 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:51.769Z [INFO]  agent.server.raft: aborting pipeline replication: peer="{Voter 720af5a7-eb42-5a28-c8ac-aeda03a848cd 10.4.73.222:8300}"
Mar 02 00:19:51 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:51.769Z [INFO]  agent.server.raft: aborting pipeline replication: peer="{Voter 076cfb09-d23f-533e-6dc2-26ae3df40d09 10.4.73.185:8300}"
Mar 02 00:19:51 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:51.769Z [INFO]  agent.server: cluster leadership lost
Mar 02 00:19:51 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:51.769Z [INFO]  agent.server.raft: aborting pipeline replication: peer="{Voter 22d2bfef-d7ce-9d11-77fa-cb83b3875ac7 10.4.73.190:8300}"
Mar 02 00:19:52 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:52.216Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: ip-10-4-73-219.us-west-2.compute.internal.us-west-2 10.4.73.219
Mar 02 00:19:52 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:52.216Z [INFO]  agent.server: Handled event for server in area: event=member-join server=ip-10-4-73-219.us-west-2.compute.internal.us-west-2 area=wan
Mar 02 00:19:52 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:52.890Z [WARN]  agent.server.coordinate: Batch update failed: error="node is not the leader"
Mar 02 00:19:56 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:19:56.723Z [WARN]  agent.server.raft: rejecting vote request since our last index is greater: candidate=10.4.73.219:8300 last-index=398166260 last-candidate-index=398166241
Mar 02 00:20:00 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:20:00.415Z [ERROR] agent.server: error performing anti-entropy sync of federation state: error="node is not the leader"
Mar 02 00:20:00 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:20:00.868Z [INFO]  agent.server: New leader elected: payload=ip-10-4-73-190.us-west-2.compute.internal
Mar 02 00:20:04 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:20:04.868Z [INFO]  agent.server.serf.lan: serf: EventMemberUpdate: ip-10-4-73-219.us-west-2.compute.internal
Mar 02 00:20:04 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:20:04.868Z [INFO]  agent.server: Updating LAN server: server="ip-10-4-73-219.us-west-2.compute.internal (Addr: tcp/10.4.73.219:8300) (DC: us-west-2)"
Mar 02 00:20:09 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:20:09.716Z [INFO]  agent.server.serf.wan: serf: EventMemberUpdate: ip-10-4-73-219.us-west-2.compute.internal.us-west-2
Mar 02 00:20:09 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:20:09.716Z [INFO]  agent.server: Handled event for server in area: event=member-update server=ip-10-4-73-219.us-west-2.compute.internal.us-west-2 area=wan

Specifically:

Mar 02 00:20:00 ip-10-4-73-198.us-west-2.compute.internal docker[3182]:     2021-03-02T00:20:00.868Z [INFO]  agent.server: New leader elected: payload=ip-10-4-73-190.us-west-2.compute.internal

If the cluster has a leader, why does adding a node back trigger a re-election? It shouldn't, for the sake of the stability of the Raft consensus.
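
One way to watch the transition shown in those logs (RemoveServer on leave, AddNonvoter on rejoin) is to poll the Raft peer set while the server comes back. A sketch, run on any live server; `consul operator raft list-peers` needs operator:read permission when ACLs are enabled:

# Poll the Raft configuration once a second: after the rejoin, the
# stopped server should reappear with Voter=false until autopilot
# promotes it back to a voter.
while true; do
  consul operator raft list-peers
  sleep 1
done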

ChipV223 commented 3 years ago

Hi @ltagliamonte-dd !

I just tried to repro this in a 3-server DC (built via Vagrant) using Consul v1.9.5. When I performed consul leave on one of the follower nodes and then brought it back online a minute later, I didn't observe any leadership changes.

If you run this on a later version of Consul, do you still observe this behavior?

ltagliamonte-dd commented 3 years ago

@ChipV223 thank you for looking into this. Locally I can still see elections happening; these are the steps I'm running:

cat <<EOF > consul-config.json
{
  "telemetry": {
    "disable_hostname": true,
    "prometheus_retention_time": "60s"
  }
}
EOF

CONSUL_VERSION=1.9.5
docker run -d --rm --name=consul-0 -p 8500:8500 -p 8600:8600 -v $(pwd)/consul-config.json:/consul/config/consul-config.json -e CONSUL_BIND_INTERFACE=eth0 consul:${CONSUL_VERSION}
JOIN_IP=$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' consul-0)

for i in $(seq 4)
do
  docker run -d --rm --name=consul-${i} -p 850${i}:8500 -p 860${i}:8600 -v $(pwd)/consul-config.json:/consul/config/consul-config.json -e CONSUL_BIND_INTERFACE=eth0 consul:${CONSUL_VERSION}
done

for i in $(seq 4)
do
  docker exec consul-${i} consul join ${JOIN_IP} 
done

# trailing tr strips the leading space (and tty \r) so the grep below matches
leader_ip=$(docker exec -it consul-0 consul info | grep leader_addr | tr -s " " | cut -d "=" -f2 | cut -d":" -f1 | tr -d " \r")
leader_id=$(docker ps -q | xargs -n 1 docker inspect --format '{{ .NetworkSettings.IPAddress }} {{ .Name }}'| grep ${leader_ip} | sed 's/ \// /' | cut -d" " -f2)
echo ${leader_id}
docker logs -f ${leader_id}

## in another terminal
CONSUL_VERSION=1.9.5
JOIN_IP=$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' consul-0)
docker exec -it consul-1 consul leave
docker run -d --rm --name=consul-1 -p 8501:8500 -p 8601:8600 -v $(pwd)/consul-config.json:/consul/config/consul-config.json  -e CONSUL_BIND_INTERFACE=eth0 consul:${CONSUL_VERSION}
docker exec consul-1 consul join ${JOIN_IP} 

docker exec -it consul-2 consul leave
docker run -d --rm --name=consul-2 -p 8502:8500 -p 8602:8600 -v $(pwd)/consul-config.json:/consul/config/consul-config.json  -e CONSUL_BIND_INTERFACE=eth0 consul:${CONSUL_VERSION}
docker exec consul-2 consul join ${JOIN_IP} 

## check how many elections
curl http://127.0.0.1:8500/v1/agent/metrics?format=prometheus -sS | grep -i consul_raft_state_candidate

## cleanup
docker rm -f consul-0 consul-1 consul-2 consul-3 consul-4
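
As a variant of the metrics check above, the loop below polls every agent so a candidate-state transition on any of the five servers is visible. A sketch, assuming the 8500-8504 port mappings from the run commands above:

## check all agents for candidate-state transitions
for port in 8500 8501 8502 8503 8504; do
  echo "agent on port ${port}:"
  curl -sS "http://127.0.0.1:${port}/v1/agent/metrics?format=prometheus" | grep -i consul_raft_state_candidate
done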

ChipV223 commented 3 years ago

Thank you for the information @ltagliamonte-dd !

This seems very environment-specific, particularly with Docker, since I was not able to repro this using Vagrant VMs. I'll try again using Docker containers and report back when I get the chance.

ltagliamonte commented 3 years ago

@ChipV223 any update on this issue?

ChipV223 commented 3 years ago

Hi @ltagliamonte !

I've not been able to look into this as I've been involved with other tasks. The plan is to run my repro again using Docker containers sometime this week, and I'll update when I have any news to share.

mikemorris commented 3 years ago

@ltagliamonte We're investigating a report of a similar issue and wanted to follow up here to confirm a few details:

ltagliamonte-dd commented 3 years ago

Hello @mikemorris

blake commented 2 years ago

Hi @ltagliamonte-dd, this issue was fixed by hashicorp/raft#476, added to Consul in #11375, and released in 1.10.4.

Thanks for reporting this. I'm going to close this issue now. Feel free to re-open it if for some reason this does not resolve the issue.

ltagliamonte-dd commented 2 years ago

@blake I did test with the steps I posted using 1.10.4 and I can still repro the issue.

dhiaayachi commented 2 years ago

Hey @ltagliamonte, I was able to reproduce the issue with version 1.10.4. It seems related to this log line you see: Mar 02 00:19:51 ip-10-4-73-198.us-west-2.compute.internal docker[3182]: 2021-03-02T00:19:51.754Z [ERROR] agent.server.raft: failed to appendEntries to: peer="{Nonvoter aec26945-b142-2890-3e1e-ce00b18322b8 10.4.73.219:8300}" error=EOF. For some reason, the leader fails to append Raft logs to the newly rejoined node, which triggers an election. I will dig into it more and let you know.
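
For anyone replaying the docker repro above, the failure pattern can be isolated by filtering the old leader's logs for the failed append and the state changes that follow. A sketch reusing the ${leader_id} variable captured in the repro script; the grep patterns come from the hashicorp/raft messages quoted in this thread:

# Show the failed replication to the rejoined non-voter and any
# candidate/follower transitions that follow it.
docker logs ${leader_id} 2>&1 | grep -E 'failed to appendEntries|entering (candidate|follower) state'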