hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Consul server in a multi-server datacenter causes unexpected behaviour after its hostname is changed and it gets restarted #6381

Open viniciusartur opened 5 years ago

viniciusartur commented 5 years ago

Overview of the Issue

When a follower gets restarted after its hostname has changed, many error messages appear over time in both the leader's log and the follower's own log, and again whenever a new election takes place.

BDD description:
Given a bootstrapped datacenter of 5 Consul servers
When one of the followers changes its hostname
And gets restarted
Then both the leader and that follower start misbehaving

Reproduction Steps

Steps to reproduce this issue:

  1. Create 5 servers using Docker
    docker run -d --name consul1 --cap-add SYS_ADMIN --network consul_net consul sleep 999999
    ...
    docker run -d --name consul5 --cap-add SYS_ADMIN --network consul_net consul sleep 999999
  2. Bootstrap the DC (do it for each container)
    docker exec -it consul1 sh
    consul agent -server -data-dir=/tmp/test -bootstrap-expect=5 -retry-join=consul1 -retry-join=consul2 -retry-join=consul3 -retry-join=consul4 -retry-join=consul5
    ...
    docker exec -it consul5 sh
    consul agent -server -data-dir=/tmp/test -bootstrap-expect=5 -retry-join=consul1 -retry-join=consul2 -retry-join=consul3 -retry-join=consul4 -retry-join=consul5
  3. Find one follower (run in any container)
    docker exec -it consul1 sh
    consul operator raft list-peers
  4. Change hostname of the follower
    docker exec -it consul4 sh
    hostname consul4
  5. Restart consul in that follower
    docker exec -it consul4 sh
    kill `pidof consul`; consul agent -server -data-dir=/tmp/test -bootstrap-expect=5 -retry-join=consul1 -retry-join=consul2 -retry-join=consul3 -retry-join=consul4 -retry-join=consul5
    #or
    consul leave; consul agent -server -data-dir=/tmp/test -bootstrap-expect=5 -retry-join=consul1 -retry-join=consul2 -retry-join=consul3 -retry-join=consul4 -retry-join=consul5
    #or
    consul operator raft remove-peer -address=172.19.0.3:8300; consul leave; consul leave; consul agent -server -data-dir=/tmp/test -bootstrap-expect=5 -retry-join=consul1 -retry-join=consul2 -retry-join=consul3 -retry-join=consul4 -retry-join=consul5

Raft peers flapping

If you watch the list of peers on each node, you can see that the follower will flap between the old name and the new name:

docker exec -it consul${I} sh
watch -n1 consul operator raft list-peers

It will show one of these two outputs:

Node          ID                                    Address          State     Voter  RaftProtocol
42ea2b3e105c  3b02d4c8-e506-bac6-016e-65b910410d1b  172.19.0.2:8300  leader    true   3
0941995127df  471317bd-616e-f769-29b2-8aad6e3ed683  172.19.0.3:8300  follower  true   3
0b1a425682a9  90ad140e-1105-ad55-c2d7-6c2cd405913e  172.19.0.4:8300  follower  true   3
168162d91743  aa9c1a1f-b981-5a90-2ea6-c5b33a6287b9  172.19.0.6:8300  follower  true   3
51de0e48625d  7bbc6c78-927d-1191-0ff1-c6010d0e78d8  172.19.0.5:8300  follower  true   3

or

Node          ID                                    Address          State     Voter  RaftProtocol
42ea2b3e105c  3b02d4c8-e506-bac6-016e-65b910410d1b  172.19.0.2:8300  leader    true   3
0941995127df  471317bd-616e-f769-29b2-8aad6e3ed683  172.19.0.3:8300  follower  true   3
0b1a425682a9  90ad140e-1105-ad55-c2d7-6c2cd405913e  172.19.0.4:8300  follower  true   3
168162d91743  aa9c1a1f-b981-5a90-2ea6-c5b33a6287b9  172.19.0.6:8300  follower  true   3
consul4       7bbc6c78-927d-1191-0ff1-c6010d0e78d8  172.19.0.5:8300  follower  true   3

Log Fragments

The server starts emitting many error log entries similar to this one:

[WARN] raft: Unable to get address for server id 7bbc6c78-927d-1191-0ff1-c6010d0e78d8, using fallback address 172.19.0.5:8300: Could not find address for server id 7bbc6c78-927d-1191-0ff1-c6010d0e78d8

From time to time, both the leader and the follower whose hostname changed throw error messages like these.

Leader:

2019/08/23 09:52:29 [WARN] raft: Rejecting vote request from 172.19.0.5:8300 since we have a leader: 172.19.0.2:8300
    2019/08/23 09:52:37 [WARN] raft: Rejecting vote request from 172.19.0.5:8300 since we have a leader: 172.19.0.2:8300
    2019/08/23 09:52:39 [INFO] raft: Updating configuration with AddNonvoter (7bbc6c78-927d-1191-0ff1-c6010d0e78d8, 172.19.0.5:8300) to [{Suffrage:Voter ID:3b02d4c8-e506-bac6-016e-65b910410d1b Address:172.19.0.2:8300} {Suffrage:Voter ID:471317bd-616e-f769-29b2-8aad6e3ed683 Address:172.19.0.3:8300} {Suffrage:Voter ID:90ad140e-1105-ad55-c2d7-6c2cd405913e Address:172.19.0.4:8300} {Suffrage:Voter ID:aa9c1a1f-b981-5a90-2ea6-c5b33a6287b9 Address:172.19.0.6:8300} {Suffrage:Nonvoter ID:7bbc6c78-927d-1191-0ff1-c6010d0e78d8 Address:172.19.0.5:8300}]
    2019/08/23 09:52:39 [INFO] raft: Added peer 7bbc6c78-927d-1191-0ff1-c6010d0e78d8, starting replication
    2019/08/23 09:52:39 [WARN] raft: Unable to get address for server id 7bbc6c78-927d-1191-0ff1-c6010d0e78d8, using fallback address 172.19.0.5:8300: Could not find address for server id 7bbc6c78-927d-1191-0ff1-c6010d0e78d8
    2019/08/23 09:52:39 [ERR] raft: peer {Nonvoter 7bbc6c78-927d-1191-0ff1-c6010d0e78d8 172.19.0.5:8300} has newer term, stopping replication
    2019/08/23 09:52:39 [INFO] raft: Node at 172.19.0.2:8300 [Follower] entering Follower state (Leader: "")
    2019/08/23 09:52:39 [INFO] raft: aborting pipeline replication to peer {Voter 471317bd-616e-f769-29b2-8aad6e3ed683 172.19.0.3:8300}
    2019/08/23 09:52:39 [ERR] consul: failed to add raft peer: leadership lost while committing log
    2019/08/23 09:52:39 [ERR] consul: failed to reconcile member: {consul4 172.19.0.5 8301 map[acls:0 build:1.5.1:40cec984 dc:dc1 expect:5 id:7bbc6c78-927d-1191-0ff1-c6010d0e78d8 port:8300 raft_vsn:3 role:consul segment: vsn:2 vsn_max:3 vsn_min:2 wan_join_port:8302] alive 1 5 2 2 5 4}: leadership lost while committing log
    2019/08/23 09:52:39 [INFO] raft: aborting pipeline replication to peer {Voter 90ad140e-1105-ad55-c2d7-6c2cd405913e 172.19.0.4:8300}
    2019/08/23 09:52:39 [INFO] consul: cluster leadership lost
2019/08/23 09:52:39 [INFO] raft: aborting pipeline replication to peer {Voter aa9c1a1f-b981-5a90-2ea6-c5b33a6287b9 172.19.0.6:8300}
    2019/08/23 09:52:46 [WARN] raft: Rejecting vote request from 172.19.0.5:8300 since our last index is greater (289, 281)
    2019/08/23 09:52:47 [ERR] http: Request GET /v1/operator/raft/configuration, error: No cluster leader from=127.0.0.1:41322
    2019/08/23 09:52:49 [WARN] raft: Heartbeat timeout from "" reached, starting election
    2019/08/23 09:52:49 [INFO] raft: Node at 172.19.0.2:8300 [Candidate] entering Candidate state in term 65
    2019/08/23 09:52:49 [INFO] raft: Election won. Tally: 3
    2019/08/23 09:52:49 [INFO] raft: Node at 172.19.0.2:8300 [Leader] entering Leader state
    2019/08/23 09:52:49 [INFO] raft: Added peer 471317bd-616e-f769-29b2-8aad6e3ed683, starting replication
    2019/08/23 09:52:49 [INFO] raft: Added peer 90ad140e-1105-ad55-c2d7-6c2cd405913e, starting replication
    2019/08/23 09:52:49 [INFO] raft: Added peer aa9c1a1f-b981-5a90-2ea6-c5b33a6287b9, starting replication
    2019/08/23 09:52:49 [INFO] consul: cluster leadership acquired
    2019/08/23 09:52:49 [INFO] raft: Added peer 7bbc6c78-927d-1191-0ff1-c6010d0e78d8, starting replication
    2019/08/23 09:52:49 [WARN] raft: Unable to get address for server id 7bbc6c78-927d-1191-0ff1-c6010d0e78d8, using fallback address 172.19.0.5:8300: Could not find address for server id 7bbc6c78-927d-1191-0ff1-c6010d0e78d8
    2019/08/23 09:52:49 [INFO] consul: New leader elected: 42ea2b3e105c
    2019/08/23 09:52:49 [INFO] raft: pipelining replication to peer {Voter aa9c1a1f-b981-5a90-2ea6-c5b33a6287b9 172.19.0.6:8300}
    2019/08/23 09:52:49 [INFO] raft: pipelining replication to peer {Voter 471317bd-616e-f769-29b2-8aad6e3ed683 172.19.0.3:8300}
    2019/08/23 09:52:49 [INFO] raft: pipelining replication to peer {Voter 90ad140e-1105-ad55-c2d7-6c2cd405913e 172.19.0.4:8300}
    2019/08/23 09:52:49 [WARN] raft: AppendEntries to {Nonvoter 7bbc6c78-927d-1191-0ff1-c6010d0e78d8 172.19.0.5:8300} rejected, sending older logs (next: 282)
    2019/08/23 09:52:49 [WARN] raft: Unable to get address for server id 7bbc6c78-927d-1191-0ff1-c6010d0e78d8, using fallback address 172.19.0.5:8300: Could not find address for server id 7bbc6c78-927d-1191-0ff1-c6010d0e78d8

The follower:

2019/08/23 09:38:14 [INFO] raft: Node at 172.19.0.5:8300 [Candidate] entering Candidate state in term 24
    2019/08/23 09:38:17 [ERR] http: Request GET /v1/operator/raft/configuration, error: No cluster leader from=127.0.0.1:33972
    2019/08/23 09:38:20 [WARN] raft: Election timeout reached, restarting election
    2019/08/23 09:38:20 [INFO] raft: Node at 172.19.0.5:8300 [Candidate] entering Candidate state in term 25
    2019/08/23 09:38:20 [WARN] raft: Failed to get previous log: 115 log not found (last: 107)
    2019/08/23 09:38:20 [INFO] raft: Node at 172.19.0.5:8300 [Follower] entering Follower state (Leader: "172.19.0.2:8300")
    2019/08/23 09:38:20 [ERR] raft-net: Failed to flush response: write tcp 172.19.0.5:8300->172.19.0.2:55180: write: broken pipe
    2019/08/23 09:38:21 [INFO] consul: New leader elected: 42ea2b3e105c
    2019/08/23 09:38:27 [INFO] serf: attempting reconnect to 51de0e48625d.dc1 172.19.0.5:8302
    2019/08/23 09:38:29 [WARN] raft: Heartbeat timeout from "172.19.0.2:8300" reached, starting election
...
2019/08/23 09:43:19 [INFO] raft: Node at 172.19.0.5:8300 [Candidate] entering Candidate state in term 42
    2019/08/23 09:43:20 [ERR] http: Request GET /v1/operator/raft/configuration, error: No cluster leader from=127.0.0.1:36772
    2019/08/23 09:43:21 [ERR] agent: failed to sync remote state: No cluster leader
    2019/08/23 09:43:24 [WARN] raft: Election timeout reached, restarting election
    2019/08/23 09:43:24 [INFO] raft: Node at 172.19.0.5:8300 [Candidate] entering Candidate state in term 43
    2019/08/23 09:43:28 [ERR] agent: Coordinate update error: No cluster leader
    2019/08/23 09:43:29 [ERR] http: Request GET /v1/operator/raft/configuration, error: No cluster leader from=127.0.0.1:36846
    2019/08/23 09:43:31 [WARN] raft: Election timeout reached, restarting election
    2019/08/23 09:43:31 [INFO] raft: Node at 172.19.0.5:8300 [Candidate] entering Candidate state in term 44
    2019/08/23 09:43:34 [WARN] raft: Failed to get previous log: 175 log not found (last: 165)
    2019/08/23 09:43:34 [INFO] raft: Node at 172.19.0.5:8300 [Follower] entering Follower state (Leader: "172.19.0.2:8300")
    2019/08/23 09:43:34 [ERR] raft-net: Failed to flush response: write tcp 172.19.0.5:8300->172.19.0.2:58056: write: broken pipe
    2019/08/23 09:43:34 [WARN] raft: Failed to get previous log: 179 log not found (last: 178)
    2019/08/23 09:43:34 [INFO] consul: New leader elected: 42ea2b3e105c
hanshasselberg commented 5 years ago

Thanks for reporting @viniciusartur! Without having checked the code, I think the problem is that Consul defaults to the hostname for the NodeName, which in turn is used by Serf. What you describe might be a bug, I am not sure yet. Maybe you can work around it by explicitly providing a node name with -node.
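For example, a minimal sketch of that workaround, reusing the flags from the reproduction steps (the fixed name consul4 is only an illustration):

    consul agent -server -data-dir=/tmp/test -bootstrap-expect=5 \
      -node=consul4 \
      -retry-join=consul1 -retry-join=consul2 -retry-join=consul3 -retry-join=consul4 -retry-join=consul5

With an explicit -node value, the agent's identity in Serf should no longer depend on whatever the OS hostname happens to be at startup.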

dineshba commented 5 years ago

I tried to debug this bug. Here are my findings, which may be helpful.

As @viniciusartur mentioned, consul operator raft list-peers shows a different output each time. Things I noted: 1) the NodeName is changing, 2) the ID stays the same, 3) Voter is false (even though its State is still follower).

And in the leader logs, the above-mentioned follower keeps getting removed and re-added:

  2019/09/07 18:53:13 [INFO] consul: removing server by ID: "9ba93dec-371e-a43a-fdf0-d3c97685adc9"
    2019/09/07 18:53:13 [INFO]  raft: Updating configuration with RemoveServer (9ba93dec-371e-a43a-fdf0-d3c97685adc9, ) to [{Suffrage:Voter ID:d7604537-5ad7-5cc9-8deb-34851162e7b5 Address:172.18.0.2:8300} {Suffrage:Voter ID:596724a9-7f24-cdf8-3f66-b0c72d6a006d Address:172.18.0.3:8300} {Suffrage:Voter ID:2207987c-22aa-054e-5ec5-6f934e76ed64 Address:172.18.0.4:8300} {Suffrage:Voter ID:de91d962-ba0b-8b12-4b3e-5469f683b0ae Address:172.18.0.6:8300}]
    2019/09/07 18:53:13 [INFO]  raft: Removed peer 9ba93dec-371e-a43a-fdf0-d3c97685adc9, stopping replication after 543
    2019/09/07 18:53:13 [INFO]  raft: aborting pipeline replication to peer {Nonvoter 9ba93dec-371e-a43a-fdf0-d3c97685adc9 172.18.0.5:8300}
    2019/09/07 18:53:13 [INFO]  raft: Updating configuration with AddNonvoter (9ba93dec-371e-a43a-fdf0-d3c97685adc9, 172.18.0.5:8300) to [{Suffrage:Voter ID:d7604537-5ad7-5cc9-8deb-34851162e7b5 Address:172.18.0.2:8300} {Suffrage:Voter ID:596724a9-7f24-cdf8-3f66-b0c72d6a006d Address:172.18.0.3:8300} {Suffrage:Voter ID:2207987c-22aa-054e-5ec5-6f934e76ed64 Address:172.18.0.4:8300} {Suffrage:Voter ID:de91d962-ba0b-8b12-4b3e-5469f683b0ae Address:172.18.0.6:8300} {Suffrage:Nonvoter ID:9ba93dec-371e-a43a-fdf0-d3c97685adc9 Address:172.18.0.5:8300}]
    2019/09/07 18:53:13 [INFO]  raft: Added peer 9ba93dec-371e-a43a-fdf0-d3c97685adc9, starting replication
    2019/09/07 18:53:13 [WARN]  raft: AppendEntries to {Nonvoter 9ba93dec-371e-a43a-fdf0-d3c97685adc9 172.18.0.5:8300} rejected, sending older logs (next: 544)
    2019/09/07 18:53:13 [INFO]  raft: pipelining replication to peer {Nonvoter 9ba93dec-371e-a43a-fdf0-d3c97685adc9 172.18.0.5:8300}
    2019/09/07 18:53:13 [INFO] consul: removing server by address: "172.18.0.5:8300"
    2019/09/07 18:53:13 [ERR] consul: failed to remove raft peer '172.18.0.5:8300': operation not supported with current protocol version
    2019/09/07 18:53:13 [ERR] consul: failed to reconcile member: {4cb4ea2e3fa9 172.18.0.5 8301 map[acls:0 build:1.6.0:944cc710 dc:dc1 expect:5 id:1c7e8c48-178d-902d-6ea4-78ac4d869904 port:8300 raft_vsn:3 role:consul segment: vsn:2 vsn_max:3 vsn_min:2 wan_join_port:8302] left 1 5 2 2 5 4}: operation not supported with current protocol version

If an agent with the same ID joins, should we reject it? Or should we delete the old entry in consul members if it is already in the left state?
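For context, the manual cleanup I have in mind would look roughly like this (a sketch only, not verified against this reproduction; <old-node-name> stands for the member entry that still carries the previous hostname):

    # inspect the serf member list and the raft peer set
    consul members
    consul operator raft list-peers
    # push the stale member entry into the left state so it can be reaped
    consul force-leave <old-node-name>

I am not sure this alone stops the flapping, since the raft server ID is shared between the old and the new name.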

I tried one more thing:

I tried to join the agent with a different node ID using the -node-id flag (a rough sketch of that command is below the log). It joined successfully and became a voter/follower. But the NodeName was still changing in consul operator raft list-peers, and on the leader we keep getting this error:

2019/09/07 19:17:35 [INFO] consul: removing server by address: "172.18.0.5:8300"
2019/09/07 19:17:35 [ERR] consul: failed to remove raft peer '172.18.0.5:8300': operation not supported with current protocol version
2019/09/07 19:17:35 [ERR] consul: failed to reconcile member: {4cb4ea2e3fa9 172.18.0.5 8301 map[acls:0 build:1.6.0:944cc710 dc:dc1 expect:5 id:1c7e8c48-178d-902d-6ea4-78ac4d869904 port:8300 raft_vsn:3 role:consul segment: vsn:2 vsn_max:3 vsn_min:2 wan_join_port:8302] left 1 5 2 2 5 4}: operation not supported with current protocol version
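For reference, the join command was roughly this (the -node-id value is just a freshly generated UUID; adjust paths and join targets as needed):

    # start the agent with an explicitly different node ID
    consul agent -server -data-dir=/tmp/test -bootstrap-expect=5 \
      -node-id=$(uuidgen | tr 'A-Z' 'a-z') \
      -retry-join=consul1 -retry-join=consul2 -retry-join=consul3 -retry-join=consul4 -retry-join=consul5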
mwild1 commented 6 months ago

I'm currently having the same problem: a server temporarily changed its hostname. I'm seeing the same behaviour in both consul and nomad, so I assume the problem is indeed at the serf layer.

I'm not sure how to recover from this situation, as the affected server is failing to rejoin the cluster (it keeps getting removed).

mwild1 commented 6 months ago

I reproduced the issue easily in a test cluster.

After attempting a bunch of things to repair the cluster and allow the node to join, I managed to fix it by stopping consul/nomad and then manually trimming the start of the nomad/server/serf/snapshot file (and, for consul, the serf/*.snapshot files) to remove any mention of the "bad" hostname. I did this one by one on each of the servers in the cluster.
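Something along these lines on each server (a sketch; paths assume data_dir is /opt/consul and /opt/nomad, so adjust to your layout, keep backups, and replace bad-hostname with the offending name):

    # stop the agents before touching their on-disk state
    systemctl stop nomad consul
    # nomad: drop every snapshot line that mentions the bad hostname
    cp /opt/nomad/server/serf/snapshot /opt/nomad/server/serf/snapshot.bak
    grep -v 'bad-hostname' /opt/nomad/server/serf/snapshot.bak > /opt/nomad/server/serf/snapshot
    # consul: same idea for the serf snapshots
    for f in /opt/consul/serf/*.snapshot; do
      cp "$f" "$f.bak"
      grep -v 'bad-hostname' "$f.bak" > "$f"
    done
    systemctl start consul nomad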

This eventually enabled the problematic node to successfully join (for some reason it took a couple of attempts in production).