Awesome! Ideally it would just handle the IP address change, but again, I'd be totally fine with it just falling over and dying for now, and letting whoever started the process handle starting it back up again. Right now it's just broken: it advertises services incorrectly, which is a pretty big ouch.
For people too lazy to follow the link to the Google group: my workaround for now is to have dhclient (as an exit hook) restart consul.
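Roughly like this (a sketch only, assuming a Debian-style dhclient that sources hooks from /etc/dhcp/dhclient-exit-hooks.d/ and a systemd-managed consul service):

```sh
# /etc/dhcp/dhclient-exit-hooks.d/restart-consul
# Sketch: dhclient-script normally exports $reason, $new_ip_address and
# $old_ip_address to its hooks; only restart consul when the address changed.
case "$reason" in
  BOUND|RENEW|REBIND|REBOOT)
    if [ "$new_ip_address" != "$old_ip_address" ]; then
      systemctl restart consul
    fi
    ;;
esac
```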
That would fit my needs. We are running consul agents via Docker (docker-machine) and all machines retrieve their IPs via DHCP. Docker Machine uses the boot2docker image, where it is nearly impossible to use those hooks. I start the container with the preferred IP address (-advertise), but when the machine restarts it may have a new IP address, which would result in incorrect DNS responses. Currently I'm looking for a workaround, but I can't (yet) see a solution that will work without too much effort. It would probably be necessary to tell consul which network interface to use; Consul could then determine the correct IP address.
The dhclient hook is a great workaround for Linux-based (non-Dockerized) environments, but I haven't been able to find an analogous workaround for Windows. Implementing a change within Consul (and Raft) would be incredible.
Does closing #457 in favor of this really move it from the 0.8.2 timeframe to 0.9.x, or are they 2 segments of the same backlog? Is there some sort of roadmap explanation that benefits from a single issue and thus won't have to be duplicated across the above 6 issues?
@sweeneyb I had actually meant to tag this to 0.8.2 (moved it back there), though given our backlog we may not be able to fully finish this off in time for that release. It seemed better to manage this as a single unit vs. a bunch of similar tickets - this will likely end up with a checklist of things to burn down, which'll be easier to keep track of.
Thanks. You guys iterate fast, so a slip of a few minor versions seems reasonable. I was just hoping it would be in the 0.8.x timeframe.
And again, if there is an approach from any of the discussions that's being favored, that would be great to know. There have been a few fixes proposed, but I don't have as much context to figure out where raft & consul are aiming. -- Thanks for the response.
Yeah, now that we've got Raft using UUIDs for quorum management (if you are using Raft protocol 3), I think the remaining work is up at the higher level to make sure the Serf-driven parts can properly handle IP changes for a node of the same name. There might be some work to get the catalog to properly update as well (that also has the UUID knowledge, but still indexes by node name for everything). Honestly, it might take a few more iterations to iron out all the details, but we are moving in the right direction.
Hi, we have been running a script based on the solution stated in the disaster recovery doc, creating the peers.json with the changed IPs before starting the agent. I am wondering if this still works after UUIDs were introduced.
Thanks @hehailong5 I think we missed that one so I opened #3003 so we can get that fixed right away.
Also impacted by this issue.
Restarting the consul agent does not solve the situation but leads to the agent being seen as failed, and the consul servers log:
Jun 08 17:07:57 consul01-par consul[20698]: 2017/06/08 17:07:57 [ERR] memberlist: Conflicting address for e4-1d-2d-1d-07-90.pa4.hpc.criteo.prod. Mine: 10.224.11.18:8301 Theirs: 10.224.11.73:8301
Jun 08 17:07:57 consul01-par consul[20698]: 2017/06/08 17:07:57 [WARN] serf: Name conflict for 'e4-1d-2d-1d-07-90.pa4.hpc.criteo.prod' both 10.224.11.18:8301 and 10.224.11.73:8301 are claiming
Also tried to consul leave on the agent without effect (member is seen as left but we still have the same error messages).
(using consul 0.7.3 though)
I ran a small experiment with an IP address change. Let me demonstrate. A working cluster with 3 servers:
/root/temp/consul/consul members
Node Address Status Type Build Protocol DC
li.home.local x.29.111.207:8301 alive server 0.8.4 3 dc1
rhost.local y.201.41.69:8301 alive server 0.8.4 3 dc1
thost.net z.234.37.183:8301 alive server 0.8.4 3 dc1
/root/temp/consul/consul operator raft list-peers
Node ID Address State Voter RaftProtocol
thost.net dd6fbbf2-bead-fe16-a37b-2208e8bd8234 z.234.37.183:8300 leader true 3
li.home.local 510841b8-7d15-dc78-a2b0-5dfaf72c23b0 x.29.111.207:8300 follower true 3
rhost.local 0296831c-aa74-b47e-ded5-75450aff8943 y.201.41.69:8300 follower true 3
The li.home.local node runs Consul with this command:
/root/temp/consul/consul agent -data-dir=/root/temp/consul/data -server -raft-protocol=3 -protocol=3 -advertise=x.29.111.207
IP address:
ip a | grep inet | grep ppp0
inet x.29.111.207 peer a.b.c.d/32 scope global ppp0
I am changing IP address via:
ifdown ppp0; ifup ppp0
ip a | grep inet | grep ppp0
inet o.29.106.101 peer a.b.c.d/32 scope global ppp0
Cluster members list reports old info:
/root/temp/consul/consul members
Node Address Status Type Build Protocol DC
li.home.local x.29.111.207:8301 failed server 0.8.4 3 dc1
rhost.local y.201.41.69:8301 alive server 0.8.4 3 dc1
thost.net z.234.37.183:8301 alive server 0.8.4 3 dc1
If I restart the li.home.local node with the new IP address advertised:
/root/temp/consul/consul agent -data-dir=/root/temp/consul/data -server -raft-protocol=3 -protocol=3 -advertise=o.29.106.101
I see a good cluster state:
/root/temp/consul/consul members
Node Address Status Type Build Protocol DC
li.home.local o.29.106.101:8301 alive server 0.8.4 3 dc1
rhost.local y.201.41.69:8301 alive server 0.8.4 3 dc1
thost.net z.234.37.183:8301 alive server 0.8.4 3 dc1
As @slackpad pointed out in his comment this will work only as long as the majority of the servers stay alive to maintain quorum.
Would it be possible to refer to consul nodes by DNS as well as an IP? This was raised and refused in #1185, but couldn't it be a relatively painless solution? If all the nodes restarted and came back with a different IP but the same DNS name as they previously advertised, the nodes coming back could still connect to each other without having to update the configuration/catalog (wherever consul stores this information).
Or is there some alternative way where even the majority/the entire cluster could go offline and come back with changed IPs and still be able to recover without manually having to perform outage recovery?
... is there some alternative... without manually having to perform outage recovery?
A Bash script that checks for IP address changes and restarts Consul...
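A rough sketch of that idea (assuming a single IPv4 address on eth0 and a systemd-managed consul service; both are assumptions):

```sh
#!/usr/bin/env bash
# Poll the interface address and restart consul when it changes.
IFACE=eth0
LAST_IP=""
while true; do
  CURRENT_IP=$(ip -4 -o addr show dev "$IFACE" | awk '{print $4}' | cut -d/ -f1 | head -n1)
  if [ -n "$LAST_IP" ] && [ -n "$CURRENT_IP" ] && [ "$CURRENT_IP" != "$LAST_IP" ]; then
    systemctl restart consul
  fi
  LAST_IP=$CURRENT_IP
  sleep 10
done
```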
@AlexeySofree quoting @slackpad from his comment mentioned above (relevant to my scenario):
One interesting bit though is that we still need a quorum to make changes to the configuration, so this would work if a minority of servers was restarted with new IPs, but it still won't work if you have a majority of servers restart with new IPs.
Which is consistent with what I have seen. Your example above covers the working use case (the minority of servers changing IP/restarting). It is also slightly different than what I'm after here: In your example the server changed IP while consul was running and therefore restarting consul on it had an effect. In my scenario the server would crash and return to service with a different IP but the same data which is fairly common in the context of containers. In this case restarting consul would have no effect.
Restarting consul client through the dhclient hook doesn't solve the problem in my case, even sleeping 10 seconds between stopping and starting. There is always a name and/or address conflict. Are there necessary configuration settings on the server and/or client side?
Can anyone share a reference implementation of the hook?
Getting this sorted out would make running consul inside of a kubernetes cluster much easier. I currently have that setup and it's working great (with autopilot to handle IP changes on a minority of peers). However it currently doesn't handle the case if restarts/crashes/whatever cause the cluster to lose quorum because when all the instances come back up they're going to have different IP addresses (even if they're still going to have the same data directory).
Is there anything I can do to help move this issue along? I don't really know Go but I can try ;)
I think there was some talk of moving away from keying on IP towards a generated node-id, but I haven't had time to dive into latest versions to see how that has progressed. But that seems like it would allow for IP address changes, while the 0.9.1 move to https://github.com/hashicorp/go-discover seems the obvious way to discover other servers after a quorum-losing outage. Somebody would need to write one for kubernetes, but it looks like a pod label might facilitate that? I'm not familiar with kubernetes, so that's a bit of speculation.
I would also be very interested in knowing the remaining gaps to identifying nodes based on node-id rather than IP address.
The current behavior of -retry-join is actually super useful for Kubernetes. It's undocumented (that I can find), but if you pass a DNS name to -retry-join it will look up the DNS entry and act as if you ran -retry-join on all of the A records that it returned. If you combine this with a Kubernetes headless service, which creates an internal DNS name with an A record per pod IP address, then it just works to discover all of the pods.
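For example, something like this (a sketch; the headless-service name below is made up):

```sh
# Each server pod joins by resolving the headless service; the A records track
# the current pod IPs, so nothing is hard-coded.
consul agent -server \
  -data-dir=/consul/data \
  -retry-join=consul-server.default.svc.cluster.local
```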
So in my case, the serf gossip layer is back up and running, and consul members sees all the nodes automatically. The only part that currently doesn't work is that because the IP addresses changed and we lost quorum, the cluster gets wedged and is unable to elect a leader.
We have been talking about this internally, and like @dstufft pointed out, with autopilot we already handle having a minority of instances change their IPs, as long as there is quorum. To support all the server nodes going away and coming back with new IPs, we would have to do a pretty significant rewrite of the raft library and how consul uses it, and recovery mechanisms like peers.json have to change as well. We do want to address this at some point, but don't have a concrete timeline for it yet given the scope of the change.
@preetapan Ah interesting, the original message by @slackpad made it sound like the raft library was mostly there, and it was just plumbing through a few changes rather than a big rewrite. It sounds like this is something that is unlikely to be solved in the short term, and I'd likely be better off scripting something that can detect a wedged cluster and automating recovery via peers.json rather than just waiting for this issue to get solved.
@dstufft We didn't anticipate the "have all the servers change IPs at once" case when we were scoping this, so it will take some more work to sort out, though the work we've done so far helps move things in the right direction. The Raft library changes to date have plumbed a UUID for each server, along with its IP, so we have the ability to tell that an IP has changed and fix it up automatically, which works great as long as you have quorum. This has already helped make Consul easier to manage for a lot of use cases. The next phase would need to get rid of IPs altogether at the Raft layer and delegate connecting to a given server (via UUID) up to Consul, which would be able to find the IP based on the information from consul members, essentially. We are getting there, but keeping everything one step backwards compatible and moving everything forward safely will take some effort :-)
Makes sense. I'll just go with the automatic recovery process for a wedged cluster for now! Now just to figure out how to detect if the cluster is wedged D:
@dstufft you can run the operator command consul operator raft list-peers, but put it behind a timeout of 60 seconds or something like that. If it takes longer than that, it's indicative of the cluster being in a wedged state. You might need a more conservative number for the timeout.
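Something along these lines (a sketch using the coreutils timeout command):

```sh
# Treat a list-peers call that doesn't return within 60s as a wedged cluster.
if ! timeout 60 consul operator raft list-peers > /dev/null 2>&1; then
  echo "cluster appears wedged, consider peers.json recovery" >&2
fi
```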
Do you have a recommended way of dealing with this when all IPs have changed? It would be good to put this documentation in the consul helm chart.
@kaskavalci if you've lost quorum you'll need to run https://www.consul.io/docs/guides/outage.html#manual-recovery-using-peers-json in order to get the cluster back into an operational state. This essentially syncs up the server IPs manually.
@slackpad Does that have an impact on the data Consul holds? Basically we should turn off the cluster, go to their PVs and create a peers.json file, am I right?
@kaskavalci that's right - your data is in the Raft log + snapshots so it should be intact after a manual recovery with peers.json.
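For reference, a sketch of what a Raft protocol 3 peers.json can look like (stop all the servers first; the IDs, addresses and data-dir path below are placeholders, use each server's node-id and its new IP):

```sh
cat > /path/to/consul/data/raft/peers.json <<'EOF'
[
  { "id": "11111111-2222-3333-4444-555555555555", "address": "10.0.0.1:8300", "non_voter": false },
  { "id": "66666666-7777-8888-9999-000000000000", "address": "10.0.0.2:8300", "non_voter": false },
  { "id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", "address": "10.0.0.3:8300", "non_voter": false }
]
EOF
```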
In a K8S environment, how can we write static pod IPs when we have to restart consul in order to digest the peers.json file? We weren't able to recover when we deleted 2 of the 3 pods. Can you give the steps of the operation?
Bonus question: a 1-server installation cannot recover from a restart. Any suggestions on how to recover without HA?
Is this fixed in master now? It was closed, which indicates yes, but https://github.com/hashicorp/raft/issues/237 is still open so I'm not sure!
@dstufft Yes, this is in the master branch now. I've tested it using a hand-rolled docker orchestration bash script that forces all server nodes to come back with new addresses. I am looking for volunteers from the community to verify this within their environment. If you would be willing to help verify this, let us know. We expect to have the release candidate for 0.9.3 on September 5.
I can help verify for sure-- Is there an easy way to get a binary for this now? I guess I could just figure out how to build consul myself :)
Yes, it should be straightforward to build this. Make sure you are on go 1.9, git clone consul, and run NOTAG=1 make dist. It will build statically linked binaries for all platforms under pkg/. Also make sure to set "raft_protocol": 3 in your config files when starting Consul.
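In other words, something like this (a sketch; assumes Go 1.9 and a standard GOPATH checkout):

```sh
git clone https://github.com/hashicorp/consul.git "$GOPATH/src/github.com/hashicorp/consul"
cd "$GOPATH/src/github.com/hashicorp/consul"
NOTAG=1 make dist
ls pkg/    # statically linked binaries for all platforms end up here
```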
@preetapan Hello, is the release candidate available? You mentioned Sept 5th, really interested to test this.
@preetapan @geez Yeah, without this I'm having a really hard time setting up the cluster on Swarm. Can we have at least a beta image built? Thanks
I recently Dockerized master. Look for goabout/consul on Docker Hub.
Hi @dstufft @geez @stewshka @jcassee we just cut a release candidate build for 0.9.3 to test this out (this isn't a production-ready build, but it's built like one and signed). Please let us know if you can give this a go - https://releases.hashicorp.com/consul/0.9.3-rc1/.
As a reminder to anyone trying out the RC, the fix works only if you set raft_protocol to 3 in your config files
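A minimal sketch of such a config fragment (the path is just an example; note the unquoted integer, which comes up again further down):

```sh
cat > /etc/consul.d/raft.json <<'EOF'
{
  "raft_protocol": 3
}
EOF
```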
Hey all, I'm trying to test this out but the initial cluster is not electing a leader.
Here is my code: https://github.com/erkolson/consul-v0.9.3-rc1-test
Here is the log from the consul-test-0 pod:
==> Starting Consul agent...
==> Consul agent running!
Version: 'v0.9.3-rc1-rc1 (d62743c)'
Node ID: 'b84a7750-dcdb-c63a-1ae8-2ef036731c81'
Node name: 'consul-test-0'
Datacenter: 'dc1' (Segment: '<all>')
Server: true (Bootstrap: false)
Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600)
Cluster Addr: 10.37.84.6 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
==> Log data will now stream in as it occurs:
2017/09/06 13:30:24 [INFO] raft: Initial configuration (index=0): []
2017/09/06 13:30:24 [INFO] raft: Node at 10.37.84.6:8300 [Follower] entering Follower state (Leader: "")
2017/09/06 13:30:24 [INFO] serf: EventMemberJoin: consul-test-0.dc1 10.37.84.6
2017/09/06 13:30:24 [INFO] serf: EventMemberJoin: consul-test-0 10.37.84.6
2017/09/06 13:30:24 [INFO] consul: Handled member-join event for server "consul-test-0.dc1" in area "wan"
2017/09/06 13:30:24 [INFO] agent: Retry join LAN is supported for: aws azure gce softlayer
2017/09/06 13:30:24 [INFO] agent: Joining LAN cluster...
2017/09/06 13:30:24 [INFO] agent: (LAN) joining: [10.37.84.6 10.36.180.8 10.33.92.6]
2017/09/06 13:30:24 [INFO] consul: Adding LAN server consul-test-0 (Addr: tcp/10.37.84.6:8300) (DC: dc1)
2017/09/06 13:30:24 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
2017/09/06 13:30:24 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
2017/09/06 13:30:24 [INFO] agent: Started HTTP server on [::]:8500
2017/09/06 13:30:24 [INFO] serf: EventMemberJoin: consul-test-2 10.33.92.6
2017/09/06 13:30:24 [INFO] serf: EventMemberJoin: consul-test-1 10.36.180.8
2017/09/06 13:30:24 [INFO] consul: Adding LAN server consul-test-2 (Addr: tcp/10.33.92.6:8300) (DC: dc1)
2017/09/06 13:30:24 [INFO] consul: Adding LAN server consul-test-1 (Addr: tcp/10.36.180.8:8300) (DC: dc1)
2017/09/06 13:30:24 [INFO] serf: EventMemberJoin: consul-test-2.dc1 10.33.92.6
2017/09/06 13:30:24 [INFO] serf: EventMemberJoin: consul-test-1.dc1 10.36.180.8
2017/09/06 13:30:24 [INFO] consul: Handled member-join event for server "consul-test-2.dc1" in area "wan"
2017/09/06 13:30:24 [INFO] consul: Handled member-join event for server "consul-test-1.dc1" in area "wan"
2017/09/06 13:30:24 [INFO] agent: (LAN) joined: 3 Err: <nil>
2017/09/06 13:30:24 [INFO] agent: Join LAN completed. Synced with 3 initial agents
2017/09/06 13:30:30 [WARN] raft: no known peers, aborting election
2017/09/06 13:30:31 [ERR] agent: failed to sync remote state: No cluster leader
Also, @preetapan, quoting the 3 for raft_protocol in server.json causes an error:
[consul-test-0] * 'raft_protocol' expected type 'int', got unconvertible type 'string'
@erkolson that was a typo, edited it to fix now.
Can you try adding bootstrap-expect=3 when you start consul? Here's my orchestration script that uses docker, where I tested terminating all servers and starting them back up with new IPs.
@erkolson I think you also need to set bootstrap_expect in https://github.com/erkolson/consul-v0.9.3-rc1-test/blob/master/manifests/consul-test-config.yaml to the number of servers you are running to get the cluster to initially bootstrap.
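Something like this in the server start command should do it (a sketch; the -retry-join target is whatever your setup resolves pod IPs with, e.g. a headless service):

```sh
consul agent -server \
  -data-dir=/consul/data \
  -raft-protocol=3 \
  -bootstrap-expect=3 \
  -retry-join=consul-test.default.svc.cluster.local
```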
Thanks, I added bootstrap-expect to the exec command and the cluster initializes. It took a while to figure out how to recreate the pods with new IP addresses...
This is the initial cluster:
Node ID Address State Voter RaftProtocol
consul-test-1 7931eb2f-3e44-831e-acff-d8345ad345ae 10.36.180.8:8300 leader true 3
consul-test-0 b84a7750-dcdb-c63a-1ae8-2ef036731c81 10.37.84.6:8300 follower true 3
consul-test-2 29f263d1-e7b5-e905-13b1-931f7968cb3e 10.33.92.6:8300 follower true 3
After getting the pods to start with new IPs, I see this:
Node ID Address State Voter RaftProtocol
(unknown) b84a7750-dcdb-c63a-1ae8-2ef036731c81 10.37.84.6:8300 follower true <=1
consul-test-2 29f263d1-e7b5-e905-13b1-931f7968cb3e 10.33.92.13:8300 follower true 3
consul-test-1 7931eb2f-3e44-831e-acff-d8345ad345ae 10.37.92.8:8300 follower true 3
The data is still there, consul kv get -recurse shows the keys I set prior to restarting, but the previous IP address of the consul-test-0 pod did not get updated. These are the current pod addresses:
NAME READY STATUS RESTARTS AGE IP
consul-test-0 1/1 Running 0 7m 10.36.204.14
consul-test-1 1/1 Running 0 7m 10.37.92.8
consul-test-2 1/1 Running 0 7m 10.33.92.13
@erkolson Is the cluster operational otherwise, and are you able to use it for service registration/KV writes etc.?
The wrong IP address issue you mentioned above might be a temporary sync issue that affects the output of consul operator raft list-peers till the leader does a reconcile step where it fixes what it displays above. I will have to test it out some more to confirm though.
Indeed, consul kv get and put are working.
I'll leave it running for a bit longer to see if the peers list reconciles. So far, ~30 minutes, no change. consul members does show the correct IP though.
Although at the moment I have no logs to show for it, I had the exact same problem when running the master branch.
@erkolson Do you mind trying the same test with 5 instead of 3 servers? I have a fix in the works; the root cause is that autopilot will not do the config fix for the server with the wrong IP because that would cause it to lose quorum.
Please let me know if you still see the problem with 5 servers.
@preetapan, I ran the test again with 5 servers and this time consul-test-0 was updated.
Initial cluster:
Node ID Address State Voter RaftProtocol
consul-test-1 964b92b9-0ac2-56af-9db9-d30771155c66 10.38.124.6:8300 leader true 3
consul-test-4 22dd0f0a-a2e7-48d4-d4bb-33726cae71de 10.37.92.7:8300 follower true 3
consul-test-0 3c1fc748-b5d5-684d-2b73-cc08ce72be6d 10.37.84.4:8300 follower true 3
consul-test-3 0b9704ae-460a-762f-6c83-19c644899cf6 10.33.58.8:8300 follower true 3
consul-test-2 1a1046e6-c627-e40d-8108-320fcd818a3e 10.45.124.12:8300 follower true 3
Intermediate step after pods recreated:
Node ID Address State Voter RaftProtocol
(unknown) 1a1046e6-c627-e40d-8108-320fcd818a3e 10.45.124.12:8300 follower true <=1
consul-test-4 22dd0f0a-a2e7-48d4-d4bb-33726cae71de 10.37.92.8:8300 follower false 3
consul-test-3 0b9704ae-460a-762f-6c83-19c644899cf6 10.36.204.13:8300 follower false 3
consul-test-0 3c1fc748-b5d5-684d-2b73-cc08ce72be6d 10.37.84.5:8300 follower false 3
consul-test-1 964b92b9-0ac2-56af-9db9-d30771155c66 10.38.116.10:8300 follower false 3
And finally, ~40s after startup:
Node ID Address State Voter RaftProtocol
consul-test-4 22dd0f0a-a2e7-48d4-d4bb-33726cae71de 10.37.92.8:8300 follower true 3
consul-test-3 0b9704ae-460a-762f-6c83-19c644899cf6 10.36.204.13:8300 follower true 3
consul-test-0 3c1fc748-b5d5-684d-2b73-cc08ce72be6d 10.37.84.5:8300 leader true 3
consul-test-1 964b92b9-0ac2-56af-9db9-d30771155c66 10.38.116.10:8300 follower true 3
consul-test-2 1a1046e6-c627-e40d-8108-320fcd818a3e 10.36.168.4:8300 follower false 3
Looks good!
@erkolson Thanks for your help in testing this, we really appreciate it!
We definitely appreciate all the help testing this. We cut a build with the fix @preetapan added via #3450 in https://releases.hashicorp.com/consul/0.9.3-rc2/. If you can give that a look please let us know if you see any remaining issues.
I tested again with 3 nodes and rc2. This time it took ~2 min after startup with new IPs for the peers list to reconcile but all seems to be working.
You're welcome for the help; I'm happy to see this functionality. I experienced firsthand all the consul pods being rescheduled simultaneously a couple of months ago :-)
On 0.9.3, this still seems to have the cluster leader problem.
Thought this was captured but couldn't find an existing issue for this. Here's a discussion - https://groups.google.com/d/msgid/consul-tool/623398ba-1dee-4851-85a2-221ff539c355%40googlegroups.com?utm_medium=email&utm_source=footer. For servers we'd also need to address https://github.com/hashicorp/consul/issues/457.
We are going to close other IP-related issues against this one to keep everything together. The Raft side should support this once you get to Raft protocol version 3, but we need to do testing and will likely have to burn down some small issues to complete this.