Awesome! Ideally it would just handle the IP address change, but again, I'd be totally fine with it just falling over and dying for now, and letting whoever started the process handle starting it back up again. Right now it's just broken: it advertises services incorrectly, which is a pretty big ouch.
For people too lazy to follow the link to the Google group: my workaround for now is to have dhclient (as an exit hook) restart consul.
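Roughly like this (a sketch only, assuming a Debian-style dhclient that sources hooks from /etc/dhcp/dhclient-exit-hooks.d/ and a systemd-managed consul service):

```sh
# /etc/dhcp/dhclient-exit-hooks.d/restart-consul
# Sketch: dhclient-script normally exports $reason, $new_ip_address and
# $old_ip_address to its hooks; only restart consul when the address changed.
case "$reason" in
  BOUND|RENEW|REBIND|REBOOT)
    if [ "$new_ip_address" != "$old_ip_address" ]; then
      systemctl restart consul
    fi
    ;;
esac
```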
That would fit my needs. We are running consul agents via Docker (docker-machine) and all machines retrieve their IPs via DHCP. Docker Machine uses the boot2docker image, where it is nearly impossible to use those hooks. I start the container with the preferred IP address (-advertise), but when the machine restarts it may have a new IP address, which would result in incorrect DNS responses. Currently I'm looking for a workaround, but I can't (yet) see a solution that will work without too much effort. It would probably be necessary to tell consul which network interface to use; Consul could then determine the correct IP address.
The dhclient hook is a great workaround for Linux-based (non-Dockerized) environments, but I haven't been able to find an analogous workaround for Windows. Implementing a change within Consul (and Raft) would be incredible.
Does closing #457 in favor of this really move it from the 0.8.2 timeframe to 0.9.x, or are they 2 segments of the same backlog? Is there some sort of roadmap explanation that benefits from a single issue and thus won't have to be duplicated across the above 6 issues?
@sweeneyb I had actually meant to tag this to 0.8.2 (moved it back there), though given our backlog we may not be able to fully finish this off in time for that release. It seemed better to manage this as a single unit vs. a bunch of similar tickets - this will likely end up with a checklist of things to burn down, which'll be easier to keep track of.
Thanks. You guys iterate fast, so a slip of a few minor versions seems reasonable. I was just hoping it would be in the 0.8.x timeframe.
And again, if there is an approach from any of the discussions that's being favored, that would be great to know. There have been a few fixes proposed, but I don't have as much context to figure out where raft & consul are aiming. -- Thanks for the response.
Yeah, now that we've got Raft using UUIDs for quorum management (if you are using Raft protocol 3), I think the remaining work is up at the higher level to make sure the Serf-driven parts can properly handle IP changes for a node of the same name. There might be some work to get the catalog to properly update as well (that also has the UUID knowledge, but still indexes by node name for everything). Honestly, it might take a few more iterations to iron out all the details, but we are moving in the right direction.
Hi, we have been running a script based on the solution stated in the disaster recovery doc, creating the peers.json with the changed IPs before starting the agent. I am wondering if this still works after UUIDs were introduced.
Thanks @hehailong5 I think we missed that one so I opened #3003 so we can get that fixed right away.
Also impacted by this issue.
Restarting the consul agent does not solve the situation but leads to the agent being seen as failed, and the consul servers log:
Jun 08 17:07:57 consul01-par consul[20698]: 2017/06/08 17:07:57 [ERR] memberlist: Conflicting address for e4-1d-2d-1d-07-90.pa4.hpc.criteo.prod. Mine: 10.224.11.18:8301 Theirs: 10.224.11.73:8301
Jun 08 17:07:57 consul01-par consul[20698]: 2017/06/08 17:07:57 [WARN] serf: Name conflict for 'e4-1d-2d-1d-07-90.pa4.hpc.criteo.prod' both 10.224.11.18:8301 and 10.224.11.73:8301 are claiming
Also tried to consul leave on the agent without effect (member is seen as left but we still have the same error messages).
(using consul 0.7.3 though)
I ran a small experiment with an IP address change. Let me demonstrate. A working cluster with 3 servers:
/root/temp/consul/consul members
Node Address Status Type Build Protocol DC
li.home.local x.29.111.207:8301 alive server 0.8.4 3 dc1
rhost.local y.201.41.69:8301 alive server 0.8.4 3 dc1
thost.net z.234.37.183:8301 alive server 0.8.4 3 dc1
/root/temp/consul/consul operator raft list-peers
Node ID Address State Voter RaftProtocol
thost.net dd6fbbf2-bead-fe16-a37b-2208e8bd8234 z.234.37.183:8300 leader true 3
li.home.local 510841b8-7d15-dc78-a2b0-5dfaf72c23b0 x.29.111.207:8300 follower true 3
rhost.local 0296831c-aa74-b47e-ded5-75450aff8943 y.201.41.69:8300 follower true 3
The li.home.local node runs Consul with this command:
/root/temp/consul/consul agent -data-dir=/root/temp/consul/data -server -raft-protocol=3 -protocol=3 -advertise=x.29.111.207
IP address:
ip a | grep inet | grep ppp0
inet x.29.111.207 peer a.b.c.d/32 scope global ppp0
I am changing IP address via:
ifdown ppp0; ifup ppp0
ip a | grep inet | grep ppp0
inet o.29.106.101 peer a.b.c.d/32 scope global ppp0
Cluster members list reports old info:
/root/temp/consul/consul members
Node Address Status Type Build Protocol DC
li.home.local x.29.111.207:8301 failed server 0.8.4 3 dc1
rhost.local y.201.41.69:8301 alive server 0.8.4 3 dc1
thost.net z.234.37.183:8301 alive server 0.8.4 3 dc1
If I restart the li.home.local node with the new IP address advertised:
/root/temp/consul/consul agent -data-dir=/root/temp/consul/data -server -raft-protocol=3 -protocol=3 -advertise=o.29.106.101
I see a good cluster state:
/root/temp/consul/consul members
Node Address Status Type Build Protocol DC
li.home.local o.29.106.101:8301 alive server 0.8.4 3 dc1
rhost.local y.201.41.69:8301 alive server 0.8.4 3 dc1
thost.net z.234.37.183:8301 alive server 0.8.4 3 dc1
As @slackpad pointed out in his comment this will work only as long as the majority of the servers stay alive to maintain quorum.
Would it be possible to refer to consul nodes by DNS as well as an IP? This was raised and refused in #1185, but couldn't it be a relatively painless solution? If all the nodes restarted and came back with a different IP but the same DNS name as they previously advertised, the nodes coming back could still connect to each other without having to update the configuration/catalog (wherever consul stores this information).
Or is there some alternative way where even the majority/the entire cluster could go offline and come back with changed IPs and still be able to recover without manually having to perform outage recovery?
... is there some alternative... without manually having to perform outage recovery?
A Bash script that checks for IP address changes and restarts Consul...
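A rough sketch of that idea (assuming a single IPv4 address on eth0 and a systemd-managed consul service; both are assumptions):

```sh
#!/usr/bin/env bash
# Poll the interface address and restart consul when it changes.
IFACE=eth0
LAST_IP=""
while true; do
  CURRENT_IP=$(ip -4 -o addr show dev "$IFACE" | awk '{print $4}' | cut -d/ -f1 | head -n1)
  if [ -n "$LAST_IP" ] && [ -n "$CURRENT_IP" ] && [ "$CURRENT_IP" != "$LAST_IP" ]; then
    systemctl restart consul
  fi
  LAST_IP=$CURRENT_IP
  sleep 10
done
```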
@AlexeySofree quoting @slackpad from his comment mentioned above (relevant to my scenario):
One interesting bit though is that we still need a quorum to make changes to the configuration, so this would work if a minority of servers was restarted with new IPs, but it still won't work if you have a majority of servers restart with new IPs.
Which is consistent with what I have seen. Your example above covers the working use case (the minority of servers changing IP/restarting). It is also slightly different than what I'm after here: In your example the server changed IP while consul was running and therefore restarting consul on it had an effect. In my scenario the server would crash and return to service with a different IP but the same data which is fairly common in the context of containers. In this case restarting consul would have no effect.
Restarting consul client through the dhclient hook doesn't solve the problem in my case, even sleeping 10 seconds between stopping and starting. There is always a name and/or address conflict. Are there necessary configuration settings on the server and/or client side?
Can anyone share a reference implementation of the hook?
Getting this sorted out would make running consul inside of a kubernetes cluster much easier. I currently have that setup and it's working great (with autopilot to handle IP changes on a minority of peers). However it currently doesn't handle the case if restarts/crashes/whatever cause the cluster to lose quorum because when all the instances come back up they're going to have different IP addresses (even if they're still going to have the same data directory).
Is there anything I can do to help move this issue along? I don't really know Go but I can try ;)
I think there was some talk of moving away from keying on IP towards a generated node-id, but I haven't had time to dive into latest versions to see how that has progressed. But that seems like it would allow for IP address changes, while the 0.9.1 move to https://github.com/hashicorp/go-discover seems the obvious way to discover other servers after a quorum-losing outage. Somebody would need to write one for kubernetes, but it looks like a pod label might facilitate that? I'm not familiar with kubernetes, so that's a bit of speculation.
I would also be very interested in knowing the remaining gaps to identifying nodes based on node-id rather than IP address.
The current behavior of -retry-join is actually super useful for Kubernetes. It's undocumented (that I can find), but if you pass a DNS name to -retry-join it will look up the DNS entry and act as if you ran -retry-join on all of the A records that it returned. If you combine this with a Kubernetes headless service, which creates an internal DNS name with an A record per pod IP address, then it just works to discover all of the pods.
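For example, something like this (a sketch; the headless-service name below is made up):

```sh
# Each server pod joins by resolving the headless service; the A records track
# the current pod IPs, so nothing is hard-coded.
consul agent -server \
  -data-dir=/consul/data \
  -retry-join=consul-server.default.svc.cluster.local
```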
So in my case, the serf gossip layer is back up and running, and consul members sees all the nodes automatically. The only part that currently doesn't work is that because the IP addresses changed and we lost quorum, the cluster gets wedged and is unable to elect a leader.
We have been talking about this internally, and like @dstufft pointed out, with autopilot we already handle having a minority of instances change their IPs, as long as there is quorum. To support all the server nodes going away and coming back with new IPs, we would have to do a pretty significant rewrite of the raft library and how consul uses it, and recovery mechanisms like peers.json have to change as well. We do want to address this at some point, but don't have a concrete timeline for it yet given the scope of the change.
@preetapan Ah interesting, the original message by @slackpad made it sound like the raft library was mostly there, and it was just plumbing through a few changes rather than a big rewrite. It sounds like this is something that is unlikely to be solved in the short term, and I'd likely be better off scripting something that can detect a wedged cluster and automating recovery via peers.json rather than just waiting for this issue to get solved.
@dstufft We didn't anticipate the "have all the servers change IPs at once" case when we were scoping this, so it will take some more work to sort out, though the work we've done so far helps move things in the right direction. The Raft library changes to date have plumbed a UUID for each server, along with its IP, so we have the ability to tell that an IP has changed and fix it up automatically, which works great as long as you have quorum. This has already helped make Consul easier to manage for a lot of use cases. The next phase would need to get rid of IPs altogether at the Raft layer and delegate connecting to a given server (via UUID) up to Consul, which would be able to find the IP based on the information from consul members, essentially. We are getting there, but keeping everything one step backwards compatible and moving everything forward safely will take some effort :-)
Makes sense. I'll just go with the automatic recovery process for a wedged cluster for now! Now just to figure out how to detect if the cluster is wedged D:
@dstufft you can run the operator command consul operator raft list-peers, but put it behind a timeout of 60 seconds or something like that. If it takes longer than that, it's indicative of the cluster being in a wedged state. You might need a more conservative number for the timeout.
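Something along these lines (a sketch using the coreutils timeout command):

```sh
# Treat a list-peers call that doesn't return within 60s as a wedged cluster.
if ! timeout 60 consul operator raft list-peers > /dev/null 2>&1; then
  echo "cluster appears wedged, consider peers.json recovery" >&2
fi
```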
Do you have a recommended way of dealing with this when all IPs have changed? It would be good to put this documentation in the consul helm chart.
@kaskavalci if you've lost quorum you'll need to run https://www.consul.io/docs/guides/outage.html#manual-recovery-using-peers-json in order to get the cluster back into an operational state. This essentially syncs up the server IPs manually.
@slackpad Does that have an impact on the data Consul holds? Basically we should turn off the cluster, go to their PVs and create a peers.json file, am I right?
@kaskavalci that's right - your data is in the Raft log + snapshots so it should be intact after a manual recovery with peers.json.
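For reference, a sketch of what a Raft protocol 3 peers.json can look like (stop all the servers first; the IDs, addresses and data-dir path below are placeholders, use each server's node-id and its new IP):

```sh
cat > /path/to/consul/data/raft/peers.json <<'EOF'
[
  { "id": "11111111-2222-3333-4444-555555555555", "address": "10.0.0.1:8300", "non_voter": false },
  { "id": "66666666-7777-8888-9999-000000000000", "address": "10.0.0.2:8300", "non_voter": false },
  { "id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", "address": "10.0.0.3:8300", "non_voter": false }
]
EOF
```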
In a K8S environment, how can we write static pod IPs when we have to restart consul in order to digest the peers.json file? We weren't able to recover when we deleted 2 of the 3 pods. Can you give the steps of the operation?
Bonus question: a 1-server installation cannot recover from a restart. Any suggestions on how to recover without HA?
Is this fixed in master now? It was closed, which indicates yes, but https://github.com/hashicorp/raft/issues/237 is still open so I'm not sure!
@dstufft Yes, this is in the master branch now. I've tested it using a hand-rolled docker orchestration bash script that forces all server nodes to come back with new addresses. I am looking for volunteers from the community to verify this within their environment. If you would be willing to help verify this, let us know. We expect to have the release candidate for 0.9.3 on September 5.
I can help verify for sure-- Is there an easy way to get a binary for this now? I guess I could just figure out how to build consul myself :)
Yes, it should be straightforward to build this. Make sure you are on go 1.9, git clone consul, and run NOTAG=1 make dist. It will build statically linked binaries for all platforms under pkg/. Also make sure to set "raft_protocol": 3 in your config files when starting Consul.
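In other words, something like this (a sketch; assumes Go 1.9 and a standard GOPATH checkout):

```sh
git clone https://github.com/hashicorp/consul.git "$GOPATH/src/github.com/hashicorp/consul"
cd "$GOPATH/src/github.com/hashicorp/consul"
NOTAG=1 make dist
ls pkg/    # statically linked binaries for all platforms end up here
```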
@preetapan Hello, is the release candidate available? You mentioned Sept 5th, really interested to test this.
@preetapan @geez Yeah, without this I'm having a really hard time setting up the cluster on Swarm. Can we have at least a beta image built? Thanks
I recently Dockerized master. Look for goabout/consul on Docker Hub.
Hi @dstufft @geez @stewshka @jcassee we just cut a release candidate build for 0.9.3 to test this out (this isn't a production-ready build, but it's built like one and signed). Please let us know if you can give this a go - https://releases.hashicorp.com/consul/0.9.3-rc1/.
As a reminder to anyone trying out the RC, the fix works only if you set raft_protocol to 3 in your config files
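A minimal sketch of such a config fragment (the path is just an example; note the unquoted integer, which comes up again further down):

```sh
cat > /etc/consul.d/raft.json <<'EOF'
{
  "raft_protocol": 3
}
EOF
```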
Hey all, I'm trying to test this out but the initial cluster is not electing a leader.
Here is my code: https://github.com/erkolson/consul-v0.9.3-rc1-test
Here is the log from the consul-test-0 pod:
==> Starting Consul agent...
==> Consul agent running!
Version: 'v0.9.3-rc1-rc1 (d62743c)'
Node ID: 'b84a7750-dcdb-c63a-1ae8-2ef036731c81'
Node name: 'consul-test-0'
Datacenter: 'dc1' (Segment: '<all>')
Server: true (Bootstrap: false)
Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600)
Cluster Addr: 10.37.84.6 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
==> Log data will now stream in as it occurs:
2017/09/06 13:30:24 [INFO] raft: Initial configuration (index=0): []
2017/09/06 13:30:24 [INFO] raft: Node at 10.37.84.6:8300 [Follower] entering Follower state (Leader: "")
2017/09/06 13:30:24 [INFO] serf: EventMemberJoin: consul-test-0.dc1 10.37.84.6
2017/09/06 13:30:24 [INFO] serf: EventMemberJoin: consul-test-0 10.37.84.6
2017/09/06 13:30:24 [INFO] consul: Handled member-join event for server "consul-test-0.dc1" in area "wan"
2017/09/06 13:30:24 [INFO] agent: Retry join LAN is supported for: aws azure gce softlayer
2017/09/06 13:30:24 [INFO] agent: Joining LAN cluster...
2017/09/06 13:30:24 [INFO] agent: (LAN) joining: [10.37.84.6 10.36.180.8 10.33.92.6]
2017/09/06 13:30:24 [INFO] consul: Adding LAN server consul-test-0 (Addr: tcp/10.37.84.6:8300) (DC: dc1)
2017/09/06 13:30:24 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
2017/09/06 13:30:24 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
2017/09/06 13:30:24 [INFO] agent: Started HTTP server on [::]:8500
2017/09/06 13:30:24 [INFO] serf: EventMemberJoin: consul-test-2 10.33.92.6
2017/09/06 13:30:24 [INFO] serf: EventMemberJoin: consul-test-1 10.36.180.8
2017/09/06 13:30:24 [INFO] consul: Adding LAN server consul-test-2 (Addr: tcp/10.33.92.6:8300) (DC: dc1)
2017/09/06 13:30:24 [INFO] consul: Adding LAN server consul-test-1 (Addr: tcp/10.36.180.8:8300) (DC: dc1)
2017/09/06 13:30:24 [INFO] serf: EventMemberJoin: consul-test-2.dc1 10.33.92.6
2017/09/06 13:30:24 [INFO] serf: EventMemberJoin: consul-test-1.dc1 10.36.180.8
2017/09/06 13:30:24 [INFO] consul: Handled member-join event for server "consul-test-2.dc1" in area "wan"
2017/09/06 13:30:24 [INFO] consul: Handled member-join event for server "consul-test-1.dc1" in area "wan"
2017/09/06 13:30:24 [INFO] agent: (LAN) joined: 3 Err: <nil>
2017/09/06 13:30:24 [INFO] agent: Join LAN completed. Synced with 3 initial agents
2017/09/06 13:30:30 [WARN] raft: no known peers, aborting election
2017/09/06 13:30:31 [ERR] agent: failed to sync remote state: No cluster leader
Also, @preetapan, quoting the 3 for raft_protocol in server.json causes an error:
[consul-test-0] * 'raft_protocol' expected type 'int', got unconvertible type 'string'
@erkolson that was a typo, edited it to fix now.
Can you try adding bootstrap-expect=3 when you start consul? Here's my orchestration script that uses docker, where I tested terminating all servers and starting them back up with new IPs.
@erkolson I think you also need to set bootstrap_expect in https://github.com/erkolson/consul-v0.9.3-rc1-test/blob/master/manifests/consul-test-config.yaml to the number of servers you are running to get the cluster to initially bootstrap.
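Something like this in the server start command should do it (a sketch; the -retry-join target is whatever your setup resolves pod IPs with, e.g. a headless service):

```sh
consul agent -server \
  -data-dir=/consul/data \
  -raft-protocol=3 \
  -bootstrap-expect=3 \
  -retry-join=consul-test.default.svc.cluster.local
```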
Thanks, I added bootstrap-expect to the exec command and the cluster initializes. It took a while to figure out how to recreate the pods with new IP addresses...
This is the initial cluster:
Node ID Address State Voter RaftProtocol
consul-test-1 7931eb2f-3e44-831e-acff-d8345ad345ae 10.36.180.8:8300 leader true 3
consul-test-0 b84a7750-dcdb-c63a-1ae8-2ef036731c81 10.37.84.6:8300 follower true 3
consul-test-2 29f263d1-e7b5-e905-13b1-931f7968cb3e 10.33.92.6:8300 follower true 3
After getting the pods to start with new IPs, I see this:
Node ID Address State Voter RaftProtocol
(unknown) b84a7750-dcdb-c63a-1ae8-2ef036731c81 10.37.84.6:8300 follower true <=1
consul-test-2 29f263d1-e7b5-e905-13b1-931f7968cb3e 10.33.92.13:8300 follower true 3
consul-test-1 7931eb2f-3e44-831e-acff-d8345ad345ae 10.37.92.8:8300 follower true 3
The data is still there, consul kv get -recurse shows the keys I set prior to restarting, but the previous IP address of the consul-test-0 pod did not get updated. These are the current pod addresses:
NAME READY STATUS RESTARTS AGE IP
consul-test-0 1/1 Running 0 7m 10.36.204.14
consul-test-1 1/1 Running 0 7m 10.37.92.8
consul-test-2 1/1 Running 0 7m 10.33.92.13
@erkolson Is the cluster operational otherwise, and are you able to use it for service registration/KV writes etc.?
The wrong IP address issue you mentioned above might be a temporary sync issue that affects the output of consul operator raft list-peers till the leader does a reconcile step where it fixes what it displays above. I will have to test it out some more to confirm though.
Indeed, consul kv get and put are working.
I'll leave it running for a bit longer to see if the peers list reconciles. So far, ~30 minutes, no change. consul members does show the correct IP though.
Although at the moment I have no logs to show for it, I had the exact same problem when running the master branch.
@erkolson Do you mind trying the same test with 5 instead of 3 servers? I have a fix in the works; the root cause is that autopilot will not do the config fix for the server with the wrong IP because that would cause it to lose quorum.
Please let me know if you still see the problem with 5 servers.
@preetapan, I ran the test again with 5 servers and this time consul-test-0 was updated.
Initial cluster:
Node ID Address State Voter RaftProtocol
consul-test-1 964b92b9-0ac2-56af-9db9-d30771155c66 10.38.124.6:8300 leader true 3
consul-test-4 22dd0f0a-a2e7-48d4-d4bb-33726cae71de 10.37.92.7:8300 follower true 3
consul-test-0 3c1fc748-b5d5-684d-2b73-cc08ce72be6d 10.37.84.4:8300 follower true 3
consul-test-3 0b9704ae-460a-762f-6c83-19c644899cf6 10.33.58.8:8300 follower true 3
consul-test-2 1a1046e6-c627-e40d-8108-320fcd818a3e 10.45.124.12:8300 follower true 3
Intermediate step after pods recreated:
Node ID Address State Voter RaftProtocol
(unknown) 1a1046e6-c627-e40d-8108-320fcd818a3e 10.45.124.12:8300 follower true <=1
consul-test-4 22dd0f0a-a2e7-48d4-d4bb-33726cae71de 10.37.92.8:8300 follower false 3
consul-test-3 0b9704ae-460a-762f-6c83-19c644899cf6 10.36.204.13:8300 follower false 3
consul-test-0 3c1fc748-b5d5-684d-2b73-cc08ce72be6d 10.37.84.5:8300 follower false 3
consul-test-1 964b92b9-0ac2-56af-9db9-d30771155c66 10.38.116.10:8300 follower false 3
And finally, ~40s after startup:
Node ID Address State Voter RaftProtocol
consul-test-4 22dd0f0a-a2e7-48d4-d4bb-33726cae71de 10.37.92.8:8300 follower true 3
consul-test-3 0b9704ae-460a-762f-6c83-19c644899cf6 10.36.204.13:8300 follower true 3
consul-test-0 3c1fc748-b5d5-684d-2b73-cc08ce72be6d 10.37.84.5:8300 leader true 3
consul-test-1 964b92b9-0ac2-56af-9db9-d30771155c66 10.38.116.10:8300 follower true 3
consul-test-2 1a1046e6-c627-e40d-8108-320fcd818a3e 10.36.168.4:8300 follower false 3
Looks good!
@erkolson Thanks for your help in testing this, we really appreciate it!
We definitely appreciate all the help testing this. We cut a build with the fix @preetapan added via #3450 in https://releases.hashicorp.com/consul/0.9.3-rc2/. If you can give that a look please let us know if you see any remaining issues.
I tested again with 3 nodes and rc2. This time it took ~2 min after startup with new IPs for the peers list to reconcile but all seems to be working.
You're welcome for the help; I'm happy to see this functionality. I experienced firsthand all the consul pods being rescheduled simultaneously a couple of months ago :-)
On 0.9.3, this still seems to have the cluster leader problem.
Thought this was captured but couldn't find an existing issue for this. Here's a discussion - https://groups.google.com/d/msgid/consul-tool/623398ba-1dee-4851-85a2-221ff539c355%40googlegroups.com?utm_medium=email&utm_source=footer. For servers we'd also need to address https://github.com/hashicorp/consul/issues/457.
We are going to close other IP-related issues against this one to keep everything together. The Raft side should support this once you get to Raft protocol version 3, but we need to do testing and will likely have to burn down some small issues to complete this.