hashicorp / terraform-aws-consul

A Terraform Module for how to run Consul on AWS using Terraform and Packer
Apache License 2.0
401 stars, 488 forks

Set autorestart to 'unexpected' so that attempts to gracefully shutdown do not trigger restarts #107

Closed coryflucas closed 5 years ago

coryflucas commented 5 years ago

Fixes #108. This allows removing servers gracefully via the /agent/leave API endpoint for upgrades as suggested in the documentation here: https://github.com/hashicorp/terraform-aws-consul/tree/master/modules/consul-cluster#how-do-you-roll-out-updates

Currently, invoking the leave endpoint causes Consul to stop, after which supervisor immediately restarts it and it rejoins the cluster. This makes a graceful shutdown impossible without shell access to the host to stop supervisor directly.
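For illustration, here is a minimal sketch of the kind of supervisor program section this change is about, written out from a shell function. The function name, file path argument, and Consul command line are hypothetical and not taken from this repository. The sketch assumes Consul exits with status 0 after a graceful leave, so supervisor treats that exit as expected and does not restart it.

```shell
#!/bin/sh
# Hypothetical helper: writes a supervisor program section to the file given as $1.
# The command path and flags below are placeholders, not the module's real values.
write_consul_supervisor_conf() {
  cat > "$1" <<'EOF'
[program:consul]
command=/opt/consul/bin/consul agent -config-dir /opt/consul/config
; Restart only when the exit code is NOT listed in exitcodes.
autorestart=unexpected
; Set explicitly, since the default has differed between supervisor versions.
exitcodes=0
EOF
}
```

With `autorestart=true`, supervisor restarts the process after any exit, which matches the rejoin behaviour described above. With `unexpected`, a crash (non-zero exit) should still trigger a restart, while a clean exit should not.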

genert commented 5 years ago

Good catch! Sounds reasonable.

coryflucas commented 5 years ago

I tested this by spinning up a cluster with the new code. After hitting the /agent/leave endpoint, the other agents no longer list the target host as a member of the cluster, and hitting the API again on the target host results in a connection refused:

[ec2-user@ip-10-1-1-250 ~]$ TARGET=10.1.12.161
[ec2-user@ip-10-1-1-250 ~]$ OTHER=10.1.10.62
[ec2-user@ip-10-1-1-250 ~]$ curl -w "\n" http://$OTHER:8500/v1/status/peers
["10.1.12.161:8300","10.1.11.254:8300","10.1.10.62:8300"]
[ec2-user@ip-10-1-1-250 ~]$ curl -w "\n" http://$TARGET:8500/v1/status/peers
["10.1.12.161:8300","10.1.11.254:8300","10.1.10.62:8300"]
[ec2-user@ip-10-1-1-250 ~]$ curl -X PUT http://$TARGET:8500/v1/agent/leave
[ec2-user@ip-10-1-1-250 ~]$ curl -w "\n" http://$OTHER:8500/v1/status/peers
["10.1.11.254:8300","10.1.10.62:8300"]
[ec2-user@ip-10-1-1-250 ~]$ curl http://$TARGET:8500/v1/status/peers
curl: (7) Failed to connect to 10.1.12.161 port 8500: Connection refused

Prior to this change, the last two commands would show the target host had re-joined the cluster and restarted the API.
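The manual check above could be scripted roughly as follows. This is a sketch only: the function names and the five-second settle time are my own assumptions, and it relies on the same two endpoints and ports (8500 for the HTTP API, 8300 in the peers list) that appear in the session above.

```shell
#!/bin/sh
# Returns 0 if IP $2 appears as "<ip>:8300" in the JSON array $1
# (the output of /v1/status/peers), 1 otherwise.
peer_listed() {
  case "$1" in
    *"\"$2:8300\""*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: verify_leave TARGET_IP OTHER_IP
# Asks TARGET to leave, then confirms via OTHER that it is gone and that
# TARGET's own API no longer answers (i.e. the agent was not restarted).
verify_leave() {
  vl_target="$1"
  vl_other="$2"
  curl -s -X PUT "http://${vl_target}:8500/v1/agent/leave" || return 1
  sleep 5  # arbitrary settle time (assumption); tune as needed
  vl_peers="$(curl -sf "http://${vl_other}:8500/v1/status/peers")" || return 1
  if peer_listed "$vl_peers" "$vl_target"; then
    echo "target is still listed as a peer" >&2
    return 1
  fi
  if curl -sf -o /dev/null "http://${vl_target}:8500/v1/status/peers"; then
    echo "target API is still answering; the agent may have been restarted" >&2
    return 1
  fi
  return 0
}
```

For the session above, this would be called as `verify_leave 10.1.12.161 10.1.10.62`.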

brikis98 commented 5 years ago

Great, thanks! I'll merge this now and let the tests run. If they pass, I'll create a new release and share the link.

coryflucas commented 5 years ago

Thanks! And thanks for putting this project together!

brikis98 commented 5 years ago

@Etiene The tests failed on this PR, but I just noticed they have been failing since #177 was merged. Could you look into it?