hashicorp / terraform-aws-consul

A Terraform Module for how to run Consul on AWS using Terraform and Packer
Apache License 2.0
401 stars 487 forks source link

Can't trigger an instance-refresh on update #230

Open mr-miles opened 3 years ago

mr-miles commented 3 years ago

The module doesn't expose the instance refresh properties of the underlying auto-scaling group - so it isn't possible to trigger an instance update as part of a deployment.

I made #229 which adds the necessary properties/docs to achieve this

brikis98 commented 3 years ago

Sorry for the delay. One issue here is whether instance refresh would work correctly and do the update without causing an outage / loss of quorum. See here for steps that you need to update the cluster. Have you actually tested it and checked there's no downtime / loss of quorum during the rollout?

mr-miles commented 3 years ago

The instance refresh basically automates the steps you linked to. I've been using it for a few months now and it rotates instances solidly. On occasions when the instances haven't launched due to my misconfiguration, it hasn't torn down further instances and has behaved well.

One of the parameters of the refresh is the minimum healthy percentage, which limits the number of instances it'll recycle at any one time. I have set it so that no more than one instance at a time is taken out, but there's nothing to stop the user setting it to a quorum-killing level! I also set it to leave 5 minutes after a successful recycle before moving onto the next.

I could add validation so it errors if it is set too high? Maybe I should also add a "take care" note to the parameter since I think you can't rule out using it to shoot yourself in the foot.

Miles

On Thu, Jul 29, 2021 at 5:31 PM Yevgeniy Brikman @.***> wrote:

Sorry for the delay. One issue here is whether instance refresh would work correctly and do the update without causing an outage / loss of quorum. See here for steps that you need to update the cluster https://github.com/hashicorp/terraform-aws-consul/tree/master/modules/consul-cluster#how-do-you-roll-out-updates. Have you actually tested it and checked there's no downtime / loss of quorum during the rollout?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/hashicorp/terraform-aws-consul/issues/230#issuecomment-889292408, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEQD4FBGDEB3NT2HVXLB3LT2F67BANCNFSM47HUQSTQ .