hashicorp / terraform-aws-consul

A Terraform Module for how to run Consul on AWS using Terraform and Packer
Apache License 2.0

How-to : Wait for Consul Cluster up and healthy (LB with ASG) #158

Closed LeComptoirDesPharmacies closed 4 years ago

LeComptoirDesPharmacies commented 4 years ago

Hi,

We would like to use an AWS Lambda to initialize/update the Configuration Entries of the Consul cluster (router, splitter, ...) created with your module. It should therefore run once your module has finished and the Consul cluster is healthy. Do you have any tips for invoking the Lambda at the right moment?

On our side, we saw that the "aws_autoscaling_group" resource allows setting the "wait_for_elb_capacity" attribute, which makes Terraform wait until the ASG targets are healthy before considering the ASG resource created.

However, there are two problems:

  1. Is a healthy NLB TCP health check on port 8500 enough to consider the Consul cluster able to receive Config Entries requests?
  2. The "wait_for_elb_capacity" attribute is neither set nor overridable in your module.

Thanks in advance for your advice/answers. Yours faithfully, LCDP

brikis98 commented 4 years ago

I don't think Consul listening on port 8500 would be enough to determine if the cluster is healthy and has established a quorum. There are other endpoints in the HTTP API you could try, but I'm not sure any of those are designed for an ELB-style use case. See also https://github.com/hashicorp/consul/issues/1468, which is requesting something similar.

Two possible ideas:

  1. Add your own HTTP endpoint on each Consul node specifically for this task. See also https://github.com/kadaan/consulate.
  2. Write a separate script that checks the cluster's health and deploys the Lambda function when ready.
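
A minimal sketch of the second idea, assuming the Consul HTTP API is reachable from wherever the script runs. It polls the `/v1/status/leader` and `/v1/status/peers` endpoints and treats "a Raft leader is elected and the expected number of servers are peers" as a rough proxy for quorum. The function names, the `expected_servers` count, and the timeouts are illustrative assumptions, not part of this module.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_json(url, timeout=5):
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def cluster_ready(leader, peers, expected_servers):
    """True when a Raft leader is elected and enough servers are Raft peers."""
    # /v1/status/leader returns an empty string while no leader is elected.
    return bool(leader) and len(peers) >= expected_servers


def wait_for_cluster(base_url, expected_servers=3, timeout=600, interval=10,
                     fetch=fetch_json, sleep=time.sleep):
    """Poll Consul until the cluster looks ready or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            leader = fetch(f"{base_url}/v1/status/leader")
            peers = fetch(f"{base_url}/v1/status/peers")
            if cluster_ready(leader, peers, expected_servers):
                return True
        except (urllib.error.URLError, OSError, ValueError):
            # Servers not reachable yet, or a non-JSON reply: keep polling.
            pass
        sleep(interval)
    return False
```

A deployment script could call `wait_for_cluster("http://<consul-address>:8500")` and only deploy or invoke the Lambda function when it returns `True`. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a live cluster.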

LeComptoirDesPharmacies commented 4 years ago

Hi @brikis98,

Thank you for this complete answer, your proposals seem really interesting to look into!

Currently, we have created a Lambda that is triggered by an S3 notification event (through SNS), reads the content of the S3 object (Consul JSON config), and updates the Consul configuration through the REST API. As the SNS Lambda invocation is asynchronous, there is a 3-time retry, which is sufficient (so far) for the Consul cluster to start.
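
For readers who want to try the same approach, here is a rough sketch of what such a handler might look like. It is not our actual code: the `CONSUL_URL` environment variable, the function names, and the assumption that the S3 file holds either one config entry or a JSON list of entries are all illustrative. It relies on Consul's `PUT /v1/config` endpoint and lets any exception propagate so that the asynchronous invocation is retried.

```python
import json
import os
import urllib.parse
import urllib.request


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an SNS event wrapping S3 notifications."""
    objects = []
    for record in event.get("Records", []):
        # The S3 notification is delivered as a JSON string in the SNS message.
        s3_event = json.loads(record["Sns"]["Message"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 notifications.
            key = urllib.parse.unquote_plus(s3_record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects


def put_config_entry(consul_url, entry, timeout=5):
    """PUT one config entry to /v1/config; raises on HTTP or network errors."""
    request = urllib.request.Request(
        f"{consul_url}/v1/config",
        data=json.dumps(entry).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def handler(event, context):
    import boto3  # imported lazily; provided by the AWS Lambda Python runtime

    s3 = boto3.client("s3")
    consul_url = os.environ["CONSUL_URL"]  # hypothetical variable, e.g. http://host:8500
    for bucket, key in extract_s3_objects(event):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        entries = json.loads(body)
        # Assumption: the file holds a single entry or a list of entries.
        for entry in entries if isinstance(entries, list) else [entries]:
            # An exception here fails the invocation, which triggers a retry.
            put_config_entry(consul_url, entry)
```

Failing loudly is the point of the design: while the cluster is still starting, the PUT fails, the invocation errors out, and the asynchronous retry gives Consul more time.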

If I have extra time in the future, I will look into Consulate first, thank you! Yours faithfully, LCDP