hashicorp / terraform-aws-consul-ecs

Consul Service Mesh on AWS ECS (Elastic Container Service)
https://www.consul.io/docs/ecs
Mozilla Public License 2.0

Fix health check command for mesh and gateway task submodules #241

Closed Ganeshrockz closed 9 months ago

Ganeshrockz commented 9 months ago

Changes proposed in this PR:

ganeshseetharaman@ganeshseetharaman-DFCQN60JDW % curl -v httpbin.org/status/500 
*   Trying 3.95.102.170:80...
* Connected to httpbin.org (3.95.102.170) port 80 (#0)
> GET /status/500 HTTP/1.1
> Host: httpbin.org
> User-Agent: curl/8.1.2
> Accept: */*
> 
< HTTP/1.1 500 INTERNAL SERVER ERROR
< Date: Fri, 15 Dec 2023 08:26:54 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 0
< Connection: keep-alive
< Server: gunicorn/19.9.0
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Credentials: true
< 
* Connection #0 to host httpbin.org left intact
ganeshseetharaman@ganeshseetharaman-DFCQN60JDW % echo $?                                                          
0

Because curl exits with status 0 even when the server returns an HTTP error (as with the 500 response above), the health check passed prematurely. This caused the consul-dataplane container to start up before the consul binary became available in the shared volume. Ideally, consul-dataplane should start only after the control-plane container writes the Consul ECS binary to the shared volume and returns a 200 response code for the /consul-ecs/health endpoint.

This PR fixes the issue by adding the -f flag to curl, which makes curl exit with a non-zero status when the server returns an HTTP error.

ganeshseetharaman@ganeshseetharaman-DFCQN60JDW % curl -v -f httpbin.org/status/500
*   Trying 18.214.18.233:80...
* Connected to httpbin.org (18.214.18.233) port 80 (#0)
> GET /status/500 HTTP/1.1
> Host: httpbin.org
> User-Agent: curl/8.1.2
> Accept: */*
> 
< HTTP/1.1 500 INTERNAL SERVER ERROR
< Date: Fri, 15 Dec 2023 08:30:02 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 0
< Connection: keep-alive
< Server: gunicorn/19.9.0
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Credentials: true
* The requested URL returned error: 500
* Closing connection 0
curl: (22) The requested URL returned error: 500
ganeshseetharaman@ganeshseetharaman-DFCQN60JDW % echo $?                          
22
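
For illustration only, here is a minimal sketch of how a health probe could rely on the exit code that -f produces. The function names check_health and wait_until_healthy are hypothetical and are not taken from this repo, and the sketch is not the actual health check command used by the submodules.

```shell
#!/bin/sh
# Hypothetical helpers (not part of this repo) showing why -f matters:
# without it, curl exits 0 on an HTTP 500 and the probe looks healthy.

# Returns 0 only when the URL answers with a non-error HTTP status.
check_health() {
  url="$1"
  # -f: exit non-zero (22) on HTTP error responses
  # -s: suppress progress output
  # -o /dev/null: discard the response body
  curl -f -s -o /dev/null --max-time 5 "$url"
}

# Polls the URL up to $2 times, one second apart, until it is healthy.
wait_until_healthy() {
  url="$1"
  attempts="$2"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if check_health "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

A caller would pass the real endpoint, for example wait_until_healthy "http://localhost:PORT/consul-ecs/health" 30, where PORT is a placeholder because the port is not stated in this PR.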

How I've tested this PR:

Manual deployment

How I expect reviewers to test this PR:

Checklist:

lkysow commented 9 months ago

Happy to merge this quickly but we need to look into why the acceptance tests didn't catch this.

Ganeshrockz commented 9 months ago

> Happy to merge this quickly but we need to look into why the acceptance tests didn't catch this.

I am fairly sure something changed recently in ECS. I had never seen a single acceptance test run fail with this error until last week. Now I can reproduce it almost every time I trigger the same suite of acceptance tests, and I see the same error consistently when deploying the examples in this repo.