sst / sst

Build full-stack apps on your own infrastructure.
https://sst.dev
MIT License

ECS Service doesn't configure custom health check in Cloud Map resulting in traffic being routed to unhealthy tasks #5915

Open Probotect0r opened 3 weeks ago

Probotect0r commented 3 weeks ago

I am deploying an ECS Service using SST and am using an API Gateway private integration with Cloud Map to route to the ECS tasks.

I have been seeing intermittent 503s from API Gateway during load tests and for a minute or so immediately after a deployment (the time it takes for the new task to spin up). The Cloud Map service does not have a health check configured, and the health check status of my registered service instances is "Unknown". According to the docs, if no health check is configured, Cloud Map routes to every service instance regardless of its health (https://docs.aws.amazon.com/cloud-map/latest/dg/services-health-checks.html).

If you don't configure a health check during service creation, traffic will be routed to service instances regardless of the instances' health status.

For ECS Tasks that are connected to Cloud Map, you must set the health check type in the Cloud Map service to "Custom Health Check". ECS will then take care of submitting the status of each task instance to Cloud Map, and Cloud Map will only begin routing traffic to this instance once it is reported to be healthy by ECS (https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-discovery.html).

We recommend you use container-level health checks managed by Amazon ECS for your service discovery service. HealthCheckCustomConfig—Amazon ECS manages health checks on your behalf. Amazon ECS uses information from container and health checks, and your task state, to update the health with AWS Cloud Map. This is specified using the --health-check-custom-config parameter when creating your service discovery service. For more information, see HealthCheckCustomConfig in the AWS Cloud Map API Reference.
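To illustrate what that setting looks like in resource arguments, here is a minimal sketch. It assumes the Cloud Map service is created through Pulumi's `aws.servicediscovery.Service`, whose arguments accept a `healthCheckCustomConfig` field. The `CloudMapServiceArgs` type and the `withCustomHealthCheck` helper are hypothetical local stand-ins so the snippet runs on its own; they are not SST code, and the names and IDs in the example are placeholders.

```typescript
// Hypothetical local stand-in for the subset of Cloud Map service
// arguments relevant here (not imported from SST or Pulumi).
interface CloudMapServiceArgs {
  name?: string;
  dnsConfig?: {
    namespaceId: string;
    dnsRecords: { ttl: number; type: "A" | "SRV" }[];
  };
  // Presence of this field switches the Cloud Map service from
  // "No health check" to "Custom health check".
  healthCheckCustomConfig?: { failureThreshold?: number };
}

// Hypothetical helper: adds a custom health check config unless the
// caller already supplied one.
function withCustomHealthCheck(args: CloudMapServiceArgs): CloudMapServiceArgs {
  return {
    ...args,
    healthCheckCustomConfig: args.healthCheckCustomConfig ?? { failureThreshold: 1 },
  };
}

// Placeholder values for illustration only.
const example = withCustomHealthCheck({
  name: "my-service",
  dnsConfig: { namespaceId: "ns-example", dnsRecords: [{ ttl: 60, type: "SRV" }] },
});

console.log(JSON.stringify(example.healthCheckCustomConfig));
// prints {"failureThreshold":1}
```

The resulting object would then be passed wherever the Cloud Map service is constructed, so ECS can report each task's health to Cloud Map.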

The SST Service seems to configure the Cloud Map service health check as "No health check" by default, and there doesn't seem to be any way to override it. I couldn't find any relevant fields in the transform block of the Service construct either.

The health check config needs to be added here: https://github.com/sst/sst/blob/dev/platform/src/components/aws/service.ts#L2144

I modified this file locally and redeployed, and the health check now works correctly. I can submit a PR for this change, but I might need some pointers on how to expose the Cloud Map service for override in the transform block.