hashicorp / waypoint

A tool to build, deploy, and release any application on any platform.
https://waypointproject.io
Other
4.76k stars 327 forks source link

False negative ECS deploy if remote runner lacks permissions #2276

Open catsby opened 3 years ago

catsby commented 3 years ago

https://github.com/hashicorp/waypoint/pull/2220 introduces remote runners for ECS Server deploys.

While testing, I noticed that if the On-Demand Runner (ODR) does not have the required permissions to run the follow up status reports, the up operation appears to error, however the deploy does actually seem to work.

Example output (requires #2220):

» Deploying...
✓ Deployment resources created
✓ Discovered alb subnets
✓ Discovered service subnets
✓ Using external security group example-nodejs-inbound
✓ Using internal security group example-nodejs-inbound-internal
✓ Using Application Load Balancer "waypoint-ecs-example-nodejs"
✓ Created target group example-nodejs-01FF2TJKJ9XW1BTD2
✓ Created ALB Listener
✓ Using existing log group waypoint-logs
✓ Using existing execution IAM role "ecr-example-nodejs"
✓ Registered Task definition: waypoint-example-nodejs
✓ Created ECS cluster: waypoint
✓ Created ECS Service example-nodejs-01FF2TJKJ9XW1BTD2

❌ Gathering health report for ecs deployment...
❌ Determining status of ecs service example-nodejs-01FF2TJKJ9XW1BTD2
! resource manager failed to generate a status report: failed generating resource
  statuses: rpc error: code = Internal desc = failed to describe service (ARN
  "arn:aws:ecs:us-west-2:797645259670:service/waypoint/example-nodejs-01FF2TJKJ9XW1BTD2"):
  AccessDeniedException: User:
  arn:aws:sts::797645259670:assumed-role/waypoint-runner/7ccdfc2f41574bd1b5020447ea46847e
  is not authorized to perform: ecs:DescribeServices on resource:
  arn:aws:ecs:us-west-2:797645259670:service/waypoint/example-nodejs-01FF2TJKJ9XW1BTD2

In the runner code, we return errors if the status report errors (see https://github.com/hashicorp/waypoint/blob/56578b32ee0e168ab64657a6c50b0bf9285ff014/internal/runner/operation_up.go#L77-L79). I think we should consider changing it to be more tolerant and differentiate between failing an up operation vs failing a status operation.

evanphx commented 3 years ago

Per team discussion, ECS deploy should actually poll ECS to see that the service status indicates it's ready.