30s delay for awsx.ecs.Fargate startup due to "error ECS was unable to assume the role"

pulumi / pulumi-awsx

AWS infrastructure best practices in component form!

Apache License 2.0


In the last few awsx.ecs.FargateServices I've created, I've seen this in the ECS Service event log:

2022-10-13 16:34:12 -0700 service service-c8e196b has started 1 tasks: task 02e5835a0e294fb3864272e1e8e8e8ed.

2022-10-13 16:33:43 -0700 service service-c8e196b failed to launch a task with (error ECS was unable to assume the role 'arn:aws:iam::111111111111:role/service-task-6cba4ed' that was provided for this task. Please verify that the role being passed has the proper trust relationship and permissions and that your IAM user has permissions to pass this role.).
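The gap between the two events above can be checked directly from their timestamps. The following is a minimal sketch; the helper name `parseEventTime` is made up for illustration, and it assumes each event line starts with a `YYYY-MM-DD HH:MM:SS ±HHMM` timestamp, as the two lines quoted here do.

```typescript
// Hypothetical helper: parse the leading "YYYY-MM-DD HH:MM:SS ±HHMM"
// timestamp of an ECS service event line into a Date.
function parseEventTime(line: string): Date {
  const m = line.match(/^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) ([+-]\d{2})(\d{2})/);
  if (!m) {
    throw new Error("no timestamp found in: " + line);
  }
  // Rebuild an ISO 8601 string, e.g. 2022-10-13T16:34:12-07:00.
  return new Date(`${m[1]}T${m[2]}${m[3]}:${m[4]}`);
}

// The two events from the log, truncated to the parts that matter here.
const failed = parseEventTime("2022-10-13 16:33:43 -0700 service service-c8e196b failed to launch a task");
const started = parseEventTime("2022-10-13 16:34:12 -0700 service service-c8e196b has started 1 tasks");

// Seconds between the failed launch attempt and the successful one.
const gapSeconds = (started.getTime() - failed.getTime()) / 1000;
console.log(gapSeconds);
```

For these two lines the difference is 29 seconds, which matches the roughly 30s retry delay described in this report.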

I don't recall ever seeing this with the classic AWSX provider. Is it possible that we are not making the Service dependent on a policy being attached to the service role, so that the first launch attempt fails? This appears to make ECS wait roughly 30s before retrying (the two events above are 29s apart), which materially increases the time to ready for the end-to-end deployment (4m21s vs. presumably 3m51s without the delay).
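To make the question concrete, here is a sketch of what an explicit dependency could look like from user code. It is not a confirmed fix. It assumes the role and its policy attachment are created outside the component and passed in through `taskDefinitionArgs.taskRole.roleArn`, and that `dependsOn` on the component delays creation of the underlying service. All resource names, the container image, and the attached policy are placeholders that do not come from this report. I have not tested whether this removes the delay, and IAM propagation lag alone could still cause the first attempt to fail.

```typescript
import * as aws from "@pulumi/aws";
import * as awsx from "@pulumi/awsx";

// Placeholder cluster; any existing cluster ARN would do.
const cluster = new aws.ecs.Cluster("cluster");

// Task role with a trust policy that lets ECS tasks assume it.
const taskRole = new aws.iam.Role("service-task", {
    assumeRolePolicy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
            Effect: "Allow",
            Action: "sts:AssumeRole",
            Principal: { Service: "ecs-tasks.amazonaws.com" },
        }],
    }),
});

// Placeholder policy attachment; the policy itself is arbitrary here.
const taskPolicy = new aws.iam.RolePolicyAttachment("service-task-policy", {
    role: taskRole.name,
    policyArn: "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
});

// Pass the pre-created role in and declare the dependency explicitly,
// so the service is not created until the attachment exists.
const service = new awsx.ecs.FargateService("service", {
    cluster: cluster.arn,
    desiredCount: 1,
    taskDefinitionArgs: {
        taskRole: { roleArn: taskRole.arn },
        container: {
            name: "app",           // placeholder container
            image: "nginx:latest", // placeholder image
            cpu: 256,
            memory: 512,
            essential: true,
        },
    },
}, { dependsOn: [taskPolicy] });
```

If the failed first attempt still shows up in the event log with this arrangement, that would suggest the cause is IAM propagation time rather than a missing dependency inside the component.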

30s delay for awsx.ecs.Fargate startup due to "error ECS was unable to assume the role" #927