pulumi / examples

Infrastructure, containers, and serverless apps to AWS, Azure, GCP, and Kubernetes... all deployed with Pulumi
https://www.pulumi.com
Apache License 2.0

AWSX container tests unreliable #449

Closed lukehoban closed 4 years ago

lukehoban commented 4 years ago

The TestAccAwsJsContainers and TestAccAwsTsContainers tests have been pretty unreliable - each fails roughly one in every 4-5 runs.

They fail with:

aws:ecs:Service nginx deleting error: Plan apply failed: deleting urn:pulumi:p-it-travis-job-aws-js-con-6361d014::container-quickstart::awsx:x:ecs:FargateService$aws:ecs/service:Service::nginx: timeout while waiting for state to become 'INACTIVE' (last state: 'DRAINING', timeout: 10m0s)

It feels like the 10m timeout is possibly insufficient (which is rather surprising/bad from an AWS perspective). It's unclear whether there is something specific to the workload/test that causes this, or whether the 10m default is itself not right.

It may also be that this is due to the LoadBalancer - it may be that deregistrationDelay needs to default to something different.

Most likely we don't want to change the test itself (if doing so would make this a "worse" example) - and instead want to update some defaults in either AWS or AWSX.
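
For illustration, here is a minimal sketch of what overriding deregistrationDelay could look like on a plain aws.lb.TargetGroup. The resource name, the use of the default VPC, and the value of 30 seconds are all assumptions made for the example and are not taken from the failing tests. AWS defaults the deregistration delay to 300 seconds, and nothing here establishes that lowering it resolves the DRAINING timeout.

```typescript
import * as aws from "@pulumi/aws";

// Assumption: the default VPC is used purely to keep the sketch short.
const vpc = aws.ec2.getVpc({ default: true });

// Hypothetical target group for an nginx Fargate service.
const nginxTargets = new aws.lb.TargetGroup("nginx", {
    port: 80,
    protocol: "HTTP",
    // Fargate tasks use awsvpc networking, so targets register by IP address.
    targetType: "ip",
    vpcId: vpc.then(v => v.id),
    // Seconds the load balancer waits before completing deregistration of a target.
    // The AWS default is 300; 30 is an illustrative value only.
    deregistrationDelay: 30,
});

export const targetGroupArn = nginxTargets.arn;
```

A shorter delay means targets spend less time draining on teardown, at the cost of cutting off in-flight requests sooner, which is part of why changing a default in AWS or AWSX would be a judgment call.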

stack72 commented 4 years ago

@lukehoban I don't think these have been as flaky recently - thoughts?

lukehoban commented 4 years ago

Agreed - these have been much more reliable recently. Closing this out for now.