Closed mheffner closed 9 months ago
We are having the same issue. It always occurs on the initial stack creation. Re-running always helps, but it is still annoying.
You can add an explicit wait in your local Terraform code:
resource "aws_iam_role" "this" {
  # ...
}

# time_sleep is provided by the hashicorp/time provider.
resource "time_sleep" "wait_for_iam_eventual_consistency" {
  depends_on      = [aws_iam_role.this]
  create_duration = "15s"
}

resource "spacelift_aws_integration" "this" {
  # ...
  depends_on = [time_sleep.wait_for_iam_eventual_consistency]
}
I've found 15 seconds is typically enough in our environment.
The error message "you need to configure trust relationship section in your AWS account" comes from the Spacelift API. It should probably mention configuring the trust relationship section in your AWS role, rather than your account.
Thanks for reporting this. A sleep is certainly a good way to work around the issue. Special thanks to @mheffner, who pointed out that the retry loop was not working! This really helped us identify the cause.
Internally, we updated the error message to clarify what action Spacelift users need to take. However, we overlooked that the provider matches on this specific error in its retry logic.
We changed from:
could not assume the AWS IAM role with external ID
to
you need to configure trust relationship section in your AWS account
Because of this, the retry logic is not being hit. I'll put in a pull request to handle this error and then the retry loop should work again.
When I attempt to attach an AWS integration (spacelift_aws_integration_attachment) to a stack, it fails with:
It appears to fail immediately. Looking at the code, it should enter a back-off loop, but I'm not seeing it hit the sleep wait here. My guess is that it hits the error code string check and bails early. https://github.com/spacelift-io/terraform-provider-spacelift/blob/4d3a0727330fbfc84c3e1a3aaf02dc820b8c23ae/spacelift/resource_aws_integration_attachment.go#L80-L88
I was able to get it to work by immediately re-running the apply. This seems to imply the failure was timing-based and not a configuration error.