Open Veetaha opened 1 year ago
Today (2023-07-04) I hit the same bucket deployment failure in us-west-2.
If you face this issue, check your CloudTrail logs: another resource in your Terraform code may be creating the S3 bucket for you automatically.
I ran into this today while creating an `aws_athena_database` resource alongside an `aws_s3_bucket` resource to store the Athena results. Based on the CloudTrail logs, Terraform created the Athena database first; then, I guess, the underlying AWS API detected that the S3 bucket I specified to store the Athena results did not exist and created it for me (the first `CreateBucket` event has this source: `"invokedBy": "athena.amazonaws.com"`).
So a few milliseconds later, when Terraform tried to create the bucket, it said I already own it and failed. I then needed to import the bucket in my state and apply again to finish configuring it...
In my case, I fixed it by adding `depends_on = [aws_s3_bucket.athena-bucket]` to my `aws_athena_database` resource.
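A minimal sketch of that fix, assuming a configuration along these lines. The bucket name, database name, and the `example` resource label are hypothetical placeholders; only the `depends_on` line and the `athena-bucket` label come from the comment above.

```hcl
# Bucket that stores the Athena query results.
# The bucket name is a hypothetical placeholder.
resource "aws_s3_bucket" "athena-bucket" {
  bucket = "example-athena-results"
}

resource "aws_athena_database" "example" {
  # The database name is a hypothetical placeholder.
  name   = "example_db"
  bucket = "example-athena-results"

  # Explicit ordering: the bucket must already exist by the time
  # the Athena database is created.
  depends_on = [aws_s3_bucket.athena-bucket]
}
```

Setting `bucket = aws_s3_bucket.athena-bucket.id` instead of repeating the name should give the same ordering implicitly, since Terraform derives dependencies from references between resources.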
Terraform Core Version
1.3.7
AWS Provider Version
4.17.1
Affected Resource(s)
Expected Behavior
The S3 bucket should be created successfully when its name is unique and has never been used before.
Actual Behavior
Terraform may randomly fail to create the bucket.
Relevant Error/Panic Output Snippet
Terraform Configuration Files
Steps to Reproduce
There isn't a stable reproduction. It happens randomly and rarely during `terraform apply`. I suppose if you run `terraform apply` a bazillion times, you may be lucky enough to catch this error. More on "bazillion" below.
We run `terraform apply` that creates buckets on CI extensively in our tests. Each test generates a unique bucket name, so there should not be a problem with reusing a bucket name. I also checked the regions where we deploy our buckets with the unique names (you may get the idea of the bucket naming pattern we use, but that's not very relevant). No region has ever seen the creation of the bucket name specified in the error message for at least the last 3 months. I'd say it's statistically improbable we could ever reuse a bucket name during our deployments.
However, I found 4 buckets leaked due to this error on the following dates:
2022-12-08
2023-01-17
2023-01-19
2023-01-21
I don't know why this error has been appearing more often over the last week, but it has started to hurt us, so we are reporting it. I doubt our code is doing something wrong, because the CloudTrail logs don't indicate that.
Debug Output
Unfortunately, we don't run terraform with `TF_LOG=debug`, so we don't have debug logs.
Panic Output
No response
Important Factoids
During my investigation of the problem I took a look at the CloudTrail logs and found that each time we get this `BucketAlreadyOwnedByYou` error, there are two `CreateBucket` API calls invoked within the same second by terraform.
I am entirely sure it happens within the same terraform process on the same machine, so I am sure we can exclude concurrent deployments of the same stack from the potential causes.
Unfortunately, CloudTrail's date-recording granularity is 1 second, so I can't tell for sure which of the two API calls was made first, but I think the order is obvious when we take a look at the CloudTrail events themselves. I am pasting them in the order that I guess they happened:
The first API call for some reason got `OperationAborted`, complaining about a conflicting concurrent operation running for the bucket. However, this API call did successfully create a bucket (I will prove that a bit later below), and therefore the second API call includes the `BucketAlreadyOwnedByYou` error:
So the first API call did succeed in creating a bucket, even though it returned an `OperationAborted` error. The fact that the bucket does exist (although it has no tags), and that its creation date is one second after the API calls described above, supports my thinking:
I think this may be a case where the fault is on AWS's side: they actually succeed at creating a bucket but return an error to us. However, AWS has a ton of bugs in their APIs, and I think terraform should be capable of working around them for all of terraform's users. I think the workaround on terraform's side would be to detect this particular `OperationAborted` error and to check whether the bucket was created when this error is returned. If it was, then we consider the bucket as created successfully.
References
No response
Would you like to implement a fix?
No