hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: PrefixedUniqueId collisions with parallel runs #39625

Open aardon-dku opened 1 month ago

aardon-dku commented 1 month ago

Terraform Core Version

1.9.7

AWS Provider Version

5.70.0

Affected Resource(s)

Every resource that uses the clientToken parameter to ensure request idempotency. Other PrefixedUniqueId collisions may be possible.

Expected Behavior

Running multiple instances of Terraform in parallel, with the same template but different states, should succeed.

Actual Behavior

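ClientToken generation collides when two runs generate a token at exactly the same time. Two things can happen: the request fails with an IdempotentParameterMismatch error, or two runs are handed the same resource, so fewer resources are created than requested.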

This problem affects our internal processes and is the source of unpredictable failures or missing resources.
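To illustrate why simultaneous runs can collide, here is a minimal Python sketch of the token layout. It is not the provider's code. The layout (prefix, 14-digit UTC timestamp, 4 fractional-second digits, 8-digit hexadecimal counter) is an assumption inferred from the tokens observed in CloudTrail, and `prefixed_unique_id` is a hypothetical stand-in for `PrefixedUniqueId`.

```python
from datetime import datetime, timezone


def prefixed_unique_id(prefix: str, now: datetime, counter: int) -> str:
    """Hypothetical model of the observed token layout (an assumption):
    prefix + YYYYMMDDhhmmss + 4 fractional-second digits + 8-hex-digit counter."""
    stamp = now.strftime("%Y%m%d%H%M%S")
    fraction = now.microsecond // 100  # truncate to 1e-4 s resolution
    return f"{prefix}{stamp}{fraction:04d}{counter:08x}"


# Two separate Terraform processes: each one starts its own counter at 1,
# so the counter cannot tell them apart.
same_instant = datetime(2024, 10, 8, 12, 17, 21, 723800, tzinfo=timezone.utc)
token_a = prefixed_unique_id("terraform-", same_instant, 1)
token_b = prefixed_unique_id("terraform-", same_instant, 1)

print(token_a)
print("collision:", token_a == token_b)
```

Under these assumptions, the only thing separating two processes is the timestamp, so any two runs that land in the same 100 µs window produce the same token.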

Relevant Error/Panic Output Snippet

status code: 400, request id: ece0612a-0ca8-4a30-b5b6-413f45ee5b32
Error: creating EC2 Instance: IdempotentParameterMismatch: Arguments on this idempotent request are inconsistent with arguments used in previous request(s).

Terraform Configuration Files

provider "aws" {
  region = "eu-west-1"
}

resource "aws_instance" "instance" {
  ami                         = "ami-054a53dca63de757b" #al2023
  availability_zone           = "eu-west-1a"
  instance_type               = "t3.nano"
  associate_public_ip_address = false
}

Steps to Reproduce

The issue is hard to reproduce reliably because the timestamp used to generate the token has a resolution of 1e-4 seconds, so two runs collide only when they generate a token within the same 100 µs window.

  1. Create subdirectories named 1 to 8 in the test-terraform-parallel directory, each containing the Terraform file above.
  2. Run the following Python script:

    import subprocess
    import threading
    import time

    import pause  # third-party package: pip install pause

    # Define the command and the shared start time
    cmd = ["terraform", "apply", "-auto-approve"]
    until = time.time() + 5  # Start in 5 seconds

    def tf_apply(folder):
        # Block until the shared start time so all runs begin together
        pause.until(until)
        # subprocess.run waits for terraform to exit, so join() below
        # returns only once every apply has finished
        subprocess.run(cmd, cwd=f"test-terraform-parallel/{folder}")

    threads = list()
    for i in range(1, 9):
        t = threading.Thread(target=tf_apply, args=(i,))
        t.start()
        threads.append(t)

    for t in threads:
        t.join()

  3. Delete the instances.

### Debug Output

Inspecting the AWS CloudTrail logs shows that the EC2 RunInstances calls used the following clientTokens:

    terraform-20241008121721473800000001
    terraform-20241008121721723800000001 # identical
    terraform-20241008121721723800000001 # identical
    terraform-20241008121722129600000001
    terraform-20241008121722310000000001
    terraform-20241008121722393600000001
    terraform-20241008121722461700000001
    terraform-20241008121722556000000001

Because two of the clientTokens were identical, only 7 instances were created instead of the required 8.
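The duplicate can be confirmed mechanically. The following is a small sketch that counts repeated tokens in the list above; `find_duplicates` is an illustrative helper, not part of any AWS or Terraform tooling, and the tokens are copied from the CloudTrail output in this report.

```python
from collections import Counter

# clientTokens copied from the CloudTrail output in this report
client_tokens = [
    "terraform-20241008121721473800000001",
    "terraform-20241008121721723800000001",
    "terraform-20241008121721723800000001",
    "terraform-20241008121722129600000001",
    "terraform-20241008121722310000000001",
    "terraform-20241008121722393600000001",
    "terraform-20241008121722461700000001",
    "terraform-20241008121722556000000001",
]


def find_duplicates(tokens):
    """Return a mapping of token -> count for tokens seen more than once."""
    counts = Counter(tokens)
    return {token: n for token, n in counts.items() if n > 1}


duplicates = find_duplicates(client_tokens)
distinct = len(set(client_tokens))

print(duplicates)
print(f"{distinct} distinct tokens for {len(client_tokens)} requests")
```

Eight requests carrying seven distinct tokens matches the seven instances that were actually created.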

Output:

    aws_instance.instance: Creating...
    aws_instance.instance: Creating...
    aws_instance.instance: Creating...
    aws_instance.instance: Creating...
    aws_instance.instance: Creating...
    aws_instance.instance: Creating...
    aws_instance.instance: Creating...
    aws_instance.instance: Creating...
    aws_instance.instance: Still creating... [10s elapsed]
    aws_instance.instance: Still creating... [10s elapsed]
    aws_instance.instance: Still creating... [10s elapsed]
    aws_instance.instance: Still creating... [10s elapsed]
    aws_instance.instance: Still creating... [10s elapsed]
    aws_instance.instance: Still creating... [10s elapsed]
    aws_instance.instance: Still creating... [10s elapsed]
    aws_instance.instance: Still creating... [10s elapsed]
    aws_instance.instance: Creation complete after 13s [id=i-04f2b5129431ab300]
    aws_instance.instance: Creation complete after 13s [id=i-0fab6490dbad5606b]
    aws_instance.instance: Creation complete after 12s [id=i-07c197320e68beee7]
    aws_instance.instance: Creation complete after 13s [id=i-0059a70dfa8bb26be]
    aws_instance.instance: Creation complete after 13s [id=i-0b01d3cde6114c063]
    aws_instance.instance: Creation complete after 13s [id=i-065f63c17bad849c9] # identical
    aws_instance.instance: Creation complete after 13s [id=i-04e5587f935515511]
    aws_instance.instance: Creation complete after 13s [id=i-065f63c17bad849c9] # identical



Two of the Terraform states reference the same instance (i-065f63c17bad849c9).

### Panic Output

_No response_

### Important Factoids

Suggested fixes:
- Add a random component to the token generated by `PrefixedUniqueId`.
- Make the client token prefix configurable.
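
As a rough illustration of the first suggestion, the sketch below replaces the per-process counter with random bits. It is written in Python for consistency with the reproduction script and is not a patch for the provider. The function name `randomized_unique_id`, the 4-byte random suffix, and the retained timestamp layout are all assumptions made for this example.

```python
import secrets
from datetime import datetime, timezone


def randomized_unique_id(prefix: str = "terraform-") -> str:
    """Hypothetical variant: timestamp plus 8 random hex digits.

    The timestamp keeps tokens roughly sortable; the random suffix replaces
    the counter, which restarts at 1 in every process.
    """
    now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d%H%M%S") + f"{now.microsecond // 100:04d}"
    return f"{prefix}{stamp}{secrets.token_hex(4)}"  # 4 bytes -> 8 hex chars


token = randomized_unique_id()
print(token)
```

With this layout the suffix stays 26 characters long, as in the observed tokens, and two processes generating a token in the same 100 µs window collide only if they also draw the same 32 random bits.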

### References

_No response_

### Would you like to implement a fix?

Yes