hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0
9.74k stars 9.1k forks source link

[Bug]: Creating a duplicate ECR repository policy causes irrecoverable validation error #39202

Open raxod502-plaid opened 1 week ago

raxod502-plaid commented 1 week ago

Terraform Core Version

1.3.3, 1.7.5

AWS Provider Version

5.66.0

Affected Resource(s)

Expected Behavior

If you create two aws_ecr_repository_policy objects for the same policy this is user error, but the expected result is that the only thing that should get broken is your repository policy getting set to the wrong value and/or your terraform plan failing until you fix the code.

Actual Behavior

If you do things in a certain order, it's possible to get the terraform state into a broken place where absolutely no terraform operations can be run without manually running terraform state rm commands.

Relevant Error/Panic Output Snippet

Planning failed. Terraform encountered an error while generating this plan.

╷
│ Error: parsing policy 1: parsing statement 1: 1 error(s) decoding:
│
│ * '[0].Condition' expected a map, got 'slice'
│
│   with aws_ecr_repository_policy.conflicting_b,
│   on infra.tf line 32, in resource "aws_ecr_repository_policy" "conflicting_b":
│   32: resource "aws_ecr_repository_policy" "conflicting_b" {

Terraform Configuration Files

Start by applying this:

resource "aws_ecr_repository" "example" {
  name = "rrosborough-cursed-test"
  tags = {
    team           = "infra"
    safe-to-delete = "true"
    who-to-blame   = "rrosborough"
  }
}

data "aws_iam_policy_document" "example" {
  statement {
    sid    = "allow-pull"
    effect = "Allow"

    principals {
      type        = "AWS"
      identifiers = ["763104351884"]
    }

    actions = [
      "ecr:BatchGetImage",
      "ecr:GetDownloadUrlForLayer",
    ]
  }
}

resource "aws_ecr_repository_policy" "conflicting_a" {
  repository = aws_ecr_repository.example.name
  policy     = data.aws_iam_policy_document.example.json
}

Now add the following resource and apply:

resource "aws_ecr_repository_policy" "conflicting_b" {
  repository = aws_ecr_repository.example.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "LambdaECRImageRetrievalPolicy"
        Effect = "Allow"
        Principal = {
          Service = "lambda.amazonaws.com"
        }
        Action = [
          "ecr:BatchGetImage",
          "ecr:GetDownloadUrlForLayer",
          "ecr:BatchGetImage",
          "ecr:DeleteRepositoryPolicy",
          "ecr:GetDownloadUrlForLayer",
          "ecr:GetRepositoryPolicy",
          "ecr:SetRepositoryPolicy"
        ]
        Condition = {
          StringLike = {
            "aws:sourceArn" = "arn:aws:lambda:us-east-1:763104351884:function:*"
          }
        }
      }
    ]
  })
}

Finally edit the added resource like so and try applying:

resource "aws_ecr_repository_policy" "conflicting_b" {
  repository = aws_ecr_repository.example.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = concat(
      data.aws_iam_policy_document.example.statement,
      [
        {
          Sid    = "LambdaECRImageRetrievalPolicy"
          Effect = "Allow"
          Principal = {
            Service = "lambda.amazonaws.com"
          }
          Action = [
            "ecr:BatchGetImage",
            "ecr:DeleteRepositoryPolicy",
            "ecr:GetDownloadUrlForLayer",
            "ecr:GetRepositoryPolicy",
            "ecr:SetRepositoryPolicy"
          ]
          Condition = {
            StringLike = {
              "aws:sourceArn" = "arn:aws:lambda:us-east-1:763104351884:function:*"
            }
          }
        }
      ]
    )
  })
}

Steps to Reproduce

Do the steps above. Applying the first code block should succeed. Applying the second code block should also succeed although now there will be a permanent diff because there are two different TF references to the same repository policy, so the one that applied from the second code block will have overwritten the one from the first code block. Applying the third code block hangs for 2 minutes and then aborts with:

│ Error: putting ECR Repository Policy (rrosborough-cursed-test): operation error ECR: SetRepositoryPolicy, https response error StatusCode: 400, RequestID: e0d3042f-0892-4e1e-924b-802de641f01c, InvalidParameterException: Invalid parameter at 'PolicyText' failed to satisfy constraint: 'Invalid repository policy provided'
│
│   with aws_ecr_repository_policy.conflicting_b,
│   on infra.tf line 32, in resource "aws_ecr_repository_policy" "conflicting_b":
│   32: resource "aws_ecr_repository_policy" "conflicting_b" {

That is expected because the new repository policy is invalid (it's constructed in a way that gives an invalid value for the Condition key). But, what's not expected is that now ALL future Terraform commands fail with this:

Planning failed. Terraform encountered an error while generating this plan.

╷
│ Error: parsing policy 1: parsing statement 1: 1 error(s) decoding:
│
│ * '[0].Condition' expected a map, got 'slice'
│
│   with aws_ecr_repository_policy.conflicting_b,
│   on infra.tf line 32, in resource "aws_ecr_repository_policy" "conflicting_b":
│   32: resource "aws_ecr_repository_policy" "conflicting_b" {

You can remove the broken resources from Terraform source code but it does not help, since somehow Terraform has put an object into the Terraform state that was not applied to AWS and does not pass the provider's own validation checks. The only way to get out of this state is terraform state rm.

(It might look like removing the resource on line 32 would resolve the error, since the error message points to it explicitly... humorously enough, removing the resource just make Terraform print the same error except now with no line of code or resource pointer.)

Debug Output

No response

Panic Output

No response

Important Factoids

The code in this issue is obviously incorrect and should be rejected, everyone on our end is on the same page about that, the issue though is that the provider gets into a permanently broken state where administrative intervention is needed, instead of just failing the apply like it should (or producing an incorrect result).

References

No response

Would you like to implement a fix?

None

github-actions[bot] commented 1 week ago

Community Note

Voting for Prioritization

Volunteering to Work on This Issue