hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0
9.83k stars 9.19k forks source link

[Bug]: Creating a duplicate ECR repository policy causes irrecoverable validation error #39202

Closed raxod502-plaid closed 3 weeks ago

raxod502-plaid commented 2 months ago

Terraform Core Version

1.3.3, 1.7.5

AWS Provider Version

5.66.0

Affected Resource(s)

Expected Behavior

If you create two aws_ecr_repository_policy objects for the same policy this is user error, but the expected result is that the only thing that should get broken is your repository policy getting set to the wrong value and/or your terraform plan failing until you fix the code.

Actual Behavior

If you do things in a certain order, it's possible to get the terraform state into a broken place where absolutely no terraform operations can be run without manually running terraform state rm commands.

Relevant Error/Panic Output Snippet

Planning failed. Terraform encountered an error while generating this plan.

╷
│ Error: parsing policy 1: parsing statement 1: 1 error(s) decoding:
│
│ * '[0].Condition' expected a map, got 'slice'
│
│   with aws_ecr_repository_policy.conflicting_b,
│   on infra.tf line 32, in resource "aws_ecr_repository_policy" "conflicting_b":
│   32: resource "aws_ecr_repository_policy" "conflicting_b" {

Terraform Configuration Files

Start by applying this:

resource "aws_ecr_repository" "example" {
  name = "rrosborough-cursed-test"
  tags = {
    team           = "infra"
    safe-to-delete = "true"
    who-to-blame   = "rrosborough"
  }
}

data "aws_iam_policy_document" "example" {
  statement {
    sid    = "allow-pull"
    effect = "Allow"

    principals {
      type        = "AWS"
      identifiers = ["763104351884"]
    }

    actions = [
      "ecr:BatchGetImage",
      "ecr:GetDownloadUrlForLayer",
    ]
  }
}

resource "aws_ecr_repository_policy" "conflicting_a" {
  repository = aws_ecr_repository.example.name
  policy     = data.aws_iam_policy_document.example.json
}

Now add the following resource and apply:

resource "aws_ecr_repository_policy" "conflicting_b" {
  repository = aws_ecr_repository.example.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "LambdaECRImageRetrievalPolicy"
        Effect = "Allow"
        Principal = {
          Service = "lambda.amazonaws.com"
        }
        Action = [
          "ecr:BatchGetImage",
          "ecr:GetDownloadUrlForLayer",
          "ecr:BatchGetImage",
          "ecr:DeleteRepositoryPolicy",
          "ecr:GetDownloadUrlForLayer",
          "ecr:GetRepositoryPolicy",
          "ecr:SetRepositoryPolicy"
        ]
        Condition = {
          StringLike = {
            "aws:sourceArn" = "arn:aws:lambda:us-east-1:763104351884:function:*"
          }
        }
      }
    ]
  })
}

Finally edit the added resource like so and try applying:

resource "aws_ecr_repository_policy" "conflicting_b" {
  repository = aws_ecr_repository.example.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = concat(
      data.aws_iam_policy_document.example.statement,
      [
        {
          Sid    = "LambdaECRImageRetrievalPolicy"
          Effect = "Allow"
          Principal = {
            Service = "lambda.amazonaws.com"
          }
          Action = [
            "ecr:BatchGetImage",
            "ecr:DeleteRepositoryPolicy",
            "ecr:GetDownloadUrlForLayer",
            "ecr:GetRepositoryPolicy",
            "ecr:SetRepositoryPolicy"
          ]
          Condition = {
            StringLike = {
              "aws:sourceArn" = "arn:aws:lambda:us-east-1:763104351884:function:*"
            }
          }
        }
      ]
    )
  })
}

Steps to Reproduce

Do the steps above. Applying the first code block should succeed. Applying the second code block should also succeed although now there will be a permanent diff because there are two different TF references to the same repository policy, so the one that applied from the second code block will have overwritten the one from the first code block. Applying the third code block hangs for 2 minutes and then aborts with:

│ Error: putting ECR Repository Policy (rrosborough-cursed-test): operation error ECR: SetRepositoryPolicy, https response error StatusCode: 400, RequestID: e0d3042f-0892-4e1e-924b-802de641f01c, InvalidParameterException: Invalid parameter at 'PolicyText' failed to satisfy constraint: 'Invalid repository policy provided'
│
│   with aws_ecr_repository_policy.conflicting_b,
│   on infra.tf line 32, in resource "aws_ecr_repository_policy" "conflicting_b":
│   32: resource "aws_ecr_repository_policy" "conflicting_b" {

That is expected because the new repository policy is invalid (it's constructed in a way that gives an invalid value for the Condition key). But, what's not expected is that now ALL future Terraform commands fail with this:

Planning failed. Terraform encountered an error while generating this plan.

╷
│ Error: parsing policy 1: parsing statement 1: 1 error(s) decoding:
│
│ * '[0].Condition' expected a map, got 'slice'
│
│   with aws_ecr_repository_policy.conflicting_b,
│   on infra.tf line 32, in resource "aws_ecr_repository_policy" "conflicting_b":
│   32: resource "aws_ecr_repository_policy" "conflicting_b" {

You can remove the broken resources from Terraform source code but it does not help, since somehow Terraform has put an object into the Terraform state that was not applied to AWS and does not pass the provider's own validation checks. The only way to get out of this state is terraform state rm.

(It might look like removing the resource on line 32 would resolve the error, since the error message points to it explicitly... humorously enough, removing the resource just make Terraform print the same error except now with no line of code or resource pointer.)

Debug Output

No response

Panic Output

No response

Important Factoids

The code in this issue is obviously incorrect and should be rejected, everyone on our end is on the same page about that, the issue though is that the provider gets into a permanently broken state where administrative intervention is needed, instead of just failing the apply like it should (or producing an incorrect result).

References

No response

Would you like to implement a fix?

None

github-actions[bot] commented 2 months ago

Community Note

Voting for Prioritization

Volunteering to Work on This Issue

petrzjunior commented 1 month ago

I think that this is a broader IAM issue. I encountered the same error after wrongly using the jsonencode function.

After applying this

resource "aws_kms_key" "default" {
  policy = jsonencode(data.aws_iam_policy_document.kms)
}

AWS returns error, but the state gets corrupted. When I then plan the correct syntax

resource "aws_kms_key" "default" {
  policy = data.aws_iam_policy_document.kms.json
}

I get irrecoverable error

encountered: parsing policy 1: parsing statement 1: 1 error(s) decoding:
│ 
│ * '[0].Condition' expected a map, got 'slice'

and the resource must be removed and imported to the state again.

github-actions[bot] commented 3 weeks ago

[!WARNING] This issue has been closed, meaning that any additional comments are hard for our team to see. Please assume that the maintainers will not see them.

Ongoing conversations amongst community members are welcome, however, the issue will be locked after 30 days. Moving conversations to another venue, such as the AWS Provider forum, is recommended. If you have additional concerns, please open a new issue, referencing this one where needed.

github-actions[bot] commented 3 weeks ago

This functionality has been released in v5.73.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!