hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: Cannot deploy (apply) `aws_api_gateway_rest_api` with `aws_api_gateway_rest_api_policy` at the same time #29392

Open mklosittam opened 1 year ago

mklosittam commented 1 year ago

Terraform Core Version

1.3.8

AWS Provider Version

4.54.0

Affected Resource(s)

Expected Behavior

I should be able to deploy (or apply) all of this successfully.

Actual Behavior

I get an error. Curiously, after apply fails the first time, if I run plan and apply again, it succeeds the second time. If I apply the 3 resources one by one, it also succeeds. This makes me think that the resource dependency tree is not being resolved correctly, but I don't see what the problem is in the code.

Relevant Error/Panic Output Snippet

aws_api_gateway_rest_api_policy.api_policy: Creating...
╷
│ Error: setting API Gateway REST API Policy BadRequestException: Invalid policy document. Please check the policy syntax and ensure that Principals are valid.
│ 
│   with aws_api_gateway_rest_api_policy.api_policy,
│   on main.tf line 60, in resource "aws_api_gateway_rest_api_policy" "api_policy":
│   60: resource "aws_api_gateway_rest_api_policy" "api_policy" {

Terraform Configuration Files

terraform {
  required_version = ">= 1.2"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "api_access" {
  statement {
    effect = "Allow"
    actions = [
      "sts:AssumeRole"
    ]
    principals {
      type        = "AWS"
      identifiers = [
        "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root",
      ]
    }
  }
}

data "aws_iam_policy_document" "api_invocation_policy" {
  statement {
    effect = "Allow"
    actions = [
      "execute-api:Invoke",
    ]
    resources = [
      "arn:aws:execute-api:*:${data.aws_caller_identity.current.account_id}:${aws_api_gateway_rest_api.api_gateway.id}/*/*/*"
    ]
  }
}

resource "aws_iam_role" "api_gateway_access_role" {
  name               = "api-gateway-access-role"
  path               = "/"
  assume_role_policy = data.aws_iam_policy_document.api_access.json
  inline_policy {
    name   = "api-invocation-policy"
    policy = data.aws_iam_policy_document.api_invocation_policy.json
  }
}

# --- API Gateway Resources --- #
resource "aws_api_gateway_rest_api" "api_gateway" {
  name        = "api-gateway"
  description = "Proxy to handle requests"
  endpoint_configuration {
    types = ["EDGE"]
  }
}

resource "aws_api_gateway_rest_api_policy" "api_policy" {
  rest_api_id = aws_api_gateway_rest_api.api_gateway.id
  policy      = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "${aws_iam_role.api_gateway_access_role.arn}"
      },
      "Action": "execute-api:Invoke",
      "Resource": "execute-api:/*"
    }
  ]
}
EOF
}

# ${aws_iam_role.api_gateway_access_role.arn}

output "aws_api_gateway_rest_api-api_gateway-id" {
  value = aws_api_gateway_rest_api.api_gateway.id
}

output "aws_api_gateway_rest_api_api_gateway_arn" {
  value = aws_api_gateway_rest_api.api_gateway.arn
}

output "aws_iam_role_api_gateway_access_role_arn" {
  value = aws_iam_role.api_gateway_access_role.arn
}

Steps to Reproduce

NOTE: the backend.tf and provider.tf files have been omitted, since they are minimal stubs. Everything else in my deployment works, except for these 3 resources along with their data sources.

Debug Output

$ terraform apply .terraform.tfplan
aws_api_gateway_rest_api.api_gateway: Creating...
aws_api_gateway_rest_api.api_gateway: Creation complete after 1s [id=j0pcpiqboh]
data.aws_iam_policy_document.api_invocation_policy: Reading...
data.aws_iam_policy_document.api_invocation_policy: Read complete after 0s [id=3227615025]
aws_iam_role.api_gateway_access_role: Creating...
aws_iam_role.api_gateway_access_role: Creation complete after 1s [id=api-gateway-access-role]
aws_api_gateway_rest_api_policy.api_policy: Creating...
╷
│ Error: setting API Gateway REST API Policy BadRequestException: Invalid policy document. Please check the policy syntax and ensure that Principals are valid.
│ 
│   with aws_api_gateway_rest_api_policy.api_policy,
│   on main.tf line 60, in resource "aws_api_gateway_rest_api_policy" "api_policy":
│   60: resource "aws_api_gateway_rest_api_policy" "api_policy" {
│ 
╵
Releasing state lock. This may take a few moments...
ERRO[0007] 1 error occurred:
    * exit status 1

Panic Output

No response

Important Factoids

Curiously, after apply fails the first time, if I run plan and apply again, it succeeds the second time. If I apply the 3 resources one by one, it also succeeds. This makes me think that the resource dependency tree is not being resolved correctly, but I don't see what the problem is in the code.

I searched online, but the only thing that seems relevant isn't really helping me: https://stackoverflow.com/questions/54780301/invalid-policy-document-please-check-the-policy-syntax-and-ensure-that-principa

References

No response

Would you like to implement a fix?

No

justinretzolk commented 1 year ago

Hey @mklosittam 👋 Thank you for taking the time to raise this! So that we have the necessary information in order to look into this, can you supply debug logs (redacted as needed) as well?

subodh-kumar-02 commented 1 year ago

Has anyone found a solution for the above error? I am getting the same error. When a wildcard (*) is used for the principal, the policy is created successfully, but it does not seem able to resolve the value (ARN or name) of the IAM role/user.

jflopezcolmenarejo commented 10 months ago

Same here, I am facing the same error. Any ideas about how to work around this?

subodh-kumar-02 commented 10 months ago

> Same here, I am facing the same error. Any ideas about how to work around this?

You need to run terraform apply twice:

  1. First with the policy principal set to a wildcard (*).
  2. Then again with the required ARN.

Hope this helps.

jflopezcolmenarejo commented 10 months ago

> > Same here, I am facing the same error. Any ideas about how to work around this?
>
> You need to run terraform apply twice:
>
>   1. First with the policy principal set to a wildcard (*).
>   2. Then again with the required ARN.
>
> Hope this helps.

Thanks. Well, I will have to adapt my Bitbucket pipeline and add an extra step... It will be ugly, but it is what it is.
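
For anyone adding that extra pipeline step, here is a minimal sketch of a retry wrapper. It relies on the observation in the original report that a second plan and apply succeeds. The `retry` function name and the 30-second delay in the usage comment are assumptions for illustration, not anything provided by Terraform or Bitbucket.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
  local attempts=$1 delay=$2 n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempt(s): $*" >&2
      return 1
    fi
    n=$((n + 1))
    # Give IAM time to propagate the new role before trying again
    sleep "$delay"
  done
}

# Example pipeline step (assumed values, not executed here):
#   retry 2 30 terraform apply -auto-approve
```

The usage line applies without a saved plan file, so each attempt computes a fresh plan. Re-applying a saved plan after a partial failure would likely be rejected as stale.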

Enewman00 commented 2 months ago

Another workaround for this is to use the external provider's data source with the AWS CLI to fetch the role after creation.

data "external" "iam_role_retrieval" {
  program = ["bash", "${path.module}/iam_role.sh", var.profile_primary, var.iam_role.name]

  depends_on = [var.iam_role]
}

resource "aws_api_gateway_rest_api_policy" "policy" {
  ...
  depends_on  = [data.external.iam_role_retrieval]

  policy = jsonencode({
    "Version" : "2012-10-17",
    "Statement" : [
      {
        Principal : data.external.iam_role_retrieval.result["arn"],
        ...

The bash script would use the AWS CLI to try to fetch the role, something like this:

#!/bin/bash

# Assign arguments to variables for better readability
PROFILE=$1
ROLE_NAME=$2

# Loop up to 3 times
for i in {1..3}; do
  sleep 5
  # Fetch the IAM role
  result=$(aws iam get-role --profile "$PROFILE" --role-name "$ROLE_NAME" --output json)

  # Check if role was found
  if [[ $? -eq 0 ]]; then
    # Extract the ARN from the JSON result
    arn=$(echo "$result" | jq -r '.Role.Arn')
    echo '{ "success": "true", "arn": "'"$arn"'" }'
    exit 0
  fi
done

# Report failure if the role was not found after all attempts
echo '{ "success": "false", "arn": null }'

colby-addepar commented 2 months ago

I discovered this is actually a race condition between the creation of the aws_iam_role and its use as a principal in the aws_api_gateway_rest_api_policy. depends_on also doesn't seem to work, or doesn't wait long enough. I was able to work around this by using the time_sleep resource from the hashicorp/time provider. No need for local-exec or other hacks.

Example:

# aws_iam_role can take a few seconds to propagate its ARN before it can be
# consumed by other resources. This resource waits for the role to be available
# before the API GW resource policy is created

resource "time_sleep" "iam_role_propagation" {
  create_duration = "30s"

  triggers = {
    iam_role_arn = aws_iam_role.gw_access.arn
  }
}

resource "aws_api_gateway_rest_api_policy" "this" {
  rest_api_id = aws_api_gateway_rest_api.this.id
  policy      = data.aws_iam_policy_document.resource_policy.json
}

data "aws_iam_policy_document" "resource_policy" {
  statement {
    sid       = "Resource"
    effect    = "Allow"
    resources = ["${aws_api_gateway_rest_api.this.execution_arn}/*/*"]
    principals {
      type        = "AWS"
      identifiers = [time_sleep.iam_role_propagation.triggers["iam_role_arn"]]
    }
    actions = [
      "execute-api:Invoke"
    ]
  }
}
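
Applied to the configuration from the original report, the same idea might look like the sketch below. This is an untested adaptation, not a confirmed fix. It assumes the hashicorp/time provider is added to required_providers and that a 30-second wait is long enough. The time_sleep resource name is carried over from the example above, and everything else uses the reporter's resource names.

```hcl
terraform {
  required_version = ">= 1.2"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
    # Assumption: added so that time_sleep is available
    time = {
      source = "hashicorp/time"
    }
  }
}

# Wait after the role is created; the ARN is passed through `triggers`
# so the policy below depends on the sleep rather than on the role directly.
resource "time_sleep" "iam_role_propagation" {
  create_duration = "30s"

  triggers = {
    iam_role_arn = aws_iam_role.api_gateway_access_role.arn
  }
}

# Replaces the heredoc-based aws_api_gateway_rest_api_policy in the report
resource "aws_api_gateway_rest_api_policy" "api_policy" {
  rest_api_id = aws_api_gateway_rest_api.api_gateway.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          AWS = time_sleep.iam_role_propagation.triggers["iam_role_arn"]
        }
        Action   = "execute-api:Invoke"
        Resource = "execute-api:/*"
      }
    ]
  })
}
```

Reading the ARN from the time_sleep triggers, instead of from aws_iam_role directly, is what makes Terraform wait for the sleep to finish before it creates the policy.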