hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

`aws_sqs_queue` resource times out when creating an SQS queue with a built-in policy #24046

Open pierskarsenbarg opened 2 years ago

pierskarsenbarg commented 2 years ago

Terraform CLI and Terraform AWS Provider Version

Terraform v1.1.7
on darwin_amd64
+ provider registry.terraform.io/hashicorp/aws v3.46.0

Affected Resource(s)

aws_sqs_queue

Terraform Configuration Files

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.46.0" 
    }
  }
}

# Configure the AWS Provider
provider "aws" {
  region = "eu-west-1"
}

resource "aws_sns_topic" "mytopic" {}

resource "aws_sqs_queue" "dlq" {}

resource "aws_sqs_queue" "myqueue" {
  name                              = "myqueue"
  kms_data_key_reuse_period_seconds = 300
  max_message_size                  = 10240
  message_retention_seconds         = 604800
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.dlq.arn
    maxReceiveCount     = 4
  })
  visibility_timeout_seconds = 30
  policy                     = <<POLICY
{
            "Statement": [{
                "Action": ["sqs:SendMessage"],
                "Condition": {
                    "ArnEquals": {
                        "aws:SourceArn": "${aws_sns_topic.mytopic.arn}"
                    }
                },
                "Effect": "Allow",
                "Principal": {
                    "Service": "sns.amazonaws.com"
                },
                "Resource": "*"
            }]

        }
  POLICY
}

Debug Output

v3.46.0: https://gist.github.com/pierskarsenbarg/15fa1ff2a14203c74a725dcbee16b287
v4.8.0: https://gist.github.com/pierskarsenbarg/13571e63473eec48960e2360562bfefc

Panic Output

n/a

Expected Behavior

The queue is created with the appropriate policy attached.

Actual Behavior

The following error message is returned:

╷
│ Error: error waiting for SQS Queue (https://sqs.eu-west-1.amazonaws.com/xxx/myqueue) attributes to create: SQS Queue policies are not equivalent
│
│   with aws_sqs_queue.myqueue,
│   on main.tf line 19, in resource "aws_sqs_queue" "myqueue":
│   19: resource "aws_sqs_queue" "myqueue" {
│
╵

However, the queue has been created with the correct policy.

If I update the version of the provider to the latest (v4.8.0) then I get a better error message:

╷
│ Error: error waiting for SQS Queue (https://sqs.eu-west-1.amazonaws.com/052848974346/myqueue) attributes to create: timeout while waiting for state to become 'equal' (last state: 'notequal', timeout: 2m0s)
│
│   with aws_sqs_queue.myqueue,
│   on main.tf line 18, in resource "aws_sqs_queue" "myqueue":
│   18: resource "aws_sqs_queue" "myqueue" {
│
╵

I've included the logs from this version in the Debug Output section above.

Steps to Reproduce

  1. terraform apply (with the above configuration)

Important Factoids

I've also tried using the aws_sqs_queue_policy resource instead, but I get the same error message.

It seems this started in v3.46.0 of the provider. Versions before this work without the error message. All versions since (including the latest version) have this error.

justinretzolk commented 2 years ago

Hey @pierskarsenbarg 👋 Thank you for taking the time to raise this! On a brief glance over the debug logs, I suspect this might be a whitespace issue, so I've marked this as a bug so that the team can take a look at this as soon as time allows. In the meantime, I'm curious as to whether a workaround might be to switch the policy value over to using jsonencode, similar to how you're doing for redrive_policy.

In preparing to suggest this, I needed to validate my formatting, and so have a copy of what that would look like in case you'd like to try it:

resource "aws_sqs_queue" "myqueue" {
  name                              = "myqueue"
  kms_data_key_reuse_period_seconds = 300
  max_message_size                  = 10240
  message_retention_seconds         = 604800
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.dlq.arn
    maxReceiveCount     = 4
  })
  visibility_timeout_seconds = 30
  policy                     = jsonencode({
    "Statement" = [{
      "Action" = ["sqs:SendMessage"]
      "Condition" = {
        "ArnEquals" = {
          "aws:SourceArn" = "${aws_sns_topic.mytopic.arn}"
        }
      }
      "Effect" = "Allow"
      "Principal" = {
        "Service" = "sns.amazonaws.com"
      }
      "Resource" = "*"
    }]
  })
}

pierskarsenbarg commented 2 years ago

Hi @justinretzolk

Thanks for this. I've copied and pasted your resource into my config and re-ran it but got the same error. I've uploaded a new set of logs:

https://gist.github.com/pierskarsenbarg/738fcd816a1013b2000d6faedfd18231

justinretzolk commented 2 years ago

Hey @pierskarsenbarg 👋 Thanks for giving that a shot, and I'm sorry to hear that workaround didn't quite fix it. I'll leave this open for someone on the team to take a look when possible. In the meantime, unfortunately it looks like the most recent debug logs you provided got cut off. Can you either update the existing gist or create a fresh one with the full logs?

pierskarsenbarg commented 2 years ago

@justinretzolk looks like gist truncates logs and provides a link to expand them (TIL)

Try this one: https://gist.githubusercontent.com/pierskarsenbarg/738fcd816a1013b2000d6faedfd18231/raw/c942d19fd39b4e0ac531e883ebf2f515a18ed2c9/tflogs.log

atomicmattie commented 2 years ago

@justinretzolk @pierskarsenbarg I'm having this same issue, so I took a look at the logs. I did identify one difference: Action is ["sqs:SendMessage"] in the config, and "sqs:SendMessage" (a plain JSON string) in the log response.

I'm curious if removing the brackets from the config does anything.

However, I should add that our configs already omit the brackets:

resource "aws_sqs_queue_policy" "queue_policy" {
  queue_url = aws_sqs_queue.queue.url
  policy = jsonencode({
    "Statement" : [{
      "Effect" : "Allow",
      "Principal" : {
        "Service" : "sns.amazonaws.com"
      },
      "Action" : "sqs:SendMessage",
      "Resource" : "${aws_sqs_queue.queue.arn}",
      "Condition" : {
        "ArnEquals" : {
          "aws:SourceArn" : [for subscription in module.topic_subscription : subscription.topic_arn]
        }
      }
    }]
  })
}

and as far as I can tell, the response matches (which I extracted from the XML response of our log at the two minute mark):

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sns.amazonaws.com"
      },
      "Action": "sqs:SendMessage",
      "Resource": "REDACTED",
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": [
            "REDACTED",
            "REDACTED"
          ]
        }
      }
    }
  ]
}

tahiris719 commented 2 years ago

@justinretzolk @pierskarsenbarg @mattieb

I managed to fix this issue on my end by luck while refactoring. Here's what I found.

When setting up the policy attribute for aws_sqs_queue_policy, if you don't specify Version then this resource will be created, by default, with "Version": "2008-10-17" (supposedly, according to the response).

I say supposedly because, if you type the version as 2008-10-17 it should succeed. If left undefined, it'll fail.

It might have to do with the use of variables, i.e. ${variable-name} (see the AWS docs), but I'm not sure.

Solution: We've been using the more recent version when creating policies for other resources, "Version" : "2012-10-17". So the key here is just adding that one line. Something like...

resource "aws_sqs_queue_policy" "queue_policy" {
  queue_url = aws_sqs_queue.queue.url

  policy = jsonencode({
    "Version" : "2012-10-17",
    ...,
  })
}

Hope this helps. Cheers.

pratikmm commented 2 years ago

I am getting the same issue while adding redrive_policy to an existing queue. Even though I get the error on terraform apply, I can see in the AWS console that the redrive_policy is added to the queue. I'm still looking for a clean, successful apply. The workaround mentioned above won't work for me, as I am using policy data from different resources (which already have "Version": "2012-10-17").

liiam6342 commented 2 years ago

I also had the same issue, and the mentioned workaround didn't work. I worked around it by adding the policy in the aws_sqs_queue resource, as opposed to using the separate aws_sqs_queue_policy resource.

hoffa commented 2 years ago

FWIW, here's another small example:

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

resource "aws_sqs_queue" "my_queue" {}

resource "aws_sqs_queue_policy" "my_queue_policy" {
  queue_url = aws_sqs_queue.my_queue.id
  policy    = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "sqs:SendMessage",
      "Resource": ["*"]
    }
  ]
}
EOF
}

Running:

terraform init
terraform apply -auto-approve

Fails with:

╷
│ Error: error waiting for SQS Queue Policy (https://sqs.us-west-2.amazonaws.com/792766875239/terraform-20220825001512031300000002) to be set: timeout while waiting for state to become 'equal' (last state: 'notequal', timeout: 2m0s)
│
│   with aws_sqs_queue_policy.test3,
│   on main.tf line 11, in resource "aws_sqs_queue_policy" "test3":
│   11: resource "aws_sqs_queue_policy" "test3" {
│
╵

However, changing "Resource": ["*"] to "Resource": "*" succeeds.
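For reference, a minimal sketch of the variant reported to succeed, rewritten with jsonencode so the shape of each element is explicit. It assumes the same resource names as the example above and only illustrates the scalar "Resource" form; it is not a confirmed fix for every case in this thread.

```hcl
# Sketch only: same queue and policy as the example above, but with
# "Resource" written as a plain string instead of a one-element list.
resource "aws_sqs_queue" "my_queue" {}

resource "aws_sqs_queue_policy" "my_queue_policy" {
  queue_url = aws_sqs_queue.my_queue.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = "*"
        Action    = "sqs:SendMessage"
        # A bare string here; the thread reports that ["*"] led to the timeout.
        Resource = "*"
      }
    ]
  })
}
```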

pierskarsenbarg commented 2 years ago

@justinretzolk Looks like others are also having this issue. Any news on an update?

EdNutting commented 2 years ago

Just ran into the same timeout issue using aws_sqs_queue_policy, but I can't tell what's going wrong.

resource "aws_sqs_queue_policy" "webhooks_queue_policy" {
  queue_url = aws_sqs_queue.webhooks.id

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sns.amazonaws.com"
      },
      "Action": "sqs:SendMessage",
      "Resource": "${aws_sqs_queue.webhooks.arn}",
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": "${data.aws_ssm_parameter.webhooks_sns_topic_arn.value}"
        }
      }
    }
  ]
}
EOF
}

module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Destroying... [id=https://sqs.eu-west-2.amazonaws.com/[REDACTED number]/dev-terraform-cor-k0x4f9-workos-webhooks]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still destroying... [id=https://sqs.eu-west-2.amazonaws.com/[REDACTED number]...v-terraform-cor-k0x4f9-workos-webhooks, 10s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still destroying... [id=https://sqs.eu-west-2.amazonaws.com/[REDACTED number]...v-terraform-cor-k0x4f9-workos-webhooks, 20s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Destruction complete after 26s
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Creating...
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [10s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [20s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [30s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [40s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [50s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [1m0s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [1m10s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [1m20s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [1m30s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [1m40s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [1m50s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [2m0s elapsed]
╷
│ Error: error waiting for SQS Queue Policy (https://sqs.eu-west-2.amazonaws.com/[REDACTED number]/dev-terraform-cor-k0x4f9-workos-webhooks) to be set: timeout while waiting for state to become 'equal' (last state: 'notequal', timeout: 2m0s)
│
│   with module.workos.aws_sqs_queue_policy.webhooks_queue_policy,
│   on modules/workos/webhook.tf line 31, in resource "aws_sqs_queue_policy" "webhooks_queue_policy":
│   31: resource "aws_sqs_queue_policy" "webhooks_queue_policy" {
│
╵

EdNutting commented 2 years ago

As @tahiris719 mentioned above, setting the version number fixed this for me. Without the version number, the default 2008-... returned by AWS must be causing a mismatch (as it's not set in the policy, so isn't a strict policy match).

Additionally, I tested with Action being an array of 1 item, and found that also caused a timeout error. Presumably because AWS normalises the policy to Action being a string rather than an array of 1, so the returned policy from AWS isn't a strict match for the policy in Terraform.

Seems like there's a very deep comparison bug here? Are other parts of the terraform package affected I wonder?
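A minimal sketch of a policy that follows both observations above: an explicit "Version", and "Action" as a plain string rather than a one-element array. The resource names (aws_sns_topic.example, aws_sqs_queue.example) are hypothetical, and other commenters in this thread report that these changes alone did not help them, so treat this as one thing to try rather than a guaranteed fix.

```hcl
# Hypothetical resources used only for illustration.
resource "aws_sns_topic" "example" {}

resource "aws_sqs_queue" "example" {}

resource "aws_sqs_queue_policy" "example" {
  queue_url = aws_sqs_queue.example.id

  policy = jsonencode({
    # Set explicitly so the value AWS returns matches what was sent.
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "sns.amazonaws.com" }
      # A bare string, not ["sqs:SendMessage"].
      Action   = "sqs:SendMessage"
      Resource = aws_sqs_queue.example.arn
      Condition = {
        ArnEquals = { "aws:SourceArn" = aws_sns_topic.example.arn }
      }
    }]
  })
}
```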

maticortesr commented 2 years ago

Also facing this issue; the Version ("year") trick didn't work on my side. I tried both a built-in policy and a policy file, using Terraform 1.1.6 through Jenkins.

ztalarick commented 1 year ago

I saw this issue trying to create a dead letter queue on Terraform v0.13.7. I fixed it by changing the redrive_policy from using jsonencode to being a string.

redrive_policy = jsonencode({
  deadLetterTargetArn = aws_sqs_queue.sqs_dead_queue[0].arn
  maxReceiveCount     = var.max_receive_count
})

to:

redrive_policy = "{\"deadLetterTargetArn\": \"${aws_sqs_queue.sqs_dead_queue[0].arn}\", \"maxReceiveCount\": ${var.max_receive_count}}"

miguelalb commented 1 year ago

I faced this issue too with Terraform v1.3.6. I was also able to fix it by changing the redrive_policy from using jsonencode to being a string.

bruno-yamada commented 1 year ago

Fixed it by moving the redrive policy from an argument to its own resource, aws_sqs_queue_redrive_policy.

Using Terraform v1.3.2 and AWS provider v4.52.0.

E.g.:

resource "aws_sqs_queue_redrive_policy" "queue_name" {
  queue_url = aws_sqs_queue.queue_name.id
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.dead_letter_queue.arn
    maxReceiveCount     = 3
  })
}

jurajseffer commented 1 year ago

Check your type for maxReceiveCount. In my case it wasn't working when I defined the variable used for its value as string, instead of number.
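One way to make the type explicit, as a minimal sketch: declare the variable as a number so that jsonencode emits 5 rather than "5". The variable and resource names below are hypothetical and not taken from any config in this thread.

```hcl
# Hypothetical variable names, for illustration only.
variable "dead_letter_queue_arn" {
  type = string
}

variable "max_receive_count" {
  # Declared as a number so jsonencode writes 5, not "5".
  type    = number
  default = 5
}

resource "aws_sqs_queue" "example" {
  redrive_policy = jsonencode({
    deadLetterTargetArn = var.dead_letter_queue_arn
    maxReceiveCount     = var.max_receive_count
  })
}
```

If the value can only arrive as a string, wrapping it as tonumber(var.max_receive_count) should give the same numeric encoding.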

sliaptsou commented 1 year ago

> Check your type for maxReceiveCount. In my case it wasn't working when I defined the variable used for its value as string, instead of number.

It works! Thanks a lot!

logicbomb commented 9 months ago

In my case I was getting this error when setting kms_master_key_id = null while kms_data_key_reuse_period_seconds was set to anything other than the default value (300).

mancej commented 8 months ago

This bug is a nightmare. The policies are created successfully, but the apply fails and it taints the resource every time so every subsequent apply must recreate the resource over and over.

No variant works: not jsonencode, not a <<EOF heredoc, not data.aws_iam_policy_document.doc.json. They ALL have this same issue and it's like shooting in the dark.

mancej commented 8 months ago

Ok I am going to save someone, or many people, many, many hours of time. Additionally, if this is considered, a fix could be put in place.

In my particular case, the issue was that I was referencing a principal in my aws_iam_policy_document like this:

arn:aws:iam::account-id:assumed-role/my-role/my-session-name

^^ This is technically invalid because the arn should begin with a prefix of arn:aws:sts for STS role sessions.

However, the SQS API transparently mutates this ARN into the correct ARN and applies the policy, so the applied ARN on the queue policy is this:

arn:aws:sts::account-id:assumed-role/my-role/my-session-name

Presumably, Terraform reads the queue state back after the policy is applied and discovers that its configuration state is not the same as the configuration state it supplied, and as a result, taints the resource and for some reason times out.

If you perform a complete diff of the applied state, and your TF rendered configuration state, and remediate any differences in your terraform configurations, this diff mismatch goes away.

If I had to theorize, this is the same for all (or most) prior issues stated above ^^. For instance, in the maxReceiveCount issue, I suspect the SQS API is coercing the string into an integer, and Terraform fails to reconcile as a result.
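A minimal sketch of the corrected principal described above, assuming a hypothetical queue and a placeholder account ID (123456789012); the role and session names are the placeholders from this comment. The only point being illustrated is the arn:aws:sts prefix for an assumed-role session.

```hcl
# Hypothetical queue, for illustration only.
resource "aws_sqs_queue" "example" {}

data "aws_iam_policy_document" "example" {
  statement {
    effect    = "Allow"
    actions   = ["sqs:SendMessage"]
    resources = [aws_sqs_queue.example.arn]

    principals {
      type = "AWS"
      # sts, not iam: per the comment above, SQS rewrites the iam form to
      # this one, so the policy read back no longer matches what was sent.
      identifiers = ["arn:aws:sts::123456789012:assumed-role/my-role/my-session-name"]
    }
  }
}

resource "aws_sqs_queue_policy" "example" {
  queue_url = aws_sqs_queue.example.id
  policy    = data.aws_iam_policy_document.example.json
}
```

To find this kind of mismatch in other configs, one option is to compare the rendered policy with what the queue actually holds, for example the output of `aws sqs get-queue-attributes --queue-url <url> --attribute-names Policy`.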

S2o-iuan commented 6 months ago

In my case I was getting this error with the following settings:

resource "aws_sqs_queue_redrive_policy" "queue_name" {
  queue_url = aws_sqs_queue.queue_name.id
  redrive_policy = jsonencode({
    deadLetterTargetArn = "${var.dead_letter_queue_arn}"
    maxReceiveCount     = "${var.maxReceiveCount}"
  })
}

module "sqs" {
  source = "xxx/xxxx/"
  dead_letter_queue_arn = "dead_letter_queue_arn"
  maxReceiveCount       = "5"
}

Then I changed the type of maxReceiveCount in the module call from string to number, and it worked.

resource "aws_sqs_queue_redrive_policy" "queue_name" {
  queue_url = aws_sqs_queue.queue_name.id
  redrive_policy = jsonencode({
    deadLetterTargetArn = "${var.dead_letter_queue_arn}"
    maxReceiveCount     = "${var.maxReceiveCount}"
  })
}

module "sqs" {
  source = "xxx/xxxx/"
  dead_letter_queue_arn = "dead_letter_queue_arn"
  maxReceiveCount       = 5
}

Here is a reference, though it doesn't cover every situation: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sqs_queue

redrive_policy - (Optional) The JSON policy to set up the Dead Letter Queue, see AWS docs. Note: when specifying maxReceiveCount, you must specify it as an integer (5), and not a string ("5").

5t33 commented 5 months ago

Still seeing this:

module "sqs_crawler_queue" {
  source  = "terraform-aws-modules/sqs/aws"

  name = "${var.org_name}-glue-crawler-queue-${var.environment}"

#   redrive_policy = {
#     maxReceiveCount = 10
#   }

  queue_policy_statements = {
    Version = "2012-10-17"
    glue = {
      sid     = "BackendPush"
      actions = [
        "sqs:Get*",
        "sqs:List*",
        "sqs:Describe*"
      ]

      principals = [
        {
          type        = "Service"
          identifiers = [
           "glue.amazonaws.com"
          ]
        }
      ]
    },
    s3 = {
      sid     = "S3Publish"
      actions = ["sqs:SendMessage"]

      principals = [
        {
            type        = "Service"
            identifiers = ["s3.amazonaws.com"]
        }
      ]
    }

  }

  tags = {
    Environment = var.environment
  }
}

5t33 commented 5 months ago

FWIW, I was able to get a policy generated using this policy:

resource "aws_sqs_queue_policy" "this" {
  queue_url = module.sqs_crawler_queue.queue_url
  policy    = jsonencode({
    "Version" : "2012-10-17",
    "Statement": [

      {
        "Sid": "Root",
        "Effect": "Allow",
        "Principal": {
          "AWS": "arn:aws:iam::${local.account_id}:root"
        },
        "Action": "sqs:*",
        "Resource": [module.sqs_crawler_queue.queue_arn]
      },
      {
        "Sid": "Glue",
        "Effect" : "Allow",
        "Principal" : {
          "Service" : "glue.amazonaws.com"
        },
        "Resource": [module.sqs_crawler_queue.queue_arn],
        "Action" : [
          "sqs:Get*",
          "sqs:List*",
          "sqs:Describe*"
        ]
      },
      {
        "Sid": "S3",
        "Effect" : "Allow",
        "Principal" : {
          "Service" : "s3.amazonaws.com"
        },
        "Action" : [
          "sqs:SendMessage"
        ],
        "Resource": [module.sqs_crawler_queue.queue_arn]
      }
    ]
  })
}

Not sure what the issue is, but that policy was generated and attached to the sqs just fine.

CCorrado commented 4 months ago

In my case, a company policy was appending a default cross-account deny to the policy I had defined in Terraform.

I was able to get around this error by copying what was generated in the console and appending it to the policy I defined in TF.

jsonencode({
    "Version" : "2012-10-17",
    "Statement" : [
      {
        "Sid" : "SendToQueue",
        "Effect" : "Allow",
        "Principal" : { "Service" : "sns.amazonaws.com" },
        "Action" : "sqs:SendMessage",
        "Resource" : "arn:aws:sqs:${var.region}:${var.aws_account_id}:${var.environment}_queue",
        "Condition" : {
          "ArnLike" : {
            "aws:SourceArn" : aws_sns_topic.topic.arn
          }
        }
      },
      {
        "Sid" : "DenyCrossAccountAccess",
        "Effect" : "Deny",
        "Principal" : {
          "AWS" : "*"
        },
        "Action" : "sqs:*",
        "Resource" : "*",
        "Condition" : {
          "StringNotLike" : {
            "aws:PrincipalArn" : [
              "arn:aws:iam::33*******:*",
              "arn:aws:sts::33*******:*"
            ],
            "aws:PrincipalServiceName" : "*.amazonaws.com"
          }
        }
      }
    ]
})

chakatz commented 2 months ago

I got the same error when I set both of these:

sqs_managed_sse_enabled           = true
kms_data_key_reuse_period_seconds = 300

Of course, combining SQS-managed SSE and KMS settings is not supported, is not allowed in the AWS console or the CLI, and should not be allowed in Terraform. The plan should return an error. The plan should also return an error if kms_data_key_reuse_period_seconds is set without kms_master_key_id.
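As a minimal sketch of keeping the two encryption modes separate, the two queues below each use only one of them. The resource and queue names are hypothetical, and "alias/aws/sqs" (the AWS managed key for SQS) is only an example key.

```hcl
# Option 1: SQS-managed encryption only, with no kms_* arguments set.
resource "aws_sqs_queue" "sse_sqs" {
  name                    = "example-sse-sqs"
  sqs_managed_sse_enabled = true
}

# Option 2: KMS encryption. The reuse period is set together with a key,
# never on its own and never alongside sqs_managed_sse_enabled.
resource "aws_sqs_queue" "sse_kms" {
  name                              = "example-sse-kms"
  kms_master_key_id                 = "alias/aws/sqs"
  kms_data_key_reuse_period_seconds = 300
}
```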