pierskarsenbarg opened this issue 2 years ago
Hey @pierskarsenbarg 👋 Thank you for taking the time to raise this! On a brief glance over the debug logs, I suspect this might be a whitespace issue, so I've marked this as a bug so that the team can take a look at this as soon as time allows. In the meantime, I'm curious whether switching the policy value over to using jsonencode, similar to how you're doing for redrive_policy, might work around it.
In preparing to suggest this, I needed to validate my formatting, so I have a copy of what that would look like in case you'd like to try it:
resource "aws_sqs_queue" "myqueue" {
  name                              = "myqueue"
  kms_data_key_reuse_period_seconds = 300
  max_message_size                  = 10240
  message_retention_seconds         = 604800
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.dlq.arn
    maxReceiveCount     = 4
  })
  visibility_timeout_seconds = 30
  policy = jsonencode({
    "Statement" = [{
      "Action" = ["sqs:SendMessage"]
      "Condition" = {
        "ArnEquals" = {
          "aws:SourceArn" = aws_sns_topic.mytopic.arn
        }
      }
      "Effect" = "Allow"
      "Principal" = {
        "Service" = "sns.amazonaws.com"
      }
      "Resource" = "*"
    }]
  })
}
Hi @justinretzolk
Thanks for this. I've copied and pasted your resource into my config and re-ran it but got the same error. I've uploaded a new set of logs:
https://gist.github.com/pierskarsenbarg/738fcd816a1013b2000d6faedfd18231
Hey @pierskarsenbarg 👋 Thanks for giving that a shot, and I'm sorry to hear that workaround didn't quite fix it. I'll leave this open for someone on the team to take a look when possible. In the meantime, unfortunately it looks like the most recent debug logs you provided got cut off. Can you either update the existing gist or create a fresh one with the full logs?
@justinretzolk looks like gist truncates logs and provides a link to expand them (TIL)
@justinretzolk @pierskarsenbarg I'm having this same issue, so I took a look at the logs. I did identify one difference: Action is ["sqs:SendMessage"] in the config, but "sqs:SendMessage" (a plain JSON string) in the logged response.
I'm curious whether removing the brackets from the config does anything.
However, I should add that our configs already omit the brackets:
resource "aws_sqs_queue_policy" "queue_policy" {
  queue_url = aws_sqs_queue.queue.url
  policy = jsonencode({
    "Statement" : [{
      "Effect" : "Allow",
      "Principal" : {
        "Service" : "sns.amazonaws.com"
      },
      "Action" : "sqs:SendMessage",
      "Resource" : aws_sqs_queue.queue.arn,
      "Condition" : {
        "ArnEquals" : {
          "aws:SourceArn" : [for subscription in module.topic_subscription : subscription.topic_arn]
        }
      }
    }]
  })
}
and as far as I can tell, the response matches it (I extracted this from the XML response in our log at the two-minute mark):
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sns.amazonaws.com"
      },
      "Action": "sqs:SendMessage",
      "Resource": "REDACTED",
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": [
            "REDACTED",
            "REDACTED"
          ]
        }
      }
    }
  ]
}
@justinretzolk @pierskarsenbarg @mattieb
I managed to fix this issue on my end by luck while refactoring. Here's what I found.
When setting up the policy attribute for aws_sqs_queue_policy, if you don't specify Version, the policy is created by default with "Version": "2008-10-17" (supposedly, according to the response).
I say supposedly because if you explicitly set the version to 2008-10-17, it should succeed; if it's left undefined, it fails.
It might have to do with the use of variables, i.e. ${variable-name} (see the AWS Docs), but I'm not sure.
Solution
We've been using the more recent version when creating policies for other resources: "Version" : "2012-10-17". So the key here is just adding that one line. Something like...
resource "aws_sqs_queue_policy" "queue_policy" {
  queue_url = aws_sqs_queue.queue.url
  policy = jsonencode({
    "Version" : "2012-10-17"
    # ... the rest of your policy ("Statement" etc.), unchanged ...
  })
}
Hope this helps. Cheers.
I am getting the same issue while adding a redrive_policy to an existing queue. Even though terraform apply returns the error, I can see in the AWS console that the redrive_policy has been added to the queue. I'm still looking for a clean, successful apply. The workaround mentioned above won't work for me, as I am using policy data from other resources (which already has "Version": "2012-10-17").
I also had the same issue, and the workaround mentioned above didn't work. I managed to work around it by adding the policy to the aws_sqs_queue resource, as opposed to using the separate aws_sqs_queue_policy resource.
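A minimal sketch of that workaround, assuming an SNS-to-SQS policy like the others in this thread. The resource names (aws_sqs_queue.queue, aws_sns_topic.topic) and the queue name are hypothetical, and this has only been reported to help in this one case. The policy argument on aws_sqs_queue takes the place of a separate aws_sqs_queue_policy resource:

```hcl
# Hypothetical names: aws_sqs_queue.queue, aws_sns_topic.topic, "example-queue".
resource "aws_sqs_queue" "queue" {
  name = "example-queue"

  # The policy is set directly on the queue instead of in a separate
  # aws_sqs_queue_policy resource.
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "sns.amazonaws.com" }
      Action    = "sqs:SendMessage"
      # A resource cannot reference its own arn, so "*" is used here,
      # as in the jsonencode example at the top of the thread.
      Resource = "*"
      Condition = {
        ArnEquals = { "aws:SourceArn" = aws_sns_topic.topic.arn }
      }
    }]
  })
}
```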
FWIW, here's another small example:
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}
resource "aws_sqs_queue" "my_queue" {}
resource "aws_sqs_queue_policy" "my_queue_policy" {
  queue_url = aws_sqs_queue.my_queue.id
  policy    = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "sqs:SendMessage",
      "Resource": ["*"]
    }
  ]
}
EOF
}
Running:
terraform init
terraform apply -auto-approve
Fails with:
╷
│ Error: error waiting for SQS Queue Policy (https://sqs.us-west-2.amazonaws.com/792766875239/terraform-20220825001512031300000002) to be set: timeout while waiting for state to become 'equal' (last state: 'notequal', timeout: 2m0s)
│
│ with aws_sqs_queue_policy.test3,
│ on main.tf line 11, in resource "aws_sqs_queue_policy" "test3":
│ 11: resource "aws_sqs_queue_policy" "test3" {
│
╵
However, changing "Resource": ["*"] to "Resource": "*" succeeds.
@justinretzolk Looks like others are also having this issue. Any news on an update?
I just ran into the same timeout issue using aws_sqs_queue_policy, but I can't tell what's going wrong.
resource "aws_sqs_queue_policy" "webhooks_queue_policy" {
  queue_url = aws_sqs_queue.webhooks.id
  policy    = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sns.amazonaws.com"
      },
      "Action": "sqs:SendMessage",
      "Resource": "${aws_sqs_queue.webhooks.arn}",
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": "${data.aws_ssm_parameter.webhooks_sns_topic_arn.value}"
        }
      }
    }
  ]
}
EOF
}
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Destroying... [id=https://sqs.eu-west-2.amazonaws.com/[REDACTED number]/dev-terraform-cor-k0x4f9-workos-webhooks]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still destroying... [id=https://sqs.eu-west-2.amazonaws.com/[REDACTED number]...v-terraform-cor-k0x4f9-workos-webhooks, 10s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still destroying... [id=https://sqs.eu-west-2.amazonaws.com/[REDACTED number]...v-terraform-cor-k0x4f9-workos-webhooks, 20s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Destruction complete after 26s
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Creating...
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [10s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [20s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [30s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [40s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [50s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [1m0s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [1m10s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [1m20s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [1m30s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [1m40s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [1m50s elapsed]
module.workos.aws_sqs_queue_policy.webhooks_queue_policy: Still creating... [2m0s elapsed]
╷
│ Error: error waiting for SQS Queue Policy (https://sqs.eu-west-2.amazonaws.com/[REDACTED number]/dev-terraform-cor-k0x4f9-workos-webhooks) to be set: timeout while waiting for state to become 'equal' (last state: 'notequal', timeout: 2m0s)
│
│ with module.workos.aws_sqs_queue_policy.webhooks_queue_policy,
│ on modules/workos/webhook.tf line 31, in resource "aws_sqs_queue_policy" "webhooks_queue_policy":
│ 31: resource "aws_sqs_queue_policy" "webhooks_queue_policy" {
│
╵
As @tahiris719 mentioned above, setting the version number fixed this for me. Without it, the default 2008-... version returned by AWS must be causing a mismatch (it isn't set in the Terraform policy, so the two aren't a strict match).
Additionally, I tested with Action as an array of one item and found that it also caused a timeout error. This is presumably because AWS normalises the policy so that Action is a string rather than a one-element array, so the policy returned by AWS isn't a strict match for the policy in Terraform.
It seems like there's a very deep comparison bug here. I wonder whether other parts of the Terraform package are affected?
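Combining both observations, a policy written in the shape AWS appears to return might look like the untested sketch below. It assumes the mismatch comes from AWS normalising the stored policy. The names aws_sqs_queue.webhooks and aws_sns_topic.webhooks are hypothetical:

```hcl
# Hypothetical names: aws_sqs_queue.webhooks, aws_sns_topic.webhooks.
resource "aws_sqs_queue_policy" "webhooks" {
  queue_url = aws_sqs_queue.webhooks.id

  policy = jsonencode({
    # Set Version explicitly instead of relying on the default AWS applies.
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "sns.amazonaws.com" }
      # A single action as a plain string, not a one-element list.
      Action   = "sqs:SendMessage"
      Resource = aws_sqs_queue.webhooks.arn
      Condition = {
        ArnEquals = { "aws:SourceArn" = aws_sns_topic.webhooks.arn }
      }
    }]
  })
}
```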
I'm also facing this issue, and the version year trick didn't work on my side. I tried both a built-in policy and a policy file, using Terraform 1.1.6 through Jenkins.
I saw this issue when trying to create a dead letter queue on Terraform v0.13.7. I fixed it by changing the redrive_policy from using jsonencode to a plain string, from:
redrive_policy = jsonencode({
  deadLetterTargetArn = aws_sqs_queue.sqs_dead_queue[0].arn
  maxReceiveCount     = var.max_receive_count
})
to:
redrive_policy = "{\"deadLetterTargetArn\": \"${aws_sqs_queue.sqs_dead_queue[0].arn}\", \"maxReceiveCount\": ${var.max_receive_count}}"
I faced this issue too with Terraform v1.3.6. I was also able to fix it by changing the redrive_policy from using jsonencode to a plain string.
I fixed it by moving the policy from an argument to its own resource, aws_sqs_queue_redrive_policy, using Terraform v1.3.2 and AWS provider v4.52.0. For example:
resource "aws_sqs_queue_redrive_policy" "queue_name" {
  queue_url = aws_sqs_queue.queue_name.id
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.dead_letter_queue.arn
    maxReceiveCount     = 3
  })
}
Check your type for maxReceiveCount. In my case it wasn't working when I defined the variable used for its value as string, instead of number.
It works! Thanks a lot!
In my case I was getting this error when kms_master_key_id was set to null and kms_data_key_reuse_period_seconds was set to anything other than the default value (300).
This bug is a nightmare. The policies are created successfully, but the apply fails and taints the resource every time, so every subsequent apply must recreate the resource over and over.
No version of jsonencode, a <<EOF heredoc, or data.aws_iam_policy_document.doc.json works. They ALL have this same issue, and it's like shooting in the dark.
OK, I am going to save someone, or many people, many, many hours of time. Additionally, if the maintainers take this into account, a fix could be put in place.
In my particular case, the issue was that I was referencing a principal in my aws_iam_policy_document like this:
arn:aws:iam::account-id:assumed-role/my-role/my-session-name
^^ This is technically invalid, because the ARN for an STS role session should begin with the prefix arn:aws:sts.
However, the SQS API transparently rewrites this ARN into the correct form and applies the policy, so the ARN actually applied in the queue policy is:
arn:aws:sts::account-id:assumed-role/my-role/my-session-name
Presumably, Terraform reads the queue policy back after applying it, discovers that it does not match the configuration it supplied, and as a result taints the resource and, for some reason, times out.
If you perform a complete diff of the applied policy against your rendered Terraform configuration, and remediate any differences in your Terraform configuration, this mismatch goes away.
If I had to theorize, the same is true of all (or most) of the issues reported above. For instance, in the maxReceiveCount issue, I suspect the SQS API is coercing the string into an integer, and Terraform fails to reconcile the two as a result.
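As a sketch of that remediation, assuming an aws_iam_policy_document like the one described, the principal can be written in the arn:aws:sts form that SQS ends up storing. The names var.account_id, my-role, my-session-name and aws_sqs_queue.queue are hypothetical placeholders:

```hcl
# Hypothetical names: var.account_id, my-role, my-session-name, aws_sqs_queue.queue.
data "aws_iam_policy_document" "queue" {
  statement {
    effect    = "Allow"
    actions   = ["sqs:SendMessage"]
    resources = [aws_sqs_queue.queue.arn]

    principals {
      type = "AWS"
      # Use the arn:aws:sts form that SQS stores for role sessions, not
      # arn:aws:iam::...:assumed-role/..., so the read-back policy matches.
      identifiers = ["arn:aws:sts::${var.account_id}:assumed-role/my-role/my-session-name"]
    }
  }
}

resource "aws_sqs_queue_policy" "queue" {
  queue_url = aws_sqs_queue.queue.id
  policy    = data.aws_iam_policy_document.queue.json
}
```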
In my case I was getting this error with this configuration:
resource "aws_sqs_queue_redrive_policy" "queue_name" {
  queue_url = aws_sqs_queue.queue_name.id
  redrive_policy = jsonencode({
    deadLetterTargetArn = var.dead_letter_queue_arn
    maxReceiveCount     = var.maxReceiveCount
  })
}
module "sqs" {
  source                = "xxx/xxxx/"
  dead_letter_queue_arn = "dead_letter_queue_arn"
  maxReceiveCount       = "5"
}
Then I changed the type of maxReceiveCount from string to number, and it worked:
resource "aws_sqs_queue_redrive_policy" "queue_name" {
  queue_url = aws_sqs_queue.queue_name.id
  redrive_policy = jsonencode({
    deadLetterTargetArn = var.dead_letter_queue_arn
    maxReceiveCount     = var.maxReceiveCount
  })
}
module "sqs" {
  source                = "xxx/xxxx/"
  dead_letter_queue_arn = "dead_letter_queue_arn"
  maxReceiveCount       = 5
}
Here is a reference, although it doesn't cover every situation: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sqs_queue
redrive_policy - (Optional) The JSON policy to set up the Dead Letter Queue, see AWS docs. Note: when specifying maxReceiveCount, you must specify it as an integer (5), and not a string ("5").
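To keep maxReceiveCount numeric all the way through, the variable feeding it can be given a number type constraint. This is a minimal sketch with hypothetical names (var.max_receive_count, aws_sqs_queue.queue, aws_sqs_queue.dlq):

```hcl
# Hypothetical names: var.max_receive_count, aws_sqs_queue.queue, aws_sqs_queue.dlq.
variable "max_receive_count" {
  # With a number constraint, a value passed as "5" is converted to 5,
  # so jsonencode emits an integer rather than a string.
  type    = number
  default = 5
}

resource "aws_sqs_queue_redrive_policy" "queue" {
  queue_url = aws_sqs_queue.queue.id
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.dlq.arn
    maxReceiveCount     = var.max_receive_count
  })
}
```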
Still seeing this:
module "sqs_crawler_queue" {
  source = "terraform-aws-modules/sqs/aws"
  name   = "${var.org_name}-glue-crawler-queue-${var.environment}"
  # redrive_policy = {
  #   maxReceiveCount = 10
  # }
  queue_policy_statements = {
    Version = "2012-10-17"
    glue = {
      sid = "BackendPush"
      actions = [
        "sqs:Get*",
        "sqs:List*",
        "sqs:Describe*"
      ]
      principals = [
        {
          type = "Service"
          identifiers = [
            "glue.amazonaws.com"
          ]
        }
      ]
    },
    s3 = {
      sid     = "S3Publish"
      actions = ["sqs:SendMessage"]
      principals = [
        {
          type        = "Service"
          identifiers = ["s3.amazonaws.com"]
        }
      ]
    }
  }
  tags = {
    Environment = var.environment
  }
}
FWIW, I was able to get a policy generated using this policy:
resource "aws_sqs_queue_policy" "this" {
  queue_url = module.sqs_crawler_queue.queue_url
  policy = jsonencode({
    "Version" : "2012-10-17",
    "Statement" : [
      {
        "Sid" : "Root",
        "Effect" : "Allow",
        "Principal" : {
          "AWS" : "arn:aws:iam::${local.account_id}:root"
        },
        "Action" : "sqs:*",
        "Resource" : [module.sqs_crawler_queue.queue_arn]
      },
      {
        "Sid" : "Glue",
        "Effect" : "Allow",
        "Principal" : {
          "Service" : "glue.amazonaws.com"
        },
        "Resource" : [module.sqs_crawler_queue.queue_arn],
        "Action" : [
          "sqs:Get*",
          "sqs:List*",
          "sqs:Describe*"
        ]
      },
      {
        "Sid" : "S3",
        "Effect" : "Allow",
        "Principal" : {
          "Service" : "s3.amazonaws.com"
        },
        "Action" : [
          "sqs:SendMessage"
        ],
        "Resource" : [module.sqs_crawler_queue.queue_arn]
      }
    ]
  })
}
I'm not sure what the issue is, but that policy was generated and attached to the SQS queue just fine.
In my case, a company policy was appending a default cross-account deny to the IAM policy I had defined in Terraform.
I was able to get around this error by copying what was generated in the console and appending it to the policy I defined in Terraform:
jsonencode({
  "Version" : "2012-10-17",
  "Statement" : [
    {
      "Sid" : "SendToQueue",
      "Effect" : "Allow",
      "Principal" : { "Service" : "sns.amazonaws.com" },
      "Action" : "sqs:SendMessage",
      "Resource" : "arn:aws:sqs:${var.region}:${var.aws_account_id}:${var.environment}_queue",
      "Condition" : {
        "ArnLike" : {
          "aws:SourceArn" : aws_sns_topic.topic.arn
        }
      }
    },
    {
      "Sid" : "DenyCrossAccountAccess",
      "Effect" : "Deny",
      "Principal" : {
        "AWS" : "*"
      },
      "Action" : "sqs:*",
      "Resource" : "*",
      "Condition" : {
        "StringNotLike" : {
          "aws:PrincipalArn" : [
            "arn:aws:iam::33*******:*",
            "arn:aws:sts::33*******:*"
          ],
          "aws:PrincipalServiceName" : "*.amazonaws.com"
        }
      }
    }
  ]
})
I got the same error when I set both of these:
sqs_managed_sse_enabled = true
kms_data_key_reuse_period_seconds = 300
Of course, combining SQS-managed SSE with KMS settings is not supported, is not allowed in the AWS console or the CLI, and should not be allowed in Terraform. The plan should return an error.
The plan should also return an error if kms_data_key_reuse_period_seconds is set without kms_master_key_id.
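A sketch of the two non-conflicting setups described above, assuming the aws_sqs_queue arguments used elsewhere in this thread. The queue names are hypothetical, and alias/aws/sqs is the AWS managed KMS key for SQS:

```hcl
# Option 1: SQS-managed SSE only, with no KMS arguments at all.
resource "aws_sqs_queue" "sse_sqs" {
  name                    = "example-sse-sqs" # hypothetical name
  sqs_managed_sse_enabled = true
}

# Option 2: SSE-KMS, with kms_data_key_reuse_period_seconds set only
# together with kms_master_key_id.
resource "aws_sqs_queue" "sse_kms" {
  name                              = "example-sse-kms" # hypothetical name
  kms_master_key_id                 = "alias/aws/sqs"
  kms_data_key_reuse_period_seconds = 300
}
```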
This happened consistently on two specific SQS queues where I work. It turned out we were applying the aws_sqs_queue_policy resource to the same queue twice, to set a policy condition for each of two different SNS topics. Since a queue has only one policy, the two resources presumably kept overwriting each other. Removing the second aws_sqs_queue_policy from both queues resulted in the correct behaviour.
Results in error:
resource "aws_sqs_queue" "q" {
  name = "examplequeue"
}
resource "aws_sqs_queue_policy" "test_1" {
  queue_url = aws_sqs_queue.q.id
  policy = templatefile("templates/sqs-queue-policy.json.tpl", {
    sqs_queue = aws_sqs_queue.q.arn
    sns_topic = aws_sns_topic.t1.arn
  })
}
resource "aws_sqs_queue_policy" "test_2" {
  queue_url = aws_sqs_queue.q.id
  policy = templatefile("templates/sqs-queue-policy.json.tpl", {
    sqs_queue = aws_sqs_queue.q.arn
    sns_topic = aws_sns_topic.t2.arn
  })
}
Fix:
resource "aws_sqs_queue" "q" {
  name = "examplequeue"
}
resource "aws_sqs_queue_policy" "test_1" {
  queue_url = aws_sqs_queue.q.id
  policy = templatefile("templates/sqs-queue-policy.json.tpl", {
    sqs_queue   = aws_sqs_queue.q.arn
    sns_topic_1 = aws_sns_topic.t1.arn
    sns_topic_2 = aws_sns_topic.t2.arn
  })
}
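The same idea works without a template file: a single aws_sqs_queue_policy (replacing test_1 and test_2) that lists both topic ARNs under aws:SourceArn, like the for expression in an earlier comment. The statement is a hypothetical stand-in, since the contents of templates/sqs-queue-policy.json.tpl aren't shown:

```hcl
# One policy per queue; aws_sns_topic.t1 and aws_sns_topic.t2 are from the
# example above, while the statement body is a hypothetical stand-in.
resource "aws_sqs_queue_policy" "q" {
  queue_url = aws_sqs_queue.q.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "sns.amazonaws.com" }
      Action    = "sqs:SendMessage"
      Resource  = aws_sqs_queue.q.arn
      Condition = {
        ArnEquals = {
          # Both topics in one condition instead of two competing policies.
          "aws:SourceArn" = [aws_sns_topic.t1.arn, aws_sns_topic.t2.arn]
        }
      }
    }]
  })
}
```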
Terraform CLI and Terraform AWS Provider Version
Affected Resource(s)
Terraform Configuration Files
Debug Output
v3.46.0: https://gist.github.com/pierskarsenbarg/15fa1ff2a14203c74a725dcbee16b287
v4.8.0: https://gist.github.com/pierskarsenbarg/13571e63473eec48960e2360562bfefc
Panic Output
n/a
Expected Behavior
The queue is created with the appropriate policy attached.
Actual Behavior
The following error message is returned:
However, the queue has been created with the correct policy.
If I update the provider to the latest version (v4.8.0), I get a better error message:
I've included the logs from this version in the Debug Output section above.
Steps to Reproduce
terraform apply (with the above configuration)
Important Factoids
I've also tried using the aws_sqs_queue_policy resource instead, but I get the same error message.
It seems this started in v3.46.0 of the provider. Versions before this work without the error message. All versions since (including the latest version) have this error.
References