hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

Regression: SNS topic subscription is not recreated after SNS topic delete/recreate #25059

Closed: BGarber42 closed this issue 2 months ago.

BGarber42 commented 2 years ago

Terraform CLI and Terraform AWS Provider Version

Terraform version: 1.1.9, AWS Provider version: 4.13.0

Affected Resource(s)

Terraform Configuration Files

Debug Output

Panic Output

Expected Behavior

Terraform should recreate the SNS subscription.

Actual Behavior

Terraform does not recreate the SNS subscription.

Steps to Reproduce

  1. Create SNS Topic
  2. Subscribe an SQS Queue to SNS Topic
  3. Delete SNS Topic
  4. Re-create SNS Topic with same name

Important Factoids

References

gdavison commented 2 years ago

Hi @BGarber42, did this previously work for you without the referenced commit? Can you give the provider version number where this worked? Looking at the code in that commit (b8052fdb8a02a1e6cb2ae45fde6d694dc8443c2e), there are no functional changes to what the provider is doing: the function tfresource.RetryWhenNewResourceNotFound replaces both the resource.Retry loop and the if output == nil check.

I'll look into this more, since it is clearly a problem. One problem is that even when an SNS topic is recreated, the ARN is the same, so Terraform doesn't see the change.

gdavison commented 2 years ago

Can you provide some more information, please, @BGarber42? I can't reproduce the problem. Are you both destroying and recreating the SNS Topic outside of Terraform?

When I create the SNS Topic and Subscriptions in Terraform and then use the AWS Console to destroy the Topic, terraform plan shows the following:

  # aws_sns_topic.test will be created
  + resource "aws_sns_topic" "test" {
      + arn                         = (known after apply)
      + content_based_deduplication = false
      + fifo_topic                  = false
      + id                          = (known after apply)
      + name                        = "test"
      + name_prefix                 = (known after apply)
      + owner                       = (known after apply)
      + policy                      = (known after apply)
      + tags_all                    = (known after apply)
    }

  # aws_sns_topic_subscription.test must be replaced
-/+ resource "aws_sns_topic_subscription" "test" {
      ~ arn                             = "arn:aws:sns:us-west-2:123456789012:test:f66f76ba-9ef9-490a-ae0f-5ae5dedfe9df" -> (known after apply)
      ~ confirmation_was_authenticated  = true -> (known after apply)
      ~ id                              = "arn:aws:sns:us-west-2:123456789012:test:f66f76ba-9ef9-490a-ae0f-5ae5dedfe9df" -> (known after apply)
      ~ owner_id                        = "123456789012" -> (known after apply)
      ~ pending_confirmation            = false -> (known after apply)
      ~ topic_arn                       = "arn:aws:sns:us-west-2:123456789012:test" -> (known after apply) # forces replacement
        # (5 unchanged attributes hidden)
    }

  # aws_sns_topic_subscription.test_email must be replaced
-/+ resource "aws_sns_topic_subscription" "test_email" {
      ~ arn                             = "arn:aws:sns:us-west-2:123456789012:test:93eddf0f-13a1-4477-9265-35a70a67ad5f" -> (known after apply)
      ~ confirmation_was_authenticated  = false -> (known after apply)
      ~ id                              = "arn:aws:sns:us-west-2:123456789012:test:93eddf0f-13a1-4477-9265-35a70a67ad5f" -> (known after apply)
      ~ owner_id                        = "123456789012" -> (known after apply)
      ~ pending_confirmation            = true -> (known after apply)
      ~ topic_arn                       = "arn:aws:sns:us-west-2:123456789012:test" -> (known after apply) # forces replacement
        # (5 unchanged attributes hidden)
    }

gdavison commented 2 years ago

Hi @BGarber42, since I haven't heard back, I'm going to assume that this happens when the SNS topic is either deleted or recreated outside of Terraform, and work on a fix based on that assumption. If this is not the case, please let me know what triggers the error that you're seeing.

tl;dr: As a workaround, use name_prefix instead of name, so that Terraform can detect that the Topic has been deleted or recreated outside of Terraform.

The basic problem here is that the SNS Topic ARN is derived entirely from the SNS Topic name (plus the Region and account ID). If a topic is destroyed and recreated with the same name, the ARN will therefore be the same. If Terraform handles the recreation, it will also recreate the subscription. However, if the topic is recreated outside of Terraform, the provider cannot use the ARN value to detect that it has changed.
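As a rough illustration (a minimal Go sketch, not provider code), an SNS topic ARN is assembled only from the partition, Region, account ID, and topic name, so nothing in it distinguishes a recreated topic from the original. The helper topicARN is hypothetical and exists only for this example.

```go
package main

import "fmt"

// topicARN is a hypothetical helper that builds an SNS topic ARN in the
// arn:<partition>:sns:<region>:<account-id>:<topic-name> form.
// No creation time or unique ID is part of it, so deleting and recreating
// a topic with the same name yields exactly the same ARN.
func topicARN(partition, region, accountID, name string) string {
	return fmt.Sprintf("arn:%s:sns:%s:%s:%s", partition, region, accountID, name)
}

func main() {
	before := topicARN("aws", "us-west-2", "123456789012", "test")
	// ... the topic is deleted and recreated with the same name ...
	after := topicARN("aws", "us-west-2", "123456789012", "test")

	fmt.Println(before)
	fmt.Println(before == after) // the ARN alone cannot reveal the recreation
}
```

Running it prints arn:aws:sns:us-west-2:123456789012:test followed by true.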

In most cases, the provider could make other API calls to detect the change. However, ListSubscriptions and ListSubscriptionsByTopic do not return the actual ARN of a pending subscription; they return PendingConfirmation in the ARN field instead. The Subscribe API call has the ReturnSubscriptionArn field, which returns the actual ARN, but no equivalent field exists for the List... calls.

As it stands, there is no mechanism in the AWS API to identify orphaned subscriptions. Even in the AWS Console, orphaned subscriptions still link back to the recreated topic, even though the topic does not list them as subscriptions.

BGarber42 commented 2 years ago

Sorry for the delayed response; for some reason, notifications for this ticket were filtered into spam.

It's not being modified outside of Terraform, just in a separate workspace.

Workspace A sets up the SNS Topics, then any number of other workspaces may create SQS queues which subscribe to the topic.

If the primary workspace gets redeployed, SQS queues still think they're subscribed when they're not.

BGarber42 commented 2 years ago

To reproduce, see the original issue #9645; its reproduction steps still work.

It seems I linked the wrong regression commit. I think it is actually this commit, which reverted the changes from #14101.

ListSubscriptionsByTopic returns the current list of active subscriptions, whereas SubscriptionByARN uses GetSubscriptionAttributes, which also returns stale subscriptions.

This can be shown by comparing the output of the ListSubscriptions and ListSubscriptionsByTopic endpoints:

C:\Users\BGarber>aws sns list-subscriptions --region us-west-2 --query "Subscriptions[?TopicArn=='arn:aws:sns:us-west-2:8675309:sns-dev-incoming']" | jq ". | length"
17

C:\Users\BGarber>aws sns list-subscriptions-by-topic --region us-west-2 --topic-arn arn:aws:sns:us-west-2:8675309:sns-dev-incoming | jq ".Subscriptions | length"
5
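The comparison above can be expressed as a set difference. The following Go sketch assumes you already have the SubscriptionArn values from both calls (the AWS SDK calls themselves are omitted); the function name staleSubscriptions and the sample ARNs are hypothetical. Following the earlier point that pending subscriptions are listed as PendingConfirmation rather than with a real ARN, those entries are skipped because they cannot be matched.

```go
package main

import (
	"fmt"
	"sort"
)

// staleSubscriptions returns the subscription ARNs that ListSubscriptions
// (filtered to one TopicArn) still reports but ListSubscriptionsByTopic
// does not. "PendingConfirmation" entries are skipped because they carry
// no real ARN to compare.
func staleSubscriptions(fromListSubscriptions, fromListByTopic []string) []string {
	active := make(map[string]bool, len(fromListByTopic))
	for _, arn := range fromListByTopic {
		active[arn] = true
	}

	var stale []string
	for _, arn := range fromListSubscriptions {
		if arn == "PendingConfirmation" {
			continue
		}
		if !active[arn] {
			stale = append(stale, arn)
		}
	}
	sort.Strings(stale) // deterministic output
	return stale
}

func main() {
	// Hypothetical sample data standing in for the two CLI outputs.
	all := []string{
		"arn:aws:sns:us-west-2:123456789012:test:aaa",
		"arn:aws:sns:us-west-2:123456789012:test:bbb",
		"arn:aws:sns:us-west-2:123456789012:test:ccc",
	}
	byTopic := []string{"arn:aws:sns:us-west-2:123456789012:test:bbb"}

	fmt.Println(staleSubscriptions(all, byTopic))
}
```

With the sample data it prints the aaa and ccc ARNs: the two subscriptions that ListSubscriptions still reports for the topic but ListSubscriptionsByTopic does not.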
BGarber42 commented 2 years ago

Or here's the output of an example run where Terraform thinks it's okay, when it isn't.

aws_sns_topic_subscription.stream-out_sqs_target is stale and should be recreated, but it isn't, because the provider uses GetSubscriptionAttributes, which can return orphaned subscriptions. When an aws_sns_topic_subscription reads its state, it should (IMO) also call ListSubscriptionsByTopic with the given TopicArn to verify that the subscription is not orphaned before assuming that everything is OK.
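A minimal Go sketch of the check proposed here, assuming the read step has the subscription ARN from state and the SubscriptionArn values returned by ListSubscriptionsByTopic for its TopicArn. checkAttached and its status values are hypothetical names, not part of the provider. Because pending subscriptions are listed as PendingConfirmation, a missing ARN is treated as conclusive only when the topic has no pending entries.

```go
package main

import "fmt"

// subscriptionStatus is a hypothetical result of checking whether a
// subscription recorded in Terraform state is still attached to its topic.
type subscriptionStatus int

const (
	statusAttached subscriptionStatus = iota // ARN found in ListSubscriptionsByTopic
	statusOrphaned                           // not found, and no pending entries could hide it
	statusUnknown                            // not found, but a pending entry might be this subscription
)

// checkAttached compares the ARN stored in state with the SubscriptionArn
// values listed for the topic. Pending subscriptions appear as
// "PendingConfirmation", so their real ARNs cannot be matched.
func checkAttached(stateARN string, listedARNs []string) subscriptionStatus {
	pending := false
	for _, arn := range listedARNs {
		if arn == stateARN {
			return statusAttached
		}
		if arn == "PendingConfirmation" {
			pending = true
		}
	}
	if pending {
		return statusUnknown
	}
	return statusOrphaned
}

func main() {
	// Hypothetical ARNs: the state refers to "aaa", the topic now lists only "bbb".
	listed := []string{"arn:aws:sns:us-west-2:123456789012:test:bbb"}
	status := checkAttached("arn:aws:sns:us-west-2:123456789012:test:aaa", listed)
	fmt.Println(status == statusOrphaned)
}
```

Returning an explicit unknown state, rather than a boolean, keeps the pending-confirmation case from being misread as orphaned. How the provider would act on an orphaned result (for example, by treating it like any other not-found read so that the next plan recreates the subscription) is an assumption, not something this thread confirms.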

github-actions[bot] commented 3 months ago

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 30 days it will automatically be closed. Maintainers can also remove the stale label.

If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!

github-actions[bot] commented 1 month ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.