hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: Error destroying aws_ssoadmin resources #33337

Closed novekm closed 9 months ago

novekm commented 1 year ago

Terraform Core Version

1.5.2

AWS Provider Version

5.15.0

Affected Resource(s)

aws_ssoadmin_permission_set

Expected Behavior

Successful destroy of resources

Actual Behavior

Failure/Error: Error: waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Permission set provision not found in AWS account 123456789012.

This is related to a fix that was merged in v5.14, but the issue persists. The only way to get past this error is to run terraform destroy a second time.

This led me to think through the possibility of adding a retry, as it seems Terraform is attempting to destroy a resource that is already destroyed. Adjusting the new timeouts block has no effect, as the error occurs within 60 seconds in my testing.

Taking a deeper look into the permission_set.go file for the resource, I found this:

Existing Code

if tfawserr.ErrCodeEquals(err, ssoadmin.ErrCodeResourceNotFoundException) {
    return diags
}
if err != nil {
    return sdkdiag.AppendErrorf(diags, "deleting SSO Permission Set (%s): %s", permissionSetARN, err)
}

From the docs about retries, it appears that this could be modified to retry if these errors occur instead of just returning the error message. I believe it could look something like this:

Potential New Code

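// Assumes this runs inside a retry.RetryContext callback (func() *retry.RetryError).
// retry.RetryableError takes a single error, so the message is wrapped with fmt.Errorf.
if tfawserr.ErrCodeEquals(err, ssoadmin.ErrCodeResourceNotFoundException) {
    return retry.RetryableError(err)
}
if err != nil {
    return retry.RetryableError(fmt.Errorf("deleting SSO Permission Set (%s): %w", permissionSetARN, err))
}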

I'd like to try to implement and submit the PR for the fix for this, as it seems it's been open for a while and multiple customers are having this issue. It is also a blocker for a module I created and am trying to release that manages AWS IAM Identity Center resources. I just haven't worked with retry logic in terraform before. Happy for any guidance on testing/implementing this fix.
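
For readers unfamiliar with the retry pattern being proposed, here is a minimal, self-contained Go sketch of it. It does not use the provider's code or the Terraform plugin SDK: `retryError`, `retryableError`, `nonRetryableError`, `retryUntil`, `classify`, `errNotFound`, and `errOther` are all hypothetical names that only imitate the shape of the SDK's retry helpers, whose `retry.RetryableError` accepts a single error value. The loop is bounded by an attempt count rather than a timeout to keep the example short.

```go
package main

import (
	"errors"
	"time"
)

// Hypothetical stand-ins for errors returned by the AWS API.
var (
	errNotFound = errors.New("ResourceNotFoundException")
	errOther    = errors.New("AccessDeniedException")
)

// retryError imitates the shape of the SDK's retry error type: it wraps one
// error and records whether the caller should try again.
type retryError struct {
	err       error
	retryable bool
}

func retryableError(err error) *retryError    { return &retryError{err: err, retryable: true} }
func nonRetryableError(err error) *retryError { return &retryError{err: err} }

// classify decides how an error from a delete call is handled in this sketch:
// nil means success, not-found is retried, anything else stops the loop.
func classify(err error) *retryError {
	switch {
	case err == nil:
		return nil
	case errors.Is(err, errNotFound):
		return retryableError(err)
	default:
		return nonRetryableError(err)
	}
}

// retryUntil calls f up to maxAttempts times (maxAttempts must be at least 1),
// sleeping for wait between attempts. It returns nil on success, or the last
// error seen once f reports a non-retryable error or attempts run out.
func retryUntil(maxAttempts int, wait time.Duration, f func() *retryError) error {
	var last error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		rerr := f()
		if rerr == nil {
			return nil
		}
		last = rerr.err
		if !rerr.retryable {
			return last
		}
		if attempt < maxAttempts {
			time.Sleep(wait)
		}
	}
	return last
}
```

A call such as `retryUntil(5, time.Second, func() *retryError { return classify(deleteOnce()) })`, where `deleteOnce` is a hypothetical function wrapping the API call, would try up to five times and stop early on success or on any error other than not-found. Note that the existing delete code treats not-found as success, so retrying on it is a deliberate change in behavior rather than a drop-in tweak.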

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

resource "aws_ssoadmin_permission_set" "pset" {
  for_each = var.permission_sets

  name             = each.key
  instance_arn     = local.ssoadmin_instance_arn
  description      = lookup(each.value, "description", null)
  relay_state      = lookup(each.value, "relay_state", null)      // (Optional) URL used to redirect users within the application during the federation authentication process
  session_duration = lookup(each.value, "session_duration", null) // The length of time that the application user sessions are valid in the ISO-8601 standard
  tags             = lookup(each.value, "tags", {})

  timeouts {
    update = "10m"
  }
}

Steps to Reproduce

  1. terraform apply
  2. terraform destroy

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

https://github.com/hashicorp/terraform-provider-aws/issues/23585

Would you like to implement a fix?

Yes

novekm commented 1 year ago

@justinretzolk I just submitted a PR for this that I think should fix it. Can someone review it when they get a chance?

jar-b commented 10 months ago

Hey @novekm - The error message you shared in the issue body is a failure which can only be present during an update operation for the aws_ssoadmin_permission_set resource.

Here is the function in which the waiting for SSO Permission Set message is constructed: https://github.com/hashicorp/terraform-provider-aws/blob/e313e8fce95a67a0b2ae4647a791915b3f2fa133/internal/service/ssoadmin/permission_set.go#L290-L308

And this is only referenced once during the update operation here:

https://github.com/hashicorp/terraform-provider-aws/blob/e313e8fce95a67a0b2ae4647a791915b3f2fa133/internal/service/ssoadmin/permission_set.go#L214-L217

Grepping through all of the SSO Admin resources, it does look like some others (permissions_boundary_attachment, permission_set_inline_policy, customer_managed_policy_attachment, and managed_policy_attachment) call this function during delete operations, so it's possible the fix proposed in #33384 is still valid but needs to be applied to a different resource.

% rg provisionPermissionSet
internal/service/ssoadmin/permissions_boundary_attachment.go
122:    if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutCreate)); err != nil {
184:    if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutDelete)); err != nil {

internal/service/ssoadmin/permission_set_inline_policy.go
96:     if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutCreate)); err != nil {
161:    if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutDelete)); err != nil {

internal/service/ssoadmin/customer_managed_policy_attachment.go
108:    if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutCreate)); err != nil {
175:    if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutDelete)); err != nil {

internal/service/ssoadmin/managed_policy_attachment.go
101:    if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutCreate)); err != nil {
163:    if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutDelete)); err != nil {

internal/service/ssoadmin/permission_set.go
215:            if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutUpdate)); err != nil {
290:func provisionPermissionSet(ctx context.Context, conn *ssoadmin.SSOAdmin, permissionSetARN, instanceARN string, timeout time.Duration) error {

Are you able to provide a more complete configuration and/or logs to determine which resource this is failing on during the destroy step?

novekm commented 10 months ago

Hi @jar-b, thanks for taking a look into this! As mentioned, the error listed above appears when running terraform destroy. I have just reproduced the issue in my account.

For context, I am using a TF module I created, but I believe the issue persists whether or not I use the module. It also appears others are having the same issue. I'm not sure whether they are using a module, but it cannot be mine since it is not public yet, so a module-specific issue can likely be ruled out.

Here's my main.tf:

module "aws-iam-identity-center" {
  source = "./modules/aws-iam-identity-center" // local example

  // Create desired GROUPS in IAM Identity Center
  sso_groups = {
    Admin : {
      group_name        = "Admin"
      group_description = "Admin IAM Identity Center Group"
    },
    Dev : {
      group_name        = "Dev"
      group_description = "Dev IAM Identity Center Group"
    },
    QA : {
      group_name        = "QA"
      group_description = "QA IAM Identity Center Group"
    },
    Audit : {
      group_name        = "Audit"
      group_description = "Audit IAM Identity Center Group"
    },
  }

  // Create desired USERS in IAM Identity Center
  sso_users = {
    NarutoUzumaki : {
      group_membership = ["Admin", "Dev", "QA", "Audit"]
      user_name        = "nuzumaki"
      given_name       = "Naruto"
      family_name      = "Uzumaki"
      email            = "nuzumaki@hiddenleaf.village"
    },
    SasukeUchiha : {
      group_membership = ["QA", "Audit"]
      user_name        = "suchiha"
      given_name       = "Sasuke"
      family_name      = "Uchiha"
      email            = "suchiha@hiddenleaf.village"
    },
  }

  // Create permissions sets backed by AWS managed policies
  permission_sets = {
    AdministratorAccess = {
      description          = "Provides AWS full access permissions.",
      session_duration     = "PT4H", // how long until session expires - this means 4 hours. max is 12 hours
      aws_managed_policies = ["arn:aws:iam::aws:policy/AdministratorAccess"]
      tags                 = { ManagedBy = "Terraform" }
    },
    ViewOnlyAccess = {
      description          = "Provides AWS view only permissions.",
      session_duration     = "PT3H", // how long until session expires - this means 3 hours. max is 12 hours
      aws_managed_policies = ["arn:aws:iam::aws:policy/job-function/ViewOnlyAccess"]
      tags                 = { ManagedBy = "Terraform" }
    },
  }

  // Assign users/groups access to accounts with the specified permissions
  account_assignments = {
    Admin : {
      principal_name  = "Admin"                                   // name of the user or group you wish to have access to the account(s)
      principal_type  = "GROUP"                                   // entity type (user or group) you wish to have access to the account(s)
      permission_sets = ["AdministratorAccess", "ViewOnlyAccess"] // permissions the user/group will have in the account(s)
      account_ids = [                                             // account(s) the group will have access to. Permissions they will have in account are above line
        local.account1_account_id,                                // locals are used to allow for global changes to multiple account assignments
        # local.account2_account_id, // if hard coding the account ids, you would need to change them in every place you want to change
        # local.account3_account_id, // these are defined in a locals.tf file, example is in this directory
        # local.account4_account_id,
      ]
    },
    Audit : {
      principal_name  = "Audit"            // name of the user or group you wish to have access to the account(s)
      principal_type  = "GROUP"            // entity type (user or group) you wish to have access to the account(s)
      permission_sets = ["ViewOnlyAccess"] // permissions the user/group will have in the account(s)
      account_ids = [                      // account(s) the group will have access to. Permissions they will have in account are above line
        local.account1_account_id,         // locals are used to allow for global changes to multiple account assignments
        # local.account2_account_id, // if hard coding the account ids, you would need to change them in every place you want to change
        # local.account3_account_id, // these are defined in a locals.tf file, example is in this directory
        # local.account4_account_id,
      ]
    },
  }

}

1. terraform apply - Apply completes successfully.

Apply complete! Resources: 19 added, 0 changed, 0 destroyed.

2. terraform destroy - fails, here is error message:

╷
│ Error: waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx1) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Assignment not found.
│ 
│ 
╵
╷
│ Error: waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx2) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Assignment not found.
│ 
│ 

I am not sure why the error mentions a "provision" when the resources are being destroyed. The only way to resolve the error currently is to run terraform destroy a second time. To address this in the PR I submitted, I added retry logic for whenever this error appears, since running terraform destroy a second time resolves it consistently.

Here is my TF_LOG="ERROR" output. Let me know if the "DEBUG" output would be more helpful:

2023-11-17T03:39:01.685-0500 [ERROR] provider.terraform-provider-aws_v5.26.0_x5: Response contains error diagnostic: diagnostic_summary="waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Permission set provision not found in AWS account xxxxxxxxxxxx." tf_proto_version=5.4 tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=xxx-xxx-xxx-xxx-xxx @caller=github.com/hashicorp/terraform-plugin-go@v0.19.1/tfprotov5/internal/diag/diagnostics.go:58 diagnostic_detail= tf_resource_type=aws_ssoadmin_managed_policy_attachment @module=sdk.proto diagnostic_severity=ERROR tf_rpc=ApplyResourceChange timestamp=2023-11-17T03:39:01.685-0500
2023-11-17T03:39:01.693-0500 [ERROR] vertex "module.aws-iam-identity-center.aws_ssoadmin_managed_policy_attachment.pset_aws_managed_policy[\"ViewOnlyAccess.arn:aws:iam::aws:policy/job-function/ViewOnlyAccess\"] (destroy)" error: waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-72236a571cf03aa7/ps-baff34dc81e0f1c4) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Permission set provision not found in AWS account xxxxxxxxxxxx.
2023-11-17T03:39:01.715-0500 [ERROR] provider.terraform-provider-aws_v5.26.0_x5: Response contains error diagnostic: @module=sdk.proto diagnostic_detail= diagnostic_severity=ERROR tf_proto_version=5.4 tf_resource_type=aws_ssoadmin_managed_policy_attachment @caller=github.com/hashicorp/terraform-plugin-go@v0.19.1/tfprotov5/internal/diag/diagnostics.go:58 diagnostic_summary="waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Assignment not found." tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=xxx-xxx-xxx-xxx-xxx tf_rpc=ApplyResourceChange timestamp=2023-11-17T03:39:01.715-0500
2023-11-17T03:39:01.721-0500 [ERROR] vertex "module.aws-iam-identity-center.aws_ssoadmin_managed_policy_attachment.pset_aws_managed_policy[\"AdministratorAccess.arn:aws:iam::aws:policy/AdministratorAccess\"] (destroy)" error: waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Assignment not found.

3. After running terraform destroy a second time, it destroys the permission sets successfully:

Plan: 0 to add, 0 to change, 2 to destroy.
module.aws-iam-identity-center.aws_ssoadmin_permission_set.pset["AdministratorAccess"]: Destroying... [id=arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx,arn:aws:sso:::instance/ssoins-xxx]
module.aws-iam-identity-center.aws_ssoadmin_permission_set.pset["ViewOnlyAccess"]: Destroying... [id=arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx,arn:aws:sso:::instance/ssoins-xxx]
module.aws-iam-identity-center.aws_ssoadmin_permission_set.pset["AdministratorAccess"]: Destruction complete after 0s
module.aws-iam-identity-center.aws_ssoadmin_permission_set.pset["ViewOnlyAccess"]: Destruction complete after 0s

Destroy complete! Resources: 2 destroyed.

As another note, after re-applying and destroying again, this time only a single error message appears, instead of two errors as listed above:

╷
│ Error: waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Assignment not found.
│ 
│ 
╵

A second terraform destroy again succeeds, but reports that 2 resources were destroyed:

Destroy complete! Resources: 2 destroyed.

Perhaps, in addition to using the retry logic I added, the order in which resources are destroyed could also be modified? Looking at the plan for my destroy, the permission sets are always the last items in the list. Maybe if the permission sets were deleted first, the error would not appear either? Let me know if you need any more detail. Thanks!

jar-b commented 10 months ago

Thanks for the extra detail. Inspecting the error logs, it looks like the two resources failing during the first destroy are both of type aws_ssoadmin_managed_policy_attachment (the ViewOnlyAccess and AdministratorAccess items specifically).

Since it's reaching the waiter step, this implies the detachment of the managed policies is successful, but the provisioning step which occurs after is failing.

https://github.com/hashicorp/terraform-provider-aws/blob/e313e8fce95a67a0b2ae4647a791915b3f2fa133/internal/service/ssoadmin/managed_policy_attachment.go#L152-L165

If you inspect the state after the first destroy (do not refresh), I'm guessing you'll see that both the permission set resources AND the managed policy attachment resources are still present. When the second terraform destroy is run, the state is refreshed prior to execution, and the read operation detects the managed policy attachments no longer exist and removes them from state. This happens before presenting the plan, which is why only the two permission set resources remain and go through cleanly.

https://github.com/hashicorp/terraform-provider-aws/blob/e313e8fce95a67a0b2ae4647a791915b3f2fa133/internal/service/ssoadmin/managed_policy_attachment.go#L117-L123
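
The refresh behavior described above can be modeled in a few lines of Go. This is only an illustration of the reasoning, not Terraform's implementation: `refresh` and the `exists` callback are hypothetical stand-ins for the refresh phase and for each resource's read operation.

```go
package main

// refresh returns the addresses in state whose remote object still exists,
// preserving their order. Addresses whose read reports "not found" are
// dropped, which models a resource being removed from state during refresh.
func refresh(state []string, exists func(addr string) bool) []string {
	kept := make([]string, 0, len(state))
	for _, addr := range state {
		if exists(addr) {
			kept = append(kept, addr)
		}
	}
	return kept
}
```

With a state holding two managed policy attachments and two permission sets, and an `exists` callback that returns false for the attachments, `refresh` returns only the two permission sets. That matches the "2 to destroy" plan seen on the second run.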

Seeing the full resource definition and terraform state list output at each step could confirm these assumptions. If this is the root cause, I suspect the provisioning step inside managed_policy_attachment.go is what actually needs to be adjusted to properly handle situations where the underlying permission set or instance ARN no longer exist.

novekm commented 10 months ago

Thanks for the additional context, that makes sense. Upon checking terraform.tfstate after the first destroy, you are correct that I still see resources there. The resources I see are:

Running terraform state list also shows the following:

❯ terraform state list
module.aws-iam-identity-center.data.aws_ssoadmin_instances.sso_instance
module.aws-iam-identity-center.aws_ssoadmin_managed_policy_attachment.pset_aws_managed_policy["ViewOnlyAccess.arn:aws:iam::aws:policy/job-function/ViewOnlyAccess"]
module.aws-iam-identity-center.aws_ssoadmin_permission_set.pset["AdministratorAccess"]
module.aws-iam-identity-center.aws_ssoadmin_permission_set.pset["ViewOnlyAccess"]

What is odd to me is that the AdministratorAccess managed policy attachment is not present, so I assume it was deleted. However, the other managed policy attachment (ViewOnlyAccess) is still present, as are both permission sets.

I will take a look at managed_policy_attachment.go, and I welcome feedback on updating my PR to address what you have identified as the root cause.

jar-b commented 10 months ago

Can you share the resource definitions in your module? The managed policy attachment and permission set resources would be most helpful, along with any other resources they reference. Or an equivalent standalone configuration that produces the same result is fine if you prefer not to share the module content at this time.

novekm commented 10 months ago

Sure, here they are:

aws_ssoadmin_managed_policy_attachment:

resource "aws_ssoadmin_managed_policy_attachment" "pset_aws_managed_policy" {
  # iterate over the permission_sets map of maps, and set the result to be pset_name and pset_index
  # ONLY if the policy for each pset_index is valid.
  for_each = { for pset in local.pset_aws_managed_policy_maps : "${pset.pset_name}.${pset.policy_arn}" => pset }

  instance_arn       = local.ssoadmin_instance_arn
  managed_policy_arn = each.value.policy_arn
  permission_set_arn = aws_ssoadmin_permission_set.pset[each.value.pset_name].arn
}

aws_ssoadmin_permission_set:

# - SSO Permission Set -
resource "aws_ssoadmin_permission_set" "pset" {
  for_each = var.permission_sets
  name     = each.key

  # The lookup function retrieves the value of a single element from a map, given its key.
  # If the given key does not exist, the default value (null) is returned instead.

  instance_arn     = local.ssoadmin_instance_arn
  description      = lookup(each.value, "description", null)
  relay_state      = lookup(each.value, "relay_state", null) // (Optional) URL used to redirect users within the application during the federation authentication process
  session_duration = lookup(each.value, "session_duration", null) // The length of time that the application user sessions are valid in the ISO-8601 standard
  tags             = lookup(each.value, "tags", {})
}

locals.tf:

# - Permission Sets and Policies -
locals {
  # - Fetch SSO Instance ARN and SSO Instance ID -
  ssoadmin_instance_arn = tolist(data.aws_ssoadmin_instances.sso_instance.arns)[0]
  sso_instance_id = tolist(data.aws_ssoadmin_instances.sso_instance.identity_store_ids)[0]

  # Iterate over the objects in var.permission_sets, then evaluate the expression's 'pset_name'
  # and 'pset_index' with 'pset_name' and 'pset_index' only if the pset_index.managed_policies (AWS Managed Policy ARN)
  # produces a result without an error (i.e. if the ARN is valid). If any of the ARNs for any of the objects
  # in the map are invalid, the for loop will fail.

  # pset_name is the attribute name for each permission set map/object
  # pset_index is the corresponding index of the map of maps (which is the variable permission_sets)
  aws_managed_permission_sets = { for pset_name, pset_index in var.permission_sets : pset_name => pset_index if can(pset_index.aws_managed_policies) }
  customer_managed_permission_sets = { for pset_name, pset_index in var.permission_sets : pset_name => pset_index if can(pset_index.customer_managed_policies) }

  #  ! NOT CURRENTLY SUPPORTED !
  # inline_policy_permission_sets = { for pset_name, pset_index in var.permission_sets : pset_name => pset_index if can(pset_index.inline_policy) }

  # When using the 'for' expression in Terraform:
  # [ and ] produces a tuple
  # { and } produces an object, and you must provide two result expressions separated by the => symbol
  # The 'flatten' function takes a list and replaces any elements that are lists with a flattened sequence of the list contents

  # Create the list of pset_name and managed policy maps. flatten is needed because the nested for expression produces a list of lists of maps.
  # This nested for loop will run only if each of the managed_policies are valid ARNs.

  # - AWS Managed Policies -
  pset_aws_managed_policy_maps = flatten([
    for pset_name, pset_index in local.aws_managed_permission_sets : [
      for policy in pset_index.aws_managed_policies : {
        pset_name  = pset_name
        policy_arn = policy
      } if pset_index.aws_managed_policies != null && can(pset_index.aws_managed_policies)
    ]
  ])

  # - Customer Managed Policies -
  pset_customer_managed_policy_maps = flatten([
    for pset_name, pset_index in local.customer_managed_permission_sets : [
      for policy in pset_index.customer_managed_policies : {
        pset_name  = pset_name
        policy_name = policy
        # path = path
      } if pset_index.customer_managed_policies != null && can(pset_index.customer_managed_policies)
    ]
  ])

  #  ! NOT CURRENTLY SUPPORTED !
  # - Inline Policy -
  #   pset_inline_policy_maps = flatten([
  #     for pset_name, pset_index in local.inline_policy_permission_sets : [
  #       for policy in pset_index.inline_policy : {
  #         pset_name  = pset_name
  #         inline_policy = policy
  #         # path = path
  #       } if pset_index.inline_policy != null && can(pset_index.inline_policy)
  #     ]
  #   ])

}

I can also create a new standalone configuration and post that here if needed

jar-b commented 10 months ago

Thanks - A minimal configuration would be helpful as it can be re-used for an acceptance test.

novekm commented 10 months ago

Minimal configuration:

# Fetch existing SSO Instance
data "aws_ssoadmin_instances" "sso_instance" {}

locals {
  # - Fetch SSO Instance ARN and SSO Instance ID -
  ssoadmin_instance_arn = tolist(data.aws_ssoadmin_instances.sso_instance.arns)[0]
  sso_instance_id       = tolist(data.aws_ssoadmin_instances.sso_instance.identity_store_ids)[0]
}

#  Create IAM IDC Group
resource "aws_identitystore_group" "example" {

  identity_store_id = local.sso_instance_id
  display_name      = "Admin"
  description       = "Admin Group"
}

# Create IAM IDC User
resource "aws_identitystore_user" "example" {
  identity_store_id = local.sso_instance_id
  display_name      = "Naruto Uzumaki"
  user_name         = "nuzumaki"
  name {
    given_name  = "Naruto"
    family_name = "Uzumaki"
  }
  emails {
    value   = "nuzumaki@hokage.village"
    primary = true
  }
}

# Create IAM IDC Group Membership
resource "aws_identitystore_group_membership" "sso_group_membership" {
  identity_store_id = local.sso_instance_id
  group_id  = aws_identitystore_group.example.group_id
  member_id = aws_identitystore_user.example.user_id
}

# Create Permission Set
resource "aws_ssoadmin_permission_set" "example" {
  name = "ExamplePermissionSet"
  instance_arn     = local.ssoadmin_instance_arn
  description      = "ExamplePermissionSet"
  session_duration = "PT3H"
}

# Create Managed Policy Attachment
resource "aws_ssoadmin_managed_policy_attachment" "pset_aws_managed_policy" {
  instance_arn       = local.ssoadmin_instance_arn
  managed_policy_arn = "arn:aws:iam::aws:policy/job-function/ViewOnlyAccess"
  permission_set_arn = aws_ssoadmin_permission_set.example.arn
}

# Create Account Assignment
resource "aws_ssoadmin_account_assignment" "account_assignment" {
  instance_arn       = local.ssoadmin_instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.example.arn

  principal_id   = aws_identitystore_group.example.group_id
  principal_type = "GROUP"

  target_id   = "000000000000"
  target_type = "AWS_ACCOUNT"
}

1. terraform apply:

Apply complete! Resources: 6 added, 0 changed, 0 destroyed.

2. terraform destroy error:

╷
│ Error: waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Permission set provision not found in AWS account 000000000000.
│ 
│ 
╵

3. re-run terraform destroy:

Plan: 0 to add, 0 to change, 1 to destroy.
aws_ssoadmin_permission_set.example: Destroying... [id=arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx,arn:aws:sso:::instance/ssoins-xxx]
aws_ssoadmin_permission_set.example: Destruction complete after 0s

Destroy complete! Resources: 1 destroyed.

Same issue is happening with simplified configuration as well.

jar-b commented 10 months ago

Thanks @novekm - I was able to reproduce with the configuration above.

Reproduction and Cause

My current understanding of the issue is that the deletion of both the managed policy attachment and account assignment simultaneously causes problems when the Delete operation of the policy attachment attempts to re-provision the permission set:

https://github.com/hashicorp/terraform-provider-aws/blob/cc558c7f6cada861f79b82942ca01e084cccf893/internal/service/ssoadmin/managed_policy_attachment.go#L162-L165

Because the account assignment no longer exists, the provision step fails with an error like:

Received a 404 status error: Permission set provision not found in AWS account 012345678901.

Solution

I was able to resolve this by creating an explicit dependency between the two resources using the depends_on meta-argument. You can add this to either resource (but not both), and destroy should complete in one pass. Here is an example of the modified managed policy attachment resource:

resource "aws_ssoadmin_managed_policy_attachment" "pset_aws_managed_policy" {
  depends_on = [aws_ssoadmin_account_assignment.account_assignment]

  instance_arn       = local.ssoadmin_instance_arn
  managed_policy_arn = "arn:aws:iam::aws:policy/job-function/ViewOnlyAccess"
  permission_set_arn = aws_ssoadmin_permission_set.example.arn
}

Because of this explicit dependency, the destroy operation will completely destroy aws_ssoadmin_managed_policy_attachment before beginning destruction of the account assignment (the inverse of the apply order). This allows the re-provisioning of the permission set to complete successfully before the account assignment is removed. More importantly, this means a clean destroy in one pass 👍 .
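
The ordering argument can be checked with a small Go model. This is a sketch of the idea only: Terraform's real graph walker is more involved and runs independent nodes in parallel, and `createOrder`, `destroyOrder`, and `indexOf` are hypothetical helpers. The model assumes the dependency graph has no cycles.

```go
package main

// createOrder returns the resources in an order where each one appears after
// everything it depends on (a depth-first topological sort).
// deps maps a resource to the resources it depends on.
func createOrder(nodes []string, deps map[string][]string) []string {
	visited := make(map[string]bool)
	var order []string
	var visit func(n string)
	visit = func(n string) {
		if visited[n] {
			return
		}
		visited[n] = true
		for _, d := range deps[n] {
			visit(d)
		}
		order = append(order, n)
	}
	for _, n := range nodes {
		visit(n)
	}
	return order
}

// destroyOrder is the reverse of createOrder: dependents are destroyed
// before the resources they depend on.
func destroyOrder(nodes []string, deps map[string][]string) []string {
	created := createOrder(nodes, deps)
	out := make([]string, len(created))
	for i, n := range created {
		out[len(created)-1-i] = n
	}
	return out
}

// indexOf reports the position of want in order, or -1 if it is absent.
func indexOf(order []string, want string) int {
	for i, n := range order {
		if n == want {
			return i
		}
	}
	return -1
}
```

If `managed_policy_attachment` depends on both `permission_set` and `account_assignment` (the depends_on edge), and `account_assignment` depends on `permission_set`, then `destroyOrder` places the attachment first, the account assignment second, and the permission set last. Without the depends_on edge, nothing in the graph orders the attachment relative to the account assignment, which is the situation in which the two deletions can overlap.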

Provider Impact

At this time I'd propose not making provider-side changes to ignore or retry this particular error. This appears to be a function of the relationship between the account assignment and managed policy attachment when destruction of both is triggered simultaneously. The meaning of this error could change depending on the combination of resources being destroyed, so suppressing it could result in incorrect behavior under other conditions. Resolution of the issue with an explicit depends_on argument also factors in, as it allows the impacted configuration to function correctly with no provider changes.

Please let us know if you have any concerns resolving the original issue with this approach.

novekm commented 10 months ago

Thanks @jar-b for the detailed response! I will try adding depends_on and see if it also works on my end, and will keep you posted. If all works, I'll submit a PR updating the docs for the resource to explicitly list this limitation and the current resolution.

novekm commented 10 months ago

Hi @jar-b! Sorry for the delay, last week was quite busy with re:Invent :) I have tested your recommendation and can confirm it resolves the error for me. I have tested both with the simplified configuration I posted above, and also within a module I created. The two affected resources are indeed aws_ssoadmin_managed_policy_attachment and aws_ssoadmin_account_assignment - the other policy attachments seemed to work fine without adding the depends_on meta-argument.

I have created a PR (#34751) that updates the public docs for these resources, adding clear documentation on the error and resolution. I have submitted many docs for the AWSCC provider, but this is my first for the AWS provider. It seems it uses a different format/structure in the repo. Let me know if the PR needs to be updated. Thanks again for the help resolving this! This will help many customers.

github-actions[bot] commented 9 months ago

This functionality has been released in v5.30.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!
