hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0
4.53k stars 4.6k forks source link

Incorrect destroy order when working with Azure Policies in a set #6072

Open ghost opened 4 years ago

ghost commented 4 years ago

This issue was originally opened by @erikpaasonen as hashicorp/terraform#24351. It was migrated here as a result of the provider split. The original body of the issue is below.


Terraform Version

Terraform v0.12.23
+ provider.azurerm v1.44.0

Terraform Configuration Files

https://gist.github.com/erikpaasonen/cf0670ca533d81851b875d76533696ad

This is simply the sample code for the azurerm_policy_definition and azurerm_policy_set_definition resources of the AzureRM provider adapted to work with each other.

Debug Output

https://gist.github.com/erikpaasonen/d097156afe8762db947b7c9bf0f01bcd

Crash Output

N/A

Expected Behavior

When the azurerm_policy_definition is part of an azurerm_policy_set_definition and the policy definition needs to be deleted/recreated to perform a modification, the policy should be detached/removed from the set prior to being deleted.

Actual Behavior

The policy deletion is always attempted first, resulting in the following error message:

Error: Error deleting Policy Definition "accTestPolicy": policy.DefinitionsClient#Delete: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="InvalidDeletePolicyDefinitionRequest" Message="The policy definition 'accTestPolicy' cannot be deleted. It is referenced by the policy set definition '/subscriptions/aadd890f-876d-4390-9d6e-a5450e9eacb7/providers/Microsoft.Authorization/policySetDefinitions/testPolicySet'. Please remove this policy definition from all policy set definitions that reference it."

Steps to Reproduce

  1. terraform apply
  2. make a change to the policy, such as modifying the name
  3. terraform apply

Additional Context

The bug report template for new issues in the github repo for the azurerm provider specifically calls out resource ordering issues as belonging instead in Terraform core.

References

fgarcia-cnb commented 3 years ago

after almost a year, this is still an issue. I tried using the "create_before_destroy" option within the policy resource as described in hashicorp/terraform#24351, but still fails when the set of policies decreases/rebuilds. oddly enough a full destroy of the workspace works just fine. are there any workarounds? otherwise this makes the policy_set resource pretty much useless.

fgarcia-cnb commented 3 years ago

i was able to workaround this issue by creating a destroy time provisioner for the policy. on destroy, the policy resource runs a bash script that removes the policy from the the policy set definition (using jq). not ideal, but it works

however, a new issue similar to #8663 has reared its ugly head. if i destroy a policy that had "parameter_values" set to a value, all subsequent dynamic blocks are shifted up one level to replace that deleted policy. if the shifted policy block has parameter_values set to null or "", terraform does not correctly update the value. the value is properly updated if "{}" is used, but then subsequent applys keep trying to replace the value over and over again.

azurerm v2.0.48 terraform v0.12.24

//Example showing where parameter_values is set to null or empty string
//this policy is being deleted and replaced by the next policy in the list
         ~ policy_definition_reference {
            //parameter_values should be cleared, but instead is keeping the old value. 
            //this will cause an error since the next policy doesnt accept parameters
            parameter_values     = jsonencode(
                {
                    namePattern = {
                        value = "[parameters('namePattern')]"
                    }
                }
            )
            parameters           = {
                "namePattern" = "[parameters('namePattern')]"
            }
          //correctly updating to the policy ID that followed it
          ~ policy_definition_id = "/subscriptions/<HIDDEN>/providers/Microsoft.Authorization/policyDefinitions/INFRA-TEST-VM-Naming-Convention3" -> "/providers/Microsoft.Authorization/policyDefinitions/0015ea4d-51ff-4ce3-8d8c-f3f8f0179a56"
            policy_group_names   = []
            reference_id         = "14700550802179895864"
        }
//this policy is being shifted up one
      - policy_definition_reference {
          - parameters           = {} -> null
          - policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/0015ea4d-51ff-4ce3-8d8c-f3f8f0179a56" -> null
          - policy_group_names   = [] -> null
          - reference_id         = "3028661221033634618" -> null
        }

with "{}" as the value for parameter_values, it updates correctly and applies fine...

      ~ policy_definition_reference {
         //correctly updating parameter_values to empty json
          ~ parameter_values     = jsonencode(
              ~ {
                  - namePattern = {
                      - value = "[parameters('namePattern')]"
                    } -> null
                }
            )
            parameters           = {
                "namePattern" = "[parameters('namePattern')]"
            }
          ~ policy_definition_id = "/subscriptions/<HIDDEN>/providers/Microsoft.Authorization/policyDefinitions/INFRA-TEST-VM-Naming-Convention3" -> "/providers/Microsoft.Authorization/policyDefinitions/0015ea4d-51ff-4ce3-8d8c-f3f8f0179a56"
            policy_group_names   = []
            reference_id         = "14700550802179895864"
        }
      - policy_definition_reference {
          - parameters           = {} -> null
          - policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/0015ea4d-51ff-4ce3-8d8c-f3f8f0179a56" -> null
          - policy_group_names   = [] -> null
          - reference_id         = "3028661221033634618" -> null
        }
    }

but subsequent applies keep trying the same change...

      ~ policy_definition_reference {
          + parameter_values     = jsonencode({})
            parameters           = {}
            policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/0015ea4d-51ff-4ce3-8d8c-f3f8f0179a56"
            policy_group_names   = []
            reference_id         = "14700550802179895864"
        }
fgarcia-cnb commented 3 years ago

darn, unfortunately destroy provisioners run at the same time, so f youre deleting 2 policies their is a race condition on updating the policy set. i cant think of a way to make them run sequentially. im trying a random sleep at the beginning of the script.

that said, there is also a big issue if you have a policy set with only one policy. any changes to that policy that cause a rebuild will fail with:

{"error":{"code":"InvalidCreatePolicySetDefinitionRequest","message":"The policy set definition 
'INFRA-TEST-Initiative3' create request is invalid. At least one policy definition must be referenced."}}

since terraform will try deleting the only existing policy before adding the new one. the new one has to be created first, and then update the policy set to point to the new item

fgarcia-cnb commented 3 years ago

i had to abandon the policy_set_definition resource altogether because it has so many shortcomings. the only way i got this to fully work is to create 2 scripts:

  1. a null resource script that creates/recreated/updates the policy set every time my list of policies changes (using jsonencode as a trigger)
  2. an "on destroy" provisioner script for policies. when a policy is destroyed, the policy set needs to be destroyed as well so that all policies are unlocked to be modified (including the 1 policy scenario)
fgarcia-cnb commented 3 years ago

any plans to fix this? the policy_set resource is basically useless as is. im locked in to tf 0.12 since destroy provisioners lack support in later versions.

tullydwyer commented 2 years ago

Hi all, we are hitting the same issue using policy_set. I agree with fgarcia-cnb policy_set is severely broken for a production environment, this is because you cannot change the name, re-create, or remove (delete) any policy referenced by the policy_set, and there seems to be no workarounds with Terraform.

The only solution seems to be manually deleting the policies. Which is not good when Terraform is managing a large amount of policies.

Workarounds tried:

fgarcia-cnb commented 2 years ago

please, why is no one looking at this? this resource is pretty much useless, and my workaround using destroy provisioners only works with tf12.

fgarcia-cnb commented 2 years ago

how has azurerm v3 been released, with new policy related resources, but this basic issue is still not resolved? this resource is still completely unusuable. There needs to be a way to delete a policy assigned to a policy set. you cant remove it from the policy set first because policy set creations have implicit dependencies on policies.

Adam-McGill commented 1 year ago

Looks like this is still an issue and not being addressed. Creates a lot of work when removing resources.

soaand01 commented 1 year ago

Still an issue

michealmueller commented 1 year ago

still an issue, dealing with it now, terraform is not announcing that the policies are not able to be deleted. it just ignores them.... until i create a new RG and it errors b/c the policy exists

any timeline or planed fix for this ?

scdubay commented 7 months ago

I am abandoning the policy_set_definition resource as well... not ready for prod ...at... all...

Tolbin400 commented 5 months ago

Any progress on this issue?