hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

Unable to modify an RBAC Owner-role with constraints: "CannotDeleteLastRbacAdminAssignment" error shown #26122

Open knuterik-ballestad opened 5 months ago

knuterik-ballestad commented 5 months ago

Terraform Version

= 1.8.4

AzureRM Provider Version

~> 3.104.2

Affected Resource(s)/Data Source(s)

azurerm_role_assignment

Terraform Configuration Files

# Called in loop, once per subscription/LZ configuration:
resource "azurerm_role_assignment" "owner_role" {
  for_each             = var.sub_owners                                 # Typically *one* person
  scope                = "/subscriptions/${var.sub_id}"
  role_definition_name = "Owner"
  principal_id         = each.value
  # ABAC condition constraining the "Owner" role (limits which roles this Owner may assign or remove)
  condition_version = "2.0"
  condition         = <<-EndOfInsertedJSON
    (
      (!(ActionMatches{'Microsoft.Authorization/roleAssignments/write'}))
      OR
      (@Request[Microsoft.Authorization/roleAssignments:RoleDefinitionId]
        ForAnyOfAnyValues:GuidEquals {${join(", ", values(local.owner_available_role_delegations)[*].role_id)}})
    )
    AND
    (
      (!(ActionMatches{'Microsoft.Authorization/roleAssignments/delete'}))
      OR
      (@Resource[Microsoft.Authorization/roleAssignments:RoleDefinitionId]
        ForAnyOfAnyValues:GuidEquals {${join(", ", values(local.owner_available_role_delegations)[*].role_id)}})
    )
  EndOfInsertedJSON
}
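
The configuration references `local.owner_available_role_delegations`, which the issue does not show. For readers trying to reproduce, here is a hypothetical shape that would make the `join(...)` expression evaluate. The map keys and the `role_ids_csv` local are invented for illustration; the GUIDs are the built-in Contributor and Reader role definition IDs.

```hcl
locals {
  # Assumed structure: a map of objects that each carry a role definition GUID.
  owner_available_role_delegations = {
    contributor = { role_id = "b24988ac-6180-42a0-ab88-20f7382dd24c" }
    reader      = { role_id = "acdd72a7-3385-48ef-bd42-f606fba81ae7" }
  }

  # Same expression as in the condition above: a comma-separated list of the
  # GUIDs, which ends up inside the braces of "ForAnyOfAnyValues:GuidEquals {...}".
  role_ids_csv = join(", ", values(local.owner_available_role_delegations)[*].role_id)
}
```

Adding or removing an entry in such a map changes the rendered `condition` string, which is what triggers the replacement of the role assignment.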

Debug Output/Panic Output

│ Error: authorization.RoleAssignmentsClient#Delete: Failure responding to request: StatusCode=412 -- Original Error: autorest/azure: Service returned an error. Status=412 Code="CannotDeleteLastRbacAdminAssignment" Message="Cannot delete the last RBAC admin assignment"

Expected Behaviour

We expect terraform to either be able to modify the role assignment or fail gracefully.

Actual Behaviour

Terraform removes the Owner from all ALZ subscriptions and THEN fails, leaving a state that doesn't even let us re-run Terraform to apply the role assignments again.

Steps to Reproduce

[image] The screenshot shows, in the Portal, what we are trying to do: adding or removing entries under "Constrain roles", i.e. which roles the Owner is allowed to assign to others.

Important Factoids

No response

References

No response

knuterik-ballestad commented 5 months ago

Before upgrading to the latest azurerm provider and Terraform runtime, Terraform failed "gracefully", allowing us to re-run the GitHub Action with the terraform apply, after which the role assignments were re-created with the correct, updated constraints.

magodo commented 5 months ago

@knuterik-ballestad Presumably, the loop in your case covers all the original owners, including the principal that is running terraform. Also, assuming your workspace has kept track of the state of all these principals, when your change introduces a "replace", terraform will remove the role assignments before creating the new ones. That's why you saw the error.

If the above assumption holds, it looks like a "shoot yourself in the foot" case. My suggestion is to at least keep the principal that runs terraform out of sub_owners.
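
One way to follow this suggestion is to filter the running principal out before the `for_each`. This is a minimal sketch, assuming `var.sub_owners` is a map of principal object IDs (the issue does not show its type); the local name `sub_owners_without_runner` is hypothetical.

```hcl
variable "sub_id" {
  type = string
}

variable "sub_owners" {
  # Assumption: map of arbitrary key => principal object ID.
  type = map(string)
}

# Exposes the object ID of the principal that Terraform is running as.
data "azurerm_client_config" "current" {}

locals {
  # Hypothetical helper: drop the running principal so that a replace of the
  # Owner assignments never removes the assignment Terraform itself relies on.
  sub_owners_without_runner = {
    for key, principal_id in var.sub_owners :
    key => principal_id
    if principal_id != data.azurerm_client_config.current.object_id
  }
}

resource "azurerm_role_assignment" "owner_role" {
  for_each             = local.sub_owners_without_runner
  scope                = "/subscriptions/${var.sub_id}"
  role_definition_name = "Owner"
  principal_id         = each.value
}
```

This only helps if the running principal is actually part of `sub_owners`.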

knuterik-ballestad commented 5 months ago

> @knuterik-ballestad Presumably, the loop in your case covers all the original owners, including the principal that is running terraform. Also, assuming your workspace has kept track of the state of all these principals, when your change introduces a "replace", terraform will remove the role assignments before creating the new ones. That's why you saw the error.
>
> If the above assumption holds, it looks like a "shoot yourself in the foot" case. My suggestion is to at least keep the principal that runs terraform out of sub_owners.

Well, the principal that runs terraform, and certain admin users are set as Owners in the management structure, and not directly on the subscription - though the subscription inherits these Owners of course.

Our script only assigns one Owner directly to the subscription - the requester of a subscription to be created. That is why terraform has trouble when updating the Owner's constraint - because instead of just adding a constraint the whole role assignment is:

  1. Removed (this is where terraform fails)
  2. Re-applied with the additional constraint

So, if terraform could also check inherited Owners, and not only directly assigned ones, this would be solved.

simone-bennett commented 3 months ago

We also have this issue. The admin users are inherited so it's not the last user in the group.

magodo commented 3 months ago

Hey @simone-bennett, can you elaborate about your setup?

magodo commented 3 months ago

@knuterik-ballestad If the role assignment of your current principal is made at the management group level, how would the new role assignment (on the sub) cause that role to be removed?

simone-bennett commented 3 months ago

> Hey @simone-bennett, can you elaborate about your setup?

Sure thing. I've set this up in our dev tenant so I can add some screenshots.

We use the Terraform Azure Landing Zones and Subscription Vending modules. When we vend a new subscription, owner rights are inherited from the root management group down to the new subscription.

image

image

In addition, we create a User Assigned Managed Identity for OIDC and grant it the Owner role on the subscription.

When we try to destroy the subscription, I get the authorization.RoleAssignmentsClient#Delete: Failure responding to request: StatusCode=412 -- Original Error: autorest/azure: Service returned an error. Status=412 Code="CannotDeleteLastRbacAdminAssignment" Message="Cannot delete the last RBAC admin assignment" error.

It's trying to delete the User Assigned Managed Identity. This User Assigned Managed Identity is the only directly assigned owner.

Also, it seems like the management group association is removed before the User Assigned Managed Identity is deleted. Not sure if that is contributing. For example, this is a subscription that failed to destroy. It has been moved back to the tenant root, but the User Assigned Managed Identity could not be deleted:

image

If I go to the vended subscription, and add an owner directly there, we are able to destroy that subscription using the pipeline with no issues.

We obviously don't want to assign users directly; we should be able to use Entra groups at the management group level to grant access to subscriptions when they are created using IaC.

magodo commented 3 months ago

As @simone-bennett stated, the culprit lies in the azurerm_management_group_subscription_association:

> Also, it seems like the management group association is removed before the user assigned managed identity is deleted.

The creation order is as follows:

management group ---+ 
                    +-> management group subscription association
subscription     ---+
     |
     +----------------> role assignments     

When it comes to deletion, since there is no dependency between "role assignments" and "management group subscription association", the two can be deleted concurrently. This is what triggers the issue.

Ideally, the "role assignments" should depend on the "management group subscription association". With that, on deletion the "role assignments" would be deleted before the subscription is disassociated from the management group, i.e. the inherited roles would still apply to the subscription, which then allows the deletion of the roles directly assigned on the sub.
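
The dependency described above could be sketched with an explicit `depends_on`. This is a minimal sketch under assumptions: the resource label `this` and the variable `management_group_id` are hypothetical, and `var.sub_owners` is assumed to be a map of principal object IDs.

```hcl
variable "management_group_id" {
  type = string
}

variable "sub_id" {
  type = string
}

variable "sub_owners" {
  # Assumption: map of arbitrary key => principal object ID.
  type = map(string)
}

resource "azurerm_management_group_subscription_association" "this" {
  management_group_id = var.management_group_id
  subscription_id     = "/subscriptions/${var.sub_id}"
}

resource "azurerm_role_assignment" "owner_role" {
  for_each             = var.sub_owners
  scope                = "/subscriptions/${var.sub_id}"
  role_definition_name = "Owner"
  principal_id         = each.value

  # Created after the association, and therefore destroyed before it, so the
  # Owners inherited from the management group still apply to the subscription
  # while the direct assignment is being deleted.
  depends_on = [azurerm_management_group_subscription_association.this]
}
```

Terraform destroys resources in reverse dependency order, which is why the `depends_on` sits on the role assignment rather than on the association. If the two resources live in different modules, the dependency would have to be declared on the module call instead.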

I'm not an expert of ALZ, not sure if the change above will break anything else though...

simone-bennett commented 3 months ago

I should add, this is new. We have been deploying subscription vending for 12 months and I haven't come across this before now.

magodo commented 3 months ago

@simone-bennett This looks like a race between (all of) the azurerm_role_assignment resources and azurerm_management_group_subscription_association: it only occurs when the azurerm_management_group_subscription_association is deleted before the azurerm_role_assignment.

Another factor is that azurerm_role_assignment used to have a caching issue: after being deleted, it could sometimes still be queried.

knuterik-ballestad commented 3 months ago

> @knuterik-ballestad If the role assignment of your current principal is made at the management group level, how would the new role assignment (on the sub) cause that role to be removed?

The setup is as follows:

  1. We are 2 central admins who have subscription ownership inherited from the management groups
  2. For each subscription our vending machine creates, one person from the requesting development team is set as Owner directly on the created subscription, but we restrict that Owner role with a filter listing the other roles they can assign to others in "their" subscription. This makes that person the only one who has ownership directly assigned at the subscription level.

Then, when we modify the role filter list in our terraform vending machine (adding or removing a role that this directly assigned Owner is allowed to assign to others), terraform tries to completely remove and re-apply the ownership assignment. The "Remove owner" step then fails with an error message stating that the last owner of the subscription cannot be removed.

Also, this worked just fine up until recently, but after upgrading terraform, the CAF version, the terraform providers, and more, this error was introduced. We did not discover the bug during our testing of any of these upgrades, since our test cases did not include modifying this Ownership "filter" list.

magodo commented 3 months ago

@knuterik-ballestad The condition has been a ForceNew attribute since at least 2021, so "remove+re-apply the ownership assignment" has been the behavior for a long time.

Per my test, as long as there is an owner on this subscription (no matter whether directly assigned or inherited), you can remove the other owners.

E.g. I have the following role assignments in my sub:

image

I'm then able to delete the "magodotf" app role assignments:

image

So in your case given you have those 2 central admins, how do you get that error message?

Besides, I noticed that the condition does not actually need to be marked as ForceNew: the Portal simply issues a PUT to perform the update.

simone-bennett commented 3 months ago

It seems more like it won't accept the admins inherited from the root (highest-level) management group. You have to assign an admin directly on the subscription before deleting it, even if there are admins that it inherits.

magodo commented 3 months ago

@simone-bennett Per my test (as shown above), the inherited admin can remove the last directly assigned admin from a subscription. Could you try that in your setup and share the failure?