hashicorp / vault-plugin-secrets-azure

Vault Azure Secrets plugin
Mozilla Public License 2.0

vault azure dynamic engine can leak role assignments. #118

Closed dnozay closed 1 year ago

dnozay commented 1 year ago

There are two ways to use the Azure engine with respect to service principals:

  1. static AAD service principal: you supply the application object ID.
  2. dynamic AAD service principal: you provide the role info, and the service principal and role assignments get created for you.

So again, when using dynamic principals, the service principal is created first, and then the role assignment is made. Sometimes this fails, and it can fail consistently.

How we found this issue:

╷
│ Error: authorization.RoleAssignmentsClient#Create: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="RoleAssignmentLimitExceeded" Message="No more role assignments can be created."
│ 
│   with module.azure-service-principal-xxxxxxxx-xxxxx-xxxx.azurerm_role_assignment.service-principal-built-in-roles["/providers/Microsoft.Management/managementGroups/xxxxx-xxxxx-xxxx-xxxx-xxxxxxxxxx.Name of my Service"],
│   on modules/cross_subscription_service_principal/main.tf line 109, in resource "azurerm_role_assignment" "service-principal-built-in-roles":
│  109: resource "azurerm_role_assignment" "service-principal-built-in-roles" {
│ 
╵

I suspect we may have misconfigured the permanently_delete option, or that the chosen TTL lets us hit our RoleAssignment quota before old objects are deleted / GCed.

However, there is another failure mode: if you use a Kubernetes deployment and that deployment keeps failing for some reason, each restart creates a new service principal and a new role assignment. This can also lead to resource exhaustion.

An operator can work around the leak by going to the Azure portal, checking role assignments, and deleting the old ones.


When role unassignment is performed and it fails, it is not retried: https://github.com/hashicorp/vault-plugin-secrets-azure/blob/main/path_service_principal.go#L276-L296. This can also be a source of leaks.

austingebauer commented 1 year ago

Hi @dnozay! We recently addressed some of the leaking role assignment concerns in https://github.com/hashicorp/vault-plugin-secrets-azure/pull/110. Does this address your concerns? If not, specific steps to reproduce leaking role assignments would be helpful here. Thanks!

dnozay commented 1 year ago

@austingebauer - thanks for letting me know. I recently got back from vacation and will inform the team.

austingebauer commented 1 year ago

@dnozay - I'm going to close this issue as we believe we've fixed this. Feel free to reopen it or open a new issue if you've discovered otherwise!