hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/
Other
31.13k stars 4.21k forks source link

Documentation Tweak: Azure Secrets Engine - Detail the required AAD API Permissions #8286

Closed chrisreddington closed 1 year ago

chrisreddington commented 4 years ago

The Azure Secrets Engine API documentation (here) could include some additional information about the permissions needed to create Dynamic Secrets, AKA service principals. After hunting around, the required permissions are documented in Hashicorp Learn here.

It would be excellent to have the same steps on creating a service principal also in the documentation (just before writing to the azure/config), so that it becomes a quick and easy guide to setting up the Azure Secrets Engine.

ausfestivus commented 1 year ago

As of today, the permissions documented at https://developer.hashicorp.com/vault/docs/secrets/azure#ms-graph-api-permissions are still incorrect. They were last updated in https://github.com/hashicorp/vault/pull/18937. The intent behind that PR was good, reducing the number of API Permissions required by the App Registration is a worthy goal.

The above PR missed one API Permission though and that is the Directory.ReadWrite.All. I suspect that in the testing that @austingebauer did for the above PR, the engine was not configured to include any Vault Roles. Im betting that if Austin repeated the testing AND added Vault Roles to the engine that mapped to Azure Roles that the configuration would fail. Here is the output from our TFE that manages our Vault config for this engine:

Error: Error writing Azure Secret role "acl_cpt/cpt_XXXX/roles/00000000-0000-0000-0000-000000000000-contributor": Error making API request.

URL: PUT https://vault.nnnn.nnn/v1/acl_cpt/cpt_XXXX/roles/00000000-0000-0000-0000-000000000000-contributor
Code: 500. Errors:

* 1 error occurred:
    * unable to lookup Azure role: authorization.RoleDefinitionsClient#List: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client '00000000-0000-0000-0000-000000000000' with object id '00000000-0000-0000-0000-000000000000' does not have authorization to perform action 'Microsoft.Authorization/roleDefinitions/read' over scope '/subscriptions/00000000-0000-0000-0000-000000000000' or the scope is invalid. If access was recently granted, please refresh your credentials."

   with vault_azure_secret_backend_role.this["cpt_XXXX.00000000-0000-0000-0000-000000000000-contributor"],
   on secrets_engines.tf line 55, in resource "vault_azure_secret_backend_role" "this":
   55: resource "vault_azure_secret_backend_role" "this" {

Once the Directory.ReadWrite.All API Permission was added to the App Registration, Vault and the Azure Secrets Engine were able to configure the Vault Roles correctly.

We've opened a support case about this issue as well. Ref#114268.

For added valueconfusion, the tutorial pages for the Azure Secrets Engine list a much more extensive list of required API Permissions. See https://developer.hashicorp.com/vault/tutorials/secrets-management/azure-secrets?variants=vault-deploy%3Aoss#create-an-azure-service-principal-and-resource-group (steps 21 and 23).

austingebauer commented 1 year ago

Thanks for letting us know, @ausfestivus! We're clearly missing something for your use case. We'll get this figured out and update the documentation (+ tutorial pages).

In my testing, I did successfully create Vault roles that map to Azure roles. I was able to use those to generate credentials with success. Additionally, we have tests in the Azure secrets engine repository that use the permissions we've recommended. You can see in my pull request that our tests ran successfully.

Are you able to provide any more detail on your Vault role configuration? The error message you've shared is interesting. It seems to show that "The provided client secret keys for app '00000000-0000-0000-0000-000000000000' are expired." and not an authorization issue due to wrong permissions. Happy to help with more information!

ausfestivus commented 1 year ago

"The provided client secret keys for app '00000000-0000-0000-0000-000000000000' are expired."

I may have inserted the wrong error above, let me revisit and confirm. Standby.

Yea, confirmed, ive inserted the wrong error message above. 🙄 Doing too many things at once. Ill fish-out and sanitise the actual error and update the above comment shortly.

ausfestivus commented 1 year ago

Are you able to provide any more detail on your Vault role configuration?

I will provide what I can that may assist Hashicorp to refine their tests.

We use TFE to govern the Vault configuration. The Azure Secrets Engine configurations are defined in code and the configuration values are defined in a .tfvars file. We have the following Resources defined in Terraform:

# azure secrets engine
resource "vault_azure_secret_backend" "this" {
  for_each = var.azure_secret_backend

  path = "${var.group_path_prefix}/${each.value.path}"

  subscription_id = each.value.subscription_id
  tenant_id       = each.value.tenant_id
  client_id       = each.value.client_id

  environment             = "AzurePublicCloud"
  use_microsoft_graph_api = true
}

# roles in the above secrets engine
resource "vault_azure_secret_backend_role" "this" {
  for_each = {
    for role in local.azure_role : "${role.backend_key}.${role.role_key}" => role
  }

  backend = vault_azure_secret_backend.this[each.value.backend_key].path
  role    = each.value.role
  ttl     = each.value.ttl
  max_ttl = each.value.max_ttl

  azure_roles {
    role_name = each.value.azure_roles.role_name
    scope     = each.value.azure_roles.scope
  }
}

From locals.tf:

  azure_role = flatten([
    for backend_key, backend in var.azure_secret_backend : [
      for role_key, role in backend.roles : {
        backend_key = backend_key
        role_key    = role_key
        backend     = backend.path
        role        = role.name
        ttl         = role.ttl
        max_ttl     = role.max_ttl
        azure_roles = role.azure_roles
      }
    ]
  ])

And, the value of variable azure_secret_backend with real values redacted:

azure_secret_backend = {
  cpt_XXXX = {
    path            = "cpt_XXXX"
    subscription_id = "00000000-0000-0000-0000-000000000000"
    tenant_id       = "00000000-0000-0000-0000-000000000000"
    client_id       = "00000000-0000-0000-0000-000000000000"

    # roles for cpt_XXXX
    roles = {
      # Roles for Subscriptions with Dynamic SPs
      "00000000-0000-0000-0000-000000000000-owner" = {
        name    = "00000000-0000-0000-0000-000000000000-owner"
        ttl     = 1200
        max_ttl = 2400
        azure_roles = {
          role_name = "Owner"
          scope     = "/subscriptions/00000000-0000-0000-0000-000000000000"
        }
      }
      "00000000-0000-0000-0000-000000000000-contributor" = {
        name    = "00000000-0000-0000-0000-000000000000-contributor"
        ttl     = 1200
        max_ttl = 2400
        azure_roles = {
          role_name = "Contributor"
          scope     = "/subscriptions/00000000-0000-0000-0000-000000000000"
        }
      }
      "00000000-0000-0000-0000-000000000000-reader" = {
        name    = "00000000-0000-0000-0000-000000000000-reader"
        ttl     = 1200
        max_ttl = 2400
        azure_roles = {
          role_name = "Reader"
          scope     = "/subscriptions/00000000-0000-0000-0000-000000000000"
        }
      }
    }
    staticRoles = {}
  }
}

If you're able to cobble together a prototype from the above for your test case, the repro steps are then:

When you think about it, it makes sense. Vault wants to use the AAD API to read the Azure Role definitions. These exist in the DIrectory. Without at least Directory.Read.All in the API Permissions then Vault wont be able to read the role definition in AAD. At least, thats what it looks like to me.

In the testing Ive done so far today, ive only completed the configuration of the Azure Secret Engine's configuration. The following MS Graph API Permissions were granted to the App Registration. This allowed the engine configuration to complete successfully.

Permission Name Type
Application.ReadWrite.All Application
Directory.ReadWrite.All Application
Group.ReadWrite.All Application

My next step will be to scope out the work required to do the end-to-end testing of the engines full configuration and capability. I suspect that is a large body of work for us. I will be asked why Hashicorp shouldnt be the one's to do this work since it is their product. It's a fair question.

This isnt the first time we've encountered issues with the API Permissions required for the App Registration to function correctly. I do recall other issues/conversations about the Hashicorp documentation describing permissions requirements that could be considered too permissive. I know the question has been asked a few times internally here by the identity and security teams.

ausfestivus commented 1 year ago

Quick Q @austingebauer do the tests for the Vault Role definition do the mapping via role_name or role_id or both?

Asking because, if you look at the doco here it states:

  1. If role_id is provided, it is validated and the corresponding role_name updated.
  2. If only role_name is provided, a case-insensitive search-by-name is made, succeeding only if exactly one matching role is found. The role_id field will updated with the matching role ID.

In our use-case, we are using the role_name. This means that the engine needs to look the name up in AAD to retrieve the role_id. If the app registration it uses to access AAD doesnt have at least Directory.Read.All that look up will likely fail.

Thats my gut feel anyway.

austingebauer commented 1 year ago

The tests use role_name. You can see this at path_service_principal_test.go#L942. You can see the Terraform config we're using to provide the configuration to the secrets engine in tests. It doesn't have the Directory.ReadWrite.All permission but still works with role_name.

Are you able to provide the actual error you're seeing? You mentioned the error you shared before was incorrect. I'm curious what the error looks like when you don't have Directory.ReadWrite.All set. Thanks, we'll get it figured out!

ausfestivus commented 1 year ago

Are you able to provide the actual error you're seeing?

I updated the error in the comment above to display the correct error.

austingebauer commented 1 year ago

Thanks @ausfestivus! It's still not clear to me why Directory.ReadWrite.All is needed in your case vs. what we've been testing with 🤔 Do you happen to have any other ideas on what the difference might be?

ausfestivus commented 1 year ago

I wonder if its related to default API Permissions in an AAD Tenant versus our AAD Tenants configuration. It's entirely possible that the AAD Tenant here has had a default config item changed. I will invite our identity team to engage on this issue and see what they can share.

ausfestivus commented 1 year ago

I am still attempting to move this forward at our end.

austingebauer commented 1 year ago

Thanks, @ausfestivus! We have others looking at this internally as well. We're trying to understand the exact set of permissions to recommend and figure out why our tests pass without Directory.ReadWrite.All.

ausfestivus commented 1 year ago

Well, to make matters more interesting, we have removed the Directory.ReadWrite.All API Permission from the App Registration. Things are working fine with that App Registration now. I dont know how or why this situation has come about.

I have detailed work journal entries for the whole thing. The order of events went something like this:

Following the above response from HC stating that the Engine's testing had passed with the described API Permissions in the doco:

I am now in the process of setting up an additional Engine for testing using a new App Registration supplied by the identity team. I want to see if the change which adds the Directory.ReadWrite.All API Permission has any bearing on the result.

Additionally, while reviewing the work journal notes, ive noted something odd about the original error message:

│   * unable to lookup Azure role: authorization.RoleDefinitionsClient#List: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client '7e4c86b7-0000-0000-0000-000000000000' with object id '7e4c86b7-0000-0000-0000-000000000000' does not have authorization to perform action 'Microsoft.Authorization/roleDefinitions/read' over scope '/subscriptions/9588a465-0000-0000-0000-000000000000' or the scope is invalid. If access was recently granted, please refresh your credentials."

The object id in the error is NOT the object id from the App Registration. It is the object id from the App Registration's Enterprise Application object in the directory. I cannot understand WHY the Vault engine would try and use the object id for the Enterprise Application object instead of the App Registration object id.

I may come back to consider this depending on the results of the testing with the new, 2nd App Registration.

ausfestivus commented 1 year ago

I am unable to reproduce the original error with the 2nd App Registration.

My earlier assertion that the API Permissions listed in the Engine documentation at https://developer.hashicorp.com/vault/docs/secrets/azure#ms-graph-api-permissions is incorrect and should be ignored.

The API Permissions documentation in the Vault Learn portal here (as of today) are still different from the permissions described in the Engine's documentation above.

austingebauer commented 1 year ago

Thank you for the updates, @ausfestivus. I'm glad you were able to get things working with the permissions set we recommend. I will open a pull request to update the Vault learn material so that it matches our other Vault docs.

austingebauer commented 1 year ago

I'm going to close this issue as the Vault docs and learn tutorial will have matching permissions recommendations shortly. Thanks, everyone!