pulumi / pulumi-azure-native

Azure Native Provider
Apache License 2.0

Unable to Encrypt Databricks Workspace in one `pulumi up` #2617

Open ahmed-wajid opened 1 year ago

ahmed-wajid commented 1 year ago

What happened?

When creating a Databricks workspace using the Azure Native v2 provider, there is no straightforward way to encrypt the workspace.

It seems that this can only be done with a two-step approach. First, `pulumi up` with:

workspace = databricks.Workspace(
    "foobar",
    managed_resource_group_id="/subscriptions/<subscriptionid>/resourceGroups/mg_rg_foobar",
    resource_group_name=resource_group.name,
    sku=databricks.SkuArgs(
        name="premium",
        tier="premium"
    ),
    parameters=databricks.WorkspaceCustomParametersArgs(
        prepare_encryption=databricks.WorkspaceCustomBooleanParameterArgs(
            value=True,
        ),
    ),
)

roleassignment = authorization.RoleAssignment(
    f"Storage Account KeyVault Crypto User - {workspace.storage_account_identity}",
    principal_id=workspace.storage_account_identity.principal_id,
    principal_type=authorization.PrincipalType.SERVICE_PRINCIPAL,
    role_definition_id="/providers/Microsoft.Authorization/roleDefinitions/12338af0-0e69-4776-bea7-57ae8d297424",
    scope="/subscriptions/<subscriptionid>/resourceGroups/resource_groupe8a30786/providers/Microsoft.KeyVault/vaults/foobar",
)

This creates the workspace and then performs the role assignment on the Storage Account Identity, granting it access to the Key Vault via the KeyVault Crypto User role.

To finally encrypt it, we need to run a second `pulumi up` with:

workspace = databricks.Workspace(
    "foobar",
    managed_resource_group_id="/subscriptions/<subscriptionid>/resourceGroups/mg_rg_foobar",
    resource_group_name=resource_group.name,
    sku=databricks.SkuArgs(
        name="premium",
        tier="premium"
    ),
    encryption=databricks.WorkspacePropertiesEncryptionArgs(
        entities=databricks.EncryptionEntitiesDefinitionArgs(
            managed_disk=databricks.ManagedDiskEncryptionArgs(
                key_source="Microsoft.Keyvault",
                rotation_to_latest_key_version_enabled=False,
                key_vault_properties=databricks.ManagedDiskEncryptionKeyVaultPropertiesArgs(
                    key_name="foobar",
                    key_version="<keyversion>",
                    key_vault_uri="https://foobar.vault.azure.net/"
                ),
            ),
            managed_services=databricks.EncryptionV2Args(
                key_source="Microsoft.Keyvault",
                key_vault_properties=databricks.EncryptionV2KeyVaultPropertiesArgs(
                    key_name="foobar",
                    key_version="<keyversion>",
                    key_vault_uri="https://foobar.vault.azure.net/"
                ),
            ),
        ),
    ),
    parameters=databricks.WorkspaceCustomParametersArgs(
        encryption=databricks.WorkspaceEncryptionParameterArgs(
            value=databricks.EncryptionArgs(
                key_name="foobar",
                key_source="Microsoft.Keyvault",
                key_vault_uri="https://foobar.vault.azure.net/",
                key_version="<keyversion>"
            ),
        ),
        prepare_encryption=databricks.WorkspaceCustomBooleanParameterArgs(
            value=True,
        ),
    ),
)

We are actually using the Automation API, running this in a Celery task, and it seems we cannot find a way to achieve this in a single deployment using the Azure Native Provider.
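For context, the orchestration we run from the Celery task looks roughly like the sketch below. This is not our actual code: `deploy` is a hypothetical stand-in for an Automation API `stack.up()` call, and `previous_outputs` stands in for the stack's last exported outputs; only the phase-selection logic is concrete.

```python
# Sketch of the two-step flow driven programmatically. `deploy` is a
# hypothetical stand-in for an Automation API stack.up() call;
# `previous_outputs` stands in for the stack's previous outputs.

def choose_phase(previous_outputs: dict) -> str:
    """Pick which program variant to deploy: the 'prepare' program
    (prepare_encryption=True plus the role assignment) on the first
    run, the 'encrypt' program once the workspace already exists."""
    if previous_outputs.get("workspace_prepared"):
        return "encrypt"
    return "prepare"

def run_until_encrypted(deploy, previous_outputs: dict) -> list:
    """Run deployments until the 'encrypt' phase has been applied;
    returns the phases executed (at most two)."""
    executed = []
    phase = choose_phase(previous_outputs)
    executed.append(phase)
    deploy(phase)
    if phase == "prepare":
        # The first run created and prepared the workspace; a second
        # run applies the encryption configuration.
        previous_outputs["workspace_prepared"] = True
        executed.append("encrypt")
        deploy("encrypt")
    return executed
```

Having to encode this two-pass logic around the deployment is exactly the overhead we'd like to avoid.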

Would you happen to have any suggestions or tricks up your sleeve where you would have potentially already solved such a problem?

Expected Behavior

Expect to be able to encrypt the Databricks workspace in one go.

Steps to reproduce

  1. Do the prereqs.
  2. Add the following to `__main__.py`:
    
    workspace = databricks.Workspace(
        "foobar",
        managed_resource_group_id="/subscriptions/<subscriptionid>/resourceGroups/mg_rg_foobar",
        resource_group_name=resource_group.name,
        sku=databricks.SkuArgs(
            name="premium",
            tier="premium"
        ),
        parameters=databricks.WorkspaceCustomParametersArgs(
            prepare_encryption=databricks.WorkspaceCustomBooleanParameterArgs(
                value=True,
            ),
        ),
    )

    roleassignment = authorization.RoleAssignment(
        f"Storage Account KeyVault Crypto User - {workspace.storage_account_identity}",
        principal_id=workspace.storage_account_identity.principal_id,
        principal_type=authorization.PrincipalType.SERVICE_PRINCIPAL,
        role_definition_id="/providers/Microsoft.Authorization/roleDefinitions/12338af0-0e69-4776-bea7-57ae8d297424",
        scope="/subscriptions//resourceGroups/resource_groupe8a30786/providers/Microsoft.KeyVault/vaults/foobar",
    )

3. `pulumi up`
4. Change the content in __main__.py to:

    workspace = databricks.Workspace(
        "foobar",
        managed_resource_group_id="/subscriptions//resourceGroups/mg_rg_foobar",
        resource_group_name=resource_group.name,
        sku=databricks.SkuArgs(
            name="premium",
            tier="premium"
        ),
        encryption=databricks.WorkspacePropertiesEncryptionArgs(
            entities=databricks.EncryptionEntitiesDefinitionArgs(
                managed_disk=databricks.ManagedDiskEncryptionArgs(
                    key_source="Microsoft.Keyvault",
                    rotation_to_latest_key_version_enabled=False,
                    key_vault_properties=databricks.ManagedDiskEncryptionKeyVaultPropertiesArgs(
                        key_name="foobar",
                        key_version="",
                        key_vault_uri="https://foobar.vault.azure.net/"
                    ),
                ),
                managed_services=databricks.EncryptionV2Args(
                    key_source="Microsoft.Keyvault",
                    key_vault_properties=databricks.EncryptionV2KeyVaultPropertiesArgs(
                        key_name="foobar",
                        key_version="",
                        key_vault_uri="https://foobar.vault.azure.net/"
                    ),
                ),
            ),
        ),
        parameters=databricks.WorkspaceCustomParametersArgs(
            encryption=databricks.WorkspaceEncryptionParameterArgs(
                value=databricks.EncryptionArgs(
                    key_name="foobar",
                    key_source="Microsoft.Keyvault",
                    key_vault_uri="https://foobar.vault.azure.net/",
                    key_version=""
                ),
            ),
            prepare_encryption=databricks.WorkspaceCustomBooleanParameterArgs(
                value=True,
            ),
        ),
    )

5. `pulumi up`

### Output of `pulumi about`

    CLI
    Version      3.76.0
    Go Version   go1.20.6
    Go Compiler  gc

    Plugins
    NAME          VERSION
    azure-native  2.1.1
    python        unknown

    Host
    OS       ubuntu
    Version  22.04
    Arch     x86_64

    This project is written in python: executable='/usr/bin/python3' version='3.10.6'

    Current Stack: organization/test-stack/dev

    Found no resources associated with dev

    Found no pending operations associated with dev

    Backend
    Name
    URL            file://.
    User
    Organizations

    Dependencies:
    NAME                 VERSION
    pip                  23.2.1
    pulumi-azure-native  2.1.1
    setuptools           68.0.0
    wheel                0.41.0

    Pulumi locates its logs in /tmp by default



### Additional context

ARM and the Terraform providers deal with this differently. When using the TF-based Classic provider or ARM, they expect you to have already granted the predefined `AzureDatabricks` principal the `KeyVault Crypto User` role on the Key Vault, and it works. It would be really nice if the same approach were adopted by the Azure Native v2 provider too, because right now Azure blocks the call with an error saying you must first "prepare" the workspace, add the role assignment, and only then encrypt the workspace.

### Contributing

Vote on this issue by adding a 👍 reaction. 
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already). 
danielrbradley commented 1 year ago

Hi @ahmed-wajid thanks for the detailed writeup.

Yes, I think you've found another interesting example of where Azure has created APIs that inadvertently introduce circular dependencies.

Here's a blog article that discusses addressing these challenges in Pulumi: https://www.pulumi.com/blog/exploring-circular-dependencies/

I think this is solved in the bridged provider by encapsulating multiple steps in the creation of the single resource - so one option could be to special-case how the provider creates this resource when this property is set, but that would be a far from ideal solution to maintain. Creating "patch resources" is the more likely route we'll take in the future - allowing a second operation to be performed on the same resource.

If you want to avoid modifying your code between the first and second run, you can use the workaround from the beginning of the article, where we reference our own stack's previous state to decide whether the initial or the encrypted configuration should be set.
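A minimal sketch of that self-referencing idea, with a plain dict standing in for the previous stack outputs (the output name `storage_account_identity_principal_id` is a hypothetical export chosen for illustration, not something the provider defines):

```python
from typing import Optional

def encryption_config(prev_outputs: dict) -> Optional[dict]:
    """Return the encryption settings to pass to the Workspace, or None
    on the first run, when only prepare_encryption should be set. In a
    real program, prev_outputs would come from a StackReference to the
    stack's own previous deployment, as in the linked article."""
    if not prev_outputs.get("storage_account_identity_principal_id"):
        # Workspace not yet created/prepared: deploy without encryption.
        return None
    return {
        "key_source": "Microsoft.Keyvault",
        "key_name": "foobar",
        "key_version": "<keyversion>",
        "key_vault_uri": "https://foobar.vault.azure.net/",
    }
```

The program then stays identical across runs; only the previous state determines which configuration is applied.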

If you want to avoid a second run of `pulumi up`, you could perform the modification using custom code within a dynamic provider that modifies the Workspace. With this approach, you'll need to perform a refresh before the next deployment so that the state is updated to match the actual resource, and you'll want to use the self-stack reference above so your program matches the new state on subsequent deployments.

Hope that gives you some options for working around this issue.

ahmed-wajid commented 1 year ago

Thank you for the response @danielrbradley. Meanwhile, we will use the bridged provider. But as I mentioned, the bridged provider - and also deployments via ARM or the Portal - use the AzureDatabricks principal (ObjectID: b43a9385-67b9-482b-9a57-d1727a3c779d), which is available in the tenant and can easily be given the role. Could we check with the Azure Team and the Native Provider Team to do the same, so that we no longer have the circular dependency?

danielrbradley commented 1 year ago

Ah, I missed your additional information on first read. To make sure I've understood correctly, are you saying that in ARM you:

  1. Create the key vault (in the bridged provider) and give permissions to the AzureDatabricks object - allowing any Databricks workspace to use the keys.
  2. Create the workspace with encryption immediately.

If this is possible in an ARM template, it should be possible in this provider too. If you've got any examples of the ARM template approach that'd be a great starting point for digging into providing a workaround.

ahmed-wajid commented 1 year ago

Thanks for the quick reply.

  1. We create the key vault using the native provider, and also create the role assignment for the AzureDatabricks object.
  2. Now we can use the bridged provider to create Databricks in one go. It also works with this ARM template (mind you, the ARM template is an export from my Portal deployment; I don't really want to set up something to deploy an ARM template to prove it):
    {
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "location": {
            "type": "String"
        },
        "workspaceName": {
            "type": "String"
        },
        "tier": {
            "defaultValue": "premium",
            "type": "String"
        },
        "tagValues": {
            "type": "Object"
        },
        "managedResourceGroupName": {
            "defaultValue": "",
            "type": "String"
        },
        "enableNoPublicIp": {
            "type": "Bool"
        },
        "requireInfrastructureEncryption": {
            "type": "Bool"
        }
    },
    "variables": {
        "managedResourceGroupName": "[if(not(empty(parameters('managedResourceGroupName'))), parameters('managedResourceGroupName'), concat('databricks-rg-', parameters('workspaceName'), '-', uniqueString(parameters('workspaceName'), resourceGroup().id)))]",
        "trimmedMRGName": "[substring(variables('managedResourceGroupName'), 0, min(length(variables('managedResourceGroupName')), 90))]",
        "managedResourceGroupId": "[concat(subscription().id, '/resourceGroups/', variables('trimmedMRGName'))]"
    },
    "resources": [
        {
            "type": "Microsoft.Databricks/workspaces",
            "apiVersion": "2023-02-01",
            "name": "[parameters('workspaceName')]",
            "location": "[parameters('location')]",
            "dependsOn": [],
            "tags": "[parameters('tagValues')]",
            "sku": {
                "name": "[parameters('tier')]"
            },
            "properties": {
                "ManagedResourceGroupId": "[variables('managedResourceGroupId')]",
                "parameters": {
                    "enableNoPublicIp": {
                        "value": "[parameters('enableNoPublicIp')]"
                    },
                    "requireInfrastructureEncryption": {
                        "value": "[parameters('requireInfrastructureEncryption')]"
                    }
                },
                "encryption": {
                    "entities": {
                        "managedServices": {
                            "keySource": "Microsoft.Keyvault",
                            "keyVaultProperties": {
                                "keyVaultUri": "https://foobar.vault.azure.net",
                                "keyName": "foobar",
                                "keyVersion": "0000000000"
                            }
                        },
                        "managedDisk": {
                            "keySource": "Microsoft.Keyvault",
                            "keyVaultProperties": {
                                "keyVaultUri": "https://foobar.vault.azure.net",
                                "keyName": "foobar",
                                "keyVersion": "00000000000000"
                            },
                            "rotationToLatestKeyVersionEnabled": false
                        }
                    }
                }
            }
        }
    ]
    }

The problem with the Native Provider is that it does not matter whether the role assignments are already in place; the deployment is simply blocked with:

    stderr: error: Code="InvalidEncryptionConfiguration" Message="Configure encryption for workspace at creation is not allowed, configure encryption once workspace is created and key vault access policies are added"

because the two-step approach is always expected and enforced: first create, then encrypt. What I want to achieve with this issue is a way to tell the provider that the required role assignments are already in place, to use the AzureDatabricks object, and to encrypt in one go - the same way both the bridged provider and ARM deal with it. I hope I've explained it properly and provided enough details. Please let me know if you need more details to dig deeper into this. Thank you!

danielrbradley commented 11 months ago

Hi @ahmed-wajid I've just taken some time to see if we can move this issue forward.

I initially spent some time testing the approach of giving permission to the built-in Databricks Service Principal. I think the program at the bottom encapsulates what you were describing.

However, when I deploy this, the Workspace is unable to access the Key. A few issues I can see:

I'm not yet convinced that there is definitely a difference between what's possible with an ARM template versus in Pulumi, but it does appear that there's some magic happening in the UI-driven Workspace creation.

Creating a general solution

I've switched my focus instead to continuing my work to find a general way to handle circular dependencies between any Pulumi resource. I've filed the following issue as a proposal to take to the core platform team:

Work In Progress program for using the service principal identity

import pulumi
from pulumi_azure_native import resources, databricks, authorization, keyvault

# Create a new resource group
resource_group = resources.ResourceGroup('resource_group')

client_config = authorization.get_client_config()
tenant_id = client_config.tenant_id
subscription_id = client_config.subscription_id

# Create an Azure KeyVault with permission for the Databricks Service Principal
vault = keyvault.Vault("vault",
                       resource_group_name=resource_group.name,
                       properties=keyvault.VaultPropertiesArgs(
                           sku=keyvault.SkuArgs(
                               name="standard",
                               family="A"
                           ),
                           tenant_id=tenant_id,
                           enabled_for_deployment=True,
                           enabled_for_disk_encryption=True,
                           enabled_for_template_deployment=True,
                           access_policies=[
                               keyvault.AccessPolicyEntryArgs(
                                   # Databricks Service Principal
                                   object_id="b43a9385-67b9-482b-9a57-d1727a3c779d",
                                   tenant_id=tenant_id,
                                   permissions=keyvault.PermissionsArgs(
                                       keys=[
                                           "encrypt",
                                           "decrypt",
                                           "wrapKey",
                                           "unwrapKey",
                                           "sign",
                                           "verify",
                                           "get",
                                           "list",
                                           "create",
                                           "update",
                                           "import",
                                           "delete",
                                           "backup",
                                           "restore",
                                           "recover",
                                           "purge",
                                       ],
                                       secrets=[
                                           "get",
                                           "list",
                                           "set",
                                           "delete",
                                           "backup",
                                           "restore",
                                           "recover",
                                           "purge",
                                       ],
                                   ),
                               ),
                           ],
                       ))

# Create a Key in the KeyVault
key = keyvault.Key("databricks_key",
                   key_name="mykey",
                   vault_name=vault.name,
                   resource_group_name=resource_group.name,
                   properties={"keySize": 2048, "kty": "RSA"},
                   )

key_vault_uri = vault.name.apply(
    lambda name: f"https://{name}.vault.azure.net/")

key_version = pulumi.Output.all(key.key_uri, key.key_uri_with_version).apply(
    lambda inputs: inputs[1][len(inputs[0])+1:])

# Create a Databricks Workspace with encryption enabled
workspace = databricks.Workspace(
    "foobar",
    managed_resource_group_id=f"/subscriptions/{subscription_id}/resourceGroups/workspace_managed_rg",
    resource_group_name=resource_group.name,
    sku=databricks.SkuArgs(
        name="premium",
        tier="premium"
    ),
    encryption=databricks.WorkspacePropertiesEncryptionArgs(
        entities=databricks.EncryptionEntitiesDefinitionArgs(
            managed_disk=databricks.ManagedDiskEncryptionArgs(
                key_source="Microsoft.Keyvault",
                rotation_to_latest_key_version_enabled=False,
                key_vault_properties=databricks.ManagedDiskEncryptionKeyVaultPropertiesArgs(
                    key_name=key.name,
                    key_version=key_version,
                    key_vault_uri=key_vault_uri,
                ),
            ),
            managed_services=databricks.EncryptionV2Args(
                key_source="Microsoft.Keyvault",
                key_vault_properties=databricks.EncryptionV2KeyVaultPropertiesArgs(
                    key_name=key.name,
                    key_version=key_version,
                    key_vault_uri=key_vault_uri,
                ),
            ),
        ),
    ),
    parameters=databricks.WorkspaceCustomParametersArgs(
        enable_no_public_ip=databricks.WorkspaceCustomBooleanParameterArgs(
            value=True,
        ),
        require_infrastructure_encryption=databricks.WorkspaceCustomBooleanParameterArgs(
            value=True,
        ),
    ),
)
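As an aside, the `key_version` expression in the program above strips the key's base URI (plus the separating slash) from the versioned URI. As plain string logic, that is:

```python
def key_version_from_uris(key_uri: str, key_uri_with_version: str) -> str:
    """Mirror of the Output.all(...).apply(...) expression in the
    program above: drop the base key URI and the '/' that follows it,
    leaving just the version segment."""
    return key_uri_with_version[len(key_uri) + 1:]
```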

ahmed-wajid commented 11 months ago

@danielrbradley Thank you for your inputs on this issue.

Meanwhile, we have managed to find a way to do the following using the Automation API: