pulumi / pulumi-azure-native

Azure Native Provider
Apache License 2.0
126 stars 33 forks source link

Unexpected response type on `dbforpostgresql.v20230302preview.Cluster` #3501

Open ruben-janssens opened 1 month ago

ruben-janssens commented 1 month ago

What happened?

When deploying a cosmos db cluster with an user assigned identity the cluster creates but returns an error:

raise AssertionError(f"Unexpected type. Expected 'list' got '{typ}'")
AssertionError: Unexpected type. Expected 'list' got 'typing.Mapping[str, pulumi_azure_native.dbforpostgresql.v20230302preview.outputs.UserAssignedIdentityResponse]'

Example

Deploy a Cluster with an user assigned identity

self.__cosmosdb_cluster = Cluster(
            "Cluster",
            cluster_name=self.__name,
            database_name=self.__database_name,
            identity=IdentityPropertiesArgs(
                type=IdentityType.USER_ASSIGNED,
                user_assigned_identities=[
                    self.__identity.id
                ]
            ),
            ...

Output of pulumi about

CLI          
Version      3.120.0
Go Version   go1.22.4
Go Compiler  gc

Host     
OS       ubuntu
Version  22.04
Arch     x86_64

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

thomas11 commented 1 month ago

Hi @ruben-janssens, could you tell us the version of the Azure Native provider you're using? Running pulumi about in the project directory where Pulumi.yaml is located should show it.

Also, can you show us more about where self.__identity.id comes from?

ruben-janssens commented 1 month ago

I am using pulumi_azure_native 2.53.0

self.__identity.id is a managed identity created previously. The identity is used to enable encryption at rest for the database. Keep in mind that this is API version v20230302preview

Here is a snippet of what self.__identity is:

self.__identity = UserAssignedIdentity(
            "Cluster",
            resource_name_=self.__name,
            resource_group_name=self.__resource_group_name
        )
thomas11 commented 1 month ago

Thank you! I see you're explicitly choosing API version 2023-03-02. Have you tried other API versions yet?

We'll have to dig a bit deeper on this one.

danielrbradley commented 1 month ago

It looks like the AssertionError is being raised by the Python SDK itself - and when populating the output objects after the operation should have completed. @ruben-janssens is this happening after the deployment of the resource completes or during preview or at some other point?

My initial hypothesis is that this is the intersection of a few of special cases of behaviour.

  1. The user_assigned_identities fields are a little unusual because the input is a list of strings (which are treated as a set) but the output is a map of the strings inputted mapping the values containing additional information about each identity. Most resource properties have the same shape of inputs and outputs so this is a little unusual.
  2. The Python SDK (and some others) will fill missing outputs with the value of the same named field from the inputs.
  3. The Python SDK proactively validates the data it's assigning to outputs and fails more agressively that the shape of data matches its expectations exactly.

I think what's happening here is that the output value for user_assigned_identities is not being returned for some reason and the SDK is filling in the value from the input which is a list, which subsequently fails validation because the shape of the input isn't valid for the output.

@ruben-janssens if you could also provide a full python program which is deployable and reproduces the issue, that would help speed up our investigations. Thank you!

ruben-janssens commented 1 month ago

Thank you! I see you're explicitly choosing API version 2023-03-02. Have you tried other API versions yet?

We'll have to dig a bit deeper on this one.

Sadly any other API version does not support the encryption at rest and it is a requirement for us.

@danielrbradley This error is thrown after the deployment of the resource. Below you can find a small snippet.

from pulumi_azure_native.resources import ResourceGroup
from pulumi_azure_native.managedidentity import UserAssignedIdentity
from pulumi_azure_native.dbforpostgresql.v20230302preview import (
    Cluster,
    IdentityPropertiesArgs,
    IdentityType,
    DataEncryptionArgs,
    DataEncryptionType,
    MaintenanceWindowArgs
)

resource_group = ResourceGroup(
      "Cluster",
      resource_group_name="cluster-testing"
  )

identity = UserAssignedIdentity(
    "Cluster",
    resource_name_="my-test-cluster",
    resource_group_name=resource_group.name
)

cosmosdb_cluster = Cluster(
    "Cluster",
    cluster_name="my-test-cluster",
    database_name="citus",
    identity=IdentityPropertiesArgs(
        type=IdentityType.USER_ASSIGNED,
        user_assigned_identities=[
            identity.id
        ]
    ),
    administrator_login_password="SuperSecurePassword1234",
    enable_ha=False,
    enable_geo_backup=False,
    postgresql_version="16",
    coordinator_v_cores=1,
    coordinator_storage_quota_in_mb=32768,
    coordinator_server_edition="BurstableMemoryOptimized",
    coordinator_enable_public_ip_access=True,
    enable_shards_on_coordinator=True,
    node_count=0,
    node_enable_public_ip_access=True,
    resource_group_name=resource_group.name
)

I think this should be sufficient...

danielrbradley commented 1 month ago

Thanks for the code @ruben-janssens - I can confirm the error occurs for v2.56.0 of the provider.

Using the GRPC debug logging option (PULUMI_DEBUG_GRPC=grpc.log) I can see that the response on this version of the API is indeed missing the identity property. This is likely a bug in this specific version of the API on Azure's side. The Python runtime then attempts to fill in the missing value with the input, which is of a different type. The error message also appears backwards because it sees the reflected type is a list and so tries to access the list element type and fails there, rather than validating the top-level type.

gRPC Request

    {
      "administratorLoginPassword": "SuperSecurePassword1234",
      "clusterName": "my-test-cluster",
      "coordinatorEnablePublicIpAccess": true,
      "coordinatorServerEdition": "BurstableMemoryOptimized",
      "coordinatorStorageQuotaInMb": 32768,
      "coordinatorVCores": 1,
      "databaseName": "citus",
      "enableGeoBackup": false,
      "enableHa": false,
      "enableShardsOnCoordinator": true,
      "identity": {
        "type": "UserAssigned",
        "userAssignedIdentities": [
          "/subscriptions/{subscriptionId}/resourcegroups/cluster-testing/providers/Microsoft.ManagedIdentity/userAssignedIdentities/my-test-cluster"
        ]
      },
      "nodeCount": 0,
      "nodeEnablePublicIpAccess": true,
      "postgresqlVersion": "16",
      "resourceGroupName": "cluster-testing"
    }

gRPC Response

    {
      "administratorLogin": "citus",
      "citusVersion": "12.1",
      "coordinatorEnablePublicIpAccess": true,
      "coordinatorServerEdition": "BurstableMemoryOptimized",
      "coordinatorStorageQuotaInMb": 32768,
      "coordinatorVCores": 1,
      "dataEncryption": {
        "type": "SystemManaged"
      },
      "databaseName": "citus",
      "enableGeoBackup": false,
      "enableHa": false,
      "enableShardsOnCoordinator": true,
      "id": "/subscriptions/{subscriptionId}/resourceGroups/cluster-testing/providers/Microsoft.DBforPostgreSQL/serverGroupsv2/my-test-cluster",
      "location": "uksouth",
      "maintenanceWindow": {
        "customWindow": "Disabled",
        "dayOfWeek": 0,
        "startHour": 0,
        "startMinute": 0
      },
      "name": "my-test-cluster",
      "nodeCount": 0,
      "nodeEnablePublicIpAccess": true,
      "nodeServerEdition": "MemoryOptimized",
      "nodeStorageQuotaInMb": 524288,
      "nodeVCores": 4,
      "postgresqlVersion": "16",
      "privateEndpointConnections": [],
      "provisioningState": "Succeeded",
      "readReplicas": [],
      "serverNames": [
        {
          "fullyQualifiedDomainName": "c-my-test-cluster.xxxxxxxxx.postgres.cosmos.azure.com",
          "name": "my-test-cluster-c"
        }
      ],
      "state": "Ready",
      "systemData": {
        "createdAt": "2024-08-19T09:58:45.9422327Z",
        "createdBy": "daniel@pulumi.com",
        "createdByType": "User",
        "lastModifiedAt": "2024-08-19T09:58:45.9422327Z",
        "lastModifiedBy": "daniel@pulumi.com",
        "lastModifiedByType": "User"
      },
      "tags": {},
      "type": "Microsoft.DBforPostgreSQL/serverGroupsv2"
    }

Solution

In order to work around this behaviour in the Python SDK I think we need to ensure that we alway provide a value for the ~"identity"~ "userAssignedIdentities" field to ensure that the input value is never used. Though I would also recommend raising this with Microsoft as expected information is missing from the GET request.

danielrbradley commented 1 month ago

Interestingly, this should have been fixed a while ago by https://github.com/pulumi/pulumi/pull/11422, but perhaps this has broken somehow.

danielrbradley commented 1 month ago

I've investigated a quick fix in the provider to create an empty set when the property is missing, but because it's nested within the "identity" property that's also missing it never recuses down to the point of the "userAssignedIdentities" property.

I'm following this up with the Python SDK side as I think it's probably a bug to error here.

Another workaround that might be possible is to initially create the cluster using a different version of the API, then use the command provider to set the encryption at rest via the AZ CLI.

thomas11 commented 1 month ago

I've investigated a quick fix in the provider to create an empty set when the property is missing, but because it's nested within the "identity" property that's also missing it never recuses down to the point of the "userAssignedIdentities" property.

Does the existing pkg/gen/propertyDefaults.go not match this use case?

danielrbradley commented 1 month ago

Does the existing pkg/gen/propertyDefaults.go not match this use case?

Unfortunately this only sets defaults on the inputs - but we're needing to provide a default output for which we have no existing mechanism. I'll escallate this as an issue on the Python SDK instead.