pulumi / pulumi-azure-native

Azure Native Provider

Unable to create azure-native.machinelearningservices.ServerlessEndpoint resource #3687

Closed patrickstuddard closed 1 week ago

patrickstuddard commented 2 weeks ago

What happened?

I was trying to create a ServerlessEndpoint with Python code like this:

serverless_endpoint=azure.machinelearningservices.ServerlessEndpoint(
    "serverlessEndpoint",
    resource_group_name=resource_group.name,
    workspace_name=ml_project.name,
    serverless_endpoint_properties=azure.machinelearningservices.ServerlessEndpointArgs(
        offer=azure.machinelearningservices.ServerlessOfferArgs(
            offer_name="mistral-ai-large-2407-offer",
            publisher="000-000"
        ),
        auth_mode="AAD"
    ),
    sku = {
        "name": "Consumption"
    }
)

And I got this error back:

error: Code="UserError" Message="Serverless offer 'mistral-ai-large-2407-offer' not found in service namespace '000-000'." Details=[] AdditionalInfo=[{"info":{"value":"managementfrontend"},"type":"ComponentName"},{"info":{"value":{"operation":"c2d892c742bbb4d3af4b4b803733023f","request":"901eee4ec8280b5c"}},"type":"Correlation"},{"info":{"value":"westus"},"type":"Environment"},{"info":{"value":"westus"},"type":"Location"},{"info":{"value":"2024-10-29T19:53:35.2992082+00:00"},"type":"Time"},{"info":{"value":{"code":"NotFound","innerError":{"code":"ServerlessOfferNotFound","innerError":null}}},"type":"InnerError"}]
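For anyone triaging a similar failure, the most specific code in this message is buried in the `InnerError` entry of `AdditionalInfo`. The following is a minimal sketch, not part of the original report, that pulls the nested codes out of such a message. It assumes the `AdditionalInfo=[...]` segment is valid JSON and is the last bracketed segment, as in the error above; `inner_error_codes` is a hypothetical helper name, and `error_text` is a trimmed copy of the message.

```python
import json
import re


def inner_error_codes(message: str) -> list[str]:
    """Return the chain of codes nested under the 'InnerError' entry, outermost first."""
    # Assumption: the AdditionalInfo payload is a JSON array that ends the message.
    match = re.search(r"AdditionalInfo=(\[.*\])", message, re.DOTALL)
    if match is None:
        return []
    codes = []
    for entry in json.loads(match.group(1)):
        if entry.get("type") != "InnerError":
            continue
        node = entry["info"]["value"]
        # Follow the innerError links until they run out (JSON null -> None).
        while node is not None:
            codes.append(node["code"])
            node = node.get("innerError")
    return codes


# Trimmed copy of the error message from this issue.
error_text = (
    'Code="UserError" Message="Serverless offer not found" Details=[] '
    'AdditionalInfo=[{"info":{"value":"westus"},"type":"Location"},'
    '{"info":{"value":{"code":"NotFound","innerError":'
    '{"code":"ServerlessOfferNotFound","innerError":null}}},"type":"InnerError"}]'
)

print(inner_error_codes(error_text))
```

Here the innermost code, `ServerlessOfferNotFound`, says the service rejected the offer lookup itself. That points at the request shape rather than at credentials or quota.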

The valid values for offer_name and publisher don’t seem to be documented anywhere, but I tried all sorts of variations, and also with different models, and always got this error.

When I create the endpoint manually through the Azure portal and then import it into Pulumi, it gives me this warning:

warning: One or more imported inputs failed to validate. This is almost certainly a bug in the `azure-native` provider. The import will still proceed, but you will need to edit the generated code after copying it into your program.
warning: azure-native:machinelearningservices:ServerlessEndpoint resource 'string' has a problem: missing required property 'serverlessEndpointProperties'

And offer_name and publisher show up as empty strings.

Given the warning above, and given that the parameters the Azure Native provider expects don’t seem to line up with the Azure API, which expects a model ID instead of an offer and publisher (https://learn.microsoft.com/en-us/rest/api/azureml/serverless-endpoints/create-or-update?view=rest-azureml-2024-04-01&tabs=HTTP#serverlessendpoint), I’m wondering if the error message is misleading. Because the ServerlessEndpoint API is still in preview, I suspect it has changed since the azure-native provider implemented it.

I’m able to create the endpoint with an ARM template like so:

marketplace_subscription = azure.machinelearningservices.MarketplaceSubscription(
    "marketplaceSubscription",
    marketplace_subscription_properties = {
        "model_id": "azureml://registries/azureml-mistral/models/Mistral-large-2407"
    },
    name="marketplaceSubscription",
    resource_group_name=resource_group.name,
    workspace_name = ml_project.name
)

mistral_random_string = random.RandomString( "mistralRandom", length=6, special=False, numeric=False, upper=False )

serverless_endpoint_arm = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "resources": [
        {
            "type": "Microsoft.MachineLearningServices/workspaces/serverlessEndpoints",
            "apiVersion": "2024-07-01-preview",
            "name": "[concat(parameters('mlProjectName'),'/Mistral-',parameters('randomStr'))]",
            "location": "[resourceGroup().location]",
            "sku": {
                "name": "Consumption"
            },
            "properties": {
                "authMode": "Key",
                "contentSafety": {
                    "contentSafetyStatus": "Disabled"
                },
                "modelSettings": {
                    "modelId": "azureml://registries/azureml-mistral/models/Mistral-large-2407"
                }
            }
        }
    ],
    "parameters": {
        "mlProjectName": {
            "type": "string",
            "metadata": {
                "description":"mlProjectName"
            }
        },
        "randomStr": {
            "type": "string",
            "metadata": {
                "description": "randomStr"
            }
        }
    }
}

serverless_endpoint=azure.resources.Deployment(
    "serverlessEndpoint",
    resource_group_name=resource_group.name,
    properties=azure.resources.DeploymentPropertiesArgs(
        mode="Incremental",
        template=serverless_endpoint_arm,
        parameters={
            "mlProjectName": {
                "value": ml_project.name
            },
            "randomStr": {
                "value": mistral_random_string.result
            }
        }
    ),
    opts = pulumi.ResourceOptions(depends_on=[marketplace_subscription])
)

But then I have to manually delete the endpoint before doing pulumi destroy, which is not ideal.

Example

Here’s a __main__.py you can run to reproduce the error:

import pulumi
import pulumi_azure_native as azure

# Import the program's configuration settings.
config = pulumi.Config()
tenant_id = config.require( "tenantId" )
subscription_id = config.require( "subscriptionId" )

# Create a resource group for the ML workspace and its dependencies.
resource_group = azure.resources.ResourceGroup("resource-group")

# Create a blob storage account.
account = azure.storage.StorageAccount(
    "account",
    resource_group_name=resource_group.name,
    kind=azure.storage.Kind.STORAGE_V2,
    sku={
        "name": azure.storage.SkuName.STANDARD_LRS,
    },
)

key_vault = azure.keyvault.Vault(
    "keyVault",
    resource_group_name=resource_group.name,
    properties=azure.keyvault.VaultPropertiesArgs(
        sku=azure.keyvault.SkuArgs(
            family="A",
            name="standard"
        ),
        tenant_id=tenant_id,
        access_policies=[]
    )
)

log_analytics_workspace = azure.operationalinsights.Workspace(
    "logAnalyticsWorkspace",
    resource_group_name=resource_group.name,
    sku=azure.operationalinsights.WorkspaceSkuArgs(
        name="PerGB2018"
    ),
    retention_in_days=30
)

app_insights = azure.insights.Component(
    "appInsights",
    resource_group_name=resource_group.name,
    kind="web",
    ingestion_mode="LogAnalytics",
    workspace_resource_id=log_analytics_workspace.id,
)

ml_project = azure.machinelearningservices.Workspace(
    "mlWorkspace",
    resource_group_name=resource_group.name,
    sku=azure.machinelearningservices.SkuArgs(
        name="Basic"
    ),
    friendly_name="My AI Studio Project",
    description="This is a description",
    identity=azure.machinelearningservices.ManagedServiceIdentityArgs(
        type="SystemAssigned"
    ),
    storage_account=account.id,
    key_vault=key_vault.id,
    application_insights=app_insights.id
)

serverless_endpoint=azure.machinelearningservices.ServerlessEndpoint(
    "serverlessEndpoint",
    resource_group_name=resource_group.name,
    workspace_name=ml_project.name,
    serverless_endpoint_properties=azure.machinelearningservices.ServerlessEndpointArgs(
        offer=azure.machinelearningservices.ServerlessOfferArgs(
            offer_name="mistral-ai-large-2407-offer",
            publisher="000-000"
        ),
        auth_mode="Key"
    ),
    sku = {
        "name": "Consumption"
    }
)

Output of pulumi about

CLI          
Version      3.136.1
Go Version   go1.23.2
Go Compiler  gc

Plugins
KIND      NAME           VERSION
resource  azure-native   2.67.0
language  python         unknown
resource  random         4.16.7
resource  synced-folder  0.11.1

thomas11 commented 2 weeks ago

Hi @patrickstuddard, thank you for the detailed issue description! The parameter mismatch probably comes from the fact that the Pulumi provider uses API version 2023-08-01-preview by default. You can use the other API versions of this service by importing them explicitly, though. The available versions are 2024-01-01-preview, 2024-04-01, 2024-04-01-preview, 2024-07-01-preview, 2024-10-01, and 2024-10-01-preview. You can use them like this:

azure.machinelearningservices.v20241001.ServerlessEndpoint

If that doesn't help, we'll dig deeper.
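The module name in the example above appears to be the API version with the dashes dropped and a `v` prefix. A minimal sketch of that mapping follows, assuming the same pattern holds for preview versions (for example `2024-07-01-preview` becoming `v20240701preview`). `api_version_to_module` is a hypothetical helper, not part of the provider SDK, so check the result against the modules that actually ship with the installed package.

```python
import re


def api_version_to_module(api_version: str) -> str:
    """Map an Azure API version string to the assumed versioned module name."""
    # Accept only the YYYY-MM-DD or YYYY-MM-DD-preview forms listed in this thread.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}(-preview)?", api_version):
        raise ValueError(f"unexpected API version format: {api_version!r}")
    return "v" + api_version.replace("-", "")


for version in ("2024-10-01", "2024-07-01-preview"):
    print(version, "->", api_version_to_module(version))
```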

patrickstuddard commented 2 weeks ago

Thanks @thomas11 ! That sounds like it might do the trick. I’m on the road the next few days, but I’ll give it a try when I get back and let you know.

patrickstuddard commented 1 week ago

Hi, @thomas11 ! After upgrading to the latest version of the Azure Native provider, this code worked:

marketplace_subscription = azure.machinelearningservices.MarketplaceSubscription(
    "marketplaceSubscription",
    marketplace_subscription_properties = {
        "model_id": "azureml://registries/azureml-mistral/models/Mistral-large-2407"
    },
    name="marketplaceSubscription",
    resource_group_name=resource_group.name,
    workspace_name = ml_project.name
)

serverless_endpoint=azure.machinelearningservices.v20241001.ServerlessEndpoint(
    "serverlessEndpoint",
    resource_group_name=resource_group.name,
    workspace_name=ml_project.name,
    serverless_endpoint_properties={
        "authMode": "Key",
        "contentSafety": {
            "contentSafetyStatus": "Disabled"
        },
        "modelSettings": {
            "modelId": "azureml://registries/azureml-mistral/models/Mistral-large-2407"
        }
    },
    sku={
        "name": "Consumption"
    },
    opts=pulumi.ResourceOptions(depends_on=[marketplace_subscription])
)
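The model ID appears twice in the working code, once for the marketplace subscription and once for the endpoint, and the two must agree. One way to keep them in sync is to build both property dictionaries from a single constant. This is a sketch only: the helper names are hypothetical, and the key spellings (`model_id`, `authMode`, `modelSettings`) are copied from the code above rather than checked against the provider schema.

```python
MODEL_ID = "azureml://registries/azureml-mistral/models/Mistral-large-2407"


def marketplace_subscription_properties(model_id: str) -> dict:
    """Properties for MarketplaceSubscription, in the shape used above."""
    return {"model_id": model_id}


def serverless_endpoint_properties(
    model_id: str,
    auth_mode: str = "Key",
    content_safety_status: str = "Disabled",
) -> dict:
    """Properties for the v20241001 ServerlessEndpoint, in the shape used above."""
    return {
        "authMode": auth_mode,
        "contentSafety": {"contentSafetyStatus": content_safety_status},
        "modelSettings": {"modelId": model_id},
    }


subscription_props = marketplace_subscription_properties(MODEL_ID)
endpoint_props = serverless_endpoint_properties(MODEL_ID)
print(subscription_props)
print(endpoint_props)
```

The two dictionaries can then be passed as `marketplace_subscription_properties=` and `serverless_endpoint_properties=` in the resource declarations above.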

A couple of observations:

But now I’ve got it to where the setup and teardown are fully automated, so my issue is resolved.

Thank you so much for your help!

thomas11 commented 1 week ago

Great that it's working now!

Unfortunately, the dependency on Marketplace is not modeled in the Azure spec. I've added some docs to explain what you found: #3704.

The ServerlessEndpoint resource is not randomly named because it belongs to a workspace, which is already randomly named. But the provider code doesn't know that the workspace name won't be part of the endpoint URL. We could fix this with a custom override if it's a problem.
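Until such an override exists, a program can add its own suffix to the endpoint name. The ARM workaround earlier in the thread used a `random.RandomString` resource for this. An alternative, sketched here as an assumption rather than a provider feature, derives a short suffix from the project and stack names so the name stays stable across runs. `stable_suffix` and `endpoint_name` are hypothetical helpers, and the sample project and stack names are placeholders.

```python
import hashlib


def stable_suffix(project: str, stack: str, length: int = 6) -> str:
    """Return a short hex suffix that is the same on every run for a given stack."""
    digest = hashlib.sha256(f"{project}/{stack}".encode("utf-8")).hexdigest()
    return digest[:length]


def endpoint_name(base: str, project: str, stack: str) -> str:
    """Join a base name and the stable suffix, e.g. 'Mistral-<6 hex chars>'."""
    return f"{base}-{stable_suffix(project, stack)}"


# Placeholder values; in a Pulumi program these could come from
# pulumi.get_project() and pulumi.get_stack().
name = endpoint_name("Mistral", "my-project", "dev")
print(name)
```

A hash-based suffix avoids an extra resource in the stack, but unlike `RandomString` it cannot be regenerated without changing the project or stack name.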

patrickstuddard commented 1 week ago

That makes sense then, thanks for clarifying, @thomas11 !