pulumi / pulumi-azure-native

Azure Native Provider
Apache License 2.0
126 stars 34 forks source link

Unable to set property to null on AgentPool #2462

Open jaxxstorm opened 1 year ago

jaxxstorm commented 1 year ago

What happened?

When creating a container service agent pool, modifying the gpu_instance_profile doesn't work for null values

Expected Behavior

If you set the property to None or any other nullable value, it should modify the resource

Steps to reproduce

Create an agent pool with a gpu_instance_profile

containerservice.AgentPool(
  "example",
  name=pool_config.name,
  gpu_instance_profile="MIG1g"
)

Then modify this to remove the GPU instance profile:

containerservice.AgentPool(
  "example",
  name=pool_config.name,
  gpu_instance_profile="None"
)

Output of pulumi about

N/A

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

kpitzen commented 1 year ago

Hey @jaxxstorm ! Thanks for opening this - is this reproducible in other languages as well, or is it only a Python issue?

thomas11 commented 9 months ago

In the line gpu_instance_profile="None", the value is simply the string "None" because it's quoted. I'm surprised Azure didn't respond with a 400 here since gpu_instance_profile is an enum with defined values.

Unless the example snippet was manually edited and None was quoted erroneously then.

thomas11 commented 9 months ago

I tried to repro but there's no auto-approvable quota for multi-GPU VM SKUs. sigh Submitted a support request.

import * as resources from "@pulumi/azure-native/resources";
import * as containerservice from "@pulumi/azure-native/containerservice";

const resourceGroup = new resources.ResourceGroup("resourceGroup");

const managedCluster = new containerservice.ManagedCluster("managedCluster", {
    addonProfiles: {},
    agentPoolProfiles: [{
        count: 3,
        enableNodePublicIP: true,
        mode: "System",
        name: "nodepool1",
        osType: "Linux",
        type: "VirtualMachineScaleSets",
        vmSize: "Standard_DS2_v2",
    }],
    dnsPrefix: "dnsprefix1",
    enableRBAC: true,
    kubernetesVersion: "",
    networkProfile: {
        loadBalancerProfile: {
            managedOutboundIPs: {
                count: 2,
            },
        },
        loadBalancerSku: "standard",
        outboundType: "loadBalancer",
    },
    resourceGroupName: resourceGroup.name,
    resourceName: "clustername1",
    servicePrincipalProfile: {
        clientId: REDACTED
        secret: REDACTED
    },
    sku: {
        name: "Base",
        tier: "Free",
    },
});

const pool = new containerservice.AgentPool("sa", {
    agentPoolName: "agentpool1",
    count: 1,
    gpuInstanceProfile: "MIG1g",
    orchestratorVersion: "",
    osType: "Linux",
    resourceGroupName: resourceGroup.name,
    resourceName: managedCluster.name,
    vmSize: "Standard_ND96asr_v4",
});
danielrbradley commented 4 hours ago

Starting to familiarise myself with this resource. Here's the facts I think are true.

It looks like the AgentPool is a nested resource of the managed cluster. You can create AgentPoolProfiles inline with the parent, but the AgentPool itself is a standalone resource, though shares all of the same properties as the profiles - with the addition of the agentPoolName which forms part of the URL.

ARM resource docs: https://learn.microsoft.com/en-us/azure/templates/microsoft.containerservice/managedclusters/agentpools

Example VM Size supporting GPUs: https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/ndasra100v4-series

Our default version is using 2023-04-01 but the latest ARM docs are using 2024-06-02-preview.

The specs don't indicate how to unset this field. The empty string is not valid and as far as I can remember ARM update operations don't require all properties, so a missing field is not treated like a removal.

I expect this field might be set to a default value server-side for VMs with GPU support, but is removed if the VM doesn't support GPUs. There's no indication of this behaviour in the spec.

It's also possible that this is a field which Azure treats as immutable once created, but hasn't documented via the spec. The API might just discard the field it can't update as we've observed this behaviour in other resources.

Next step will be retrying the program above.