Open jaxxstorm opened 1 year ago
Hey @jaxxstorm ! Thanks for opening this - is this reproducible in other languages as well, or is it only a Python issue?
In the line gpu_instance_profile="None"
, the value is simply the string "None" because it's quoted. I'm surprised Azure didn't respond with a 400 here since gpu_instance_profile
is an enum with defined values.
Unless the example snippet was manually edited and None
was quoted erroneously then.
I tried to repro but there's no auto-approvable quota for multi-GPU VM SKUs. sigh Submitted a support request.
import * as resources from "@pulumi/azure-native/resources";
import * as containerservice from "@pulumi/azure-native/containerservice";
const resourceGroup = new resources.ResourceGroup("resourceGroup");
const managedCluster = new containerservice.ManagedCluster("managedCluster", {
addonProfiles: {},
agentPoolProfiles: [{
count: 3,
enableNodePublicIP: true,
mode: "System",
name: "nodepool1",
osType: "Linux",
type: "VirtualMachineScaleSets",
vmSize: "Standard_DS2_v2",
}],
dnsPrefix: "dnsprefix1",
enableRBAC: true,
kubernetesVersion: "",
networkProfile: {
loadBalancerProfile: {
managedOutboundIPs: {
count: 2,
},
},
loadBalancerSku: "standard",
outboundType: "loadBalancer",
},
resourceGroupName: resourceGroup.name,
resourceName: "clustername1",
servicePrincipalProfile: {
clientId: REDACTED
secret: REDACTED
},
sku: {
name: "Base",
tier: "Free",
},
});
const pool = new containerservice.AgentPool("sa", {
agentPoolName: "agentpool1",
count: 1,
gpuInstanceProfile: "MIG1g",
orchestratorVersion: "",
osType: "Linux",
resourceGroupName: resourceGroup.name,
resourceName: managedCluster.name,
vmSize: "Standard_ND96asr_v4",
});
Starting to familiarise myself with this resource. Here's the facts I think are true.
It looks like the AgentPool is a nested resource of the managed cluster. You can create AgentPoolProfiles inline with the parent, but the AgentPool itself is a standalone resource, though shares all of the same properties as the profiles - with the addition of the agentPoolName
which forms part of the URL.
ARM resource docs: https://learn.microsoft.com/en-us/azure/templates/microsoft.containerservice/managedclusters/agentpools
Example VM Size supporting GPUs: https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/ndasra100v4-series
Our default version is using 2023-04-01 but the latest ARM docs are using 2024-06-02-preview.
The specs don't indicate how to unset this field. The empty string is not valid and as far as I can remember ARM update operations don't require all properties, so a missing field is not treated like a removal.
I expect this field might be set to a default value server-side for VMs with GPU support, but is removed if the VM doesn't support GPUs. There's no indication of this behaviour in the spec.
It's also possible that this is a field which Azure treats as immutable once created, but hasn't documented via the spec. The API might just discard the field it can't update as we've observed this behaviour in other resources.
Next step will be retrying the program above.
What happened?
When creating a container service agent pool, modifying the
gpu_instance_profile
doesn't work for null valuesExpected Behavior
If you set the property to
None
or any other nullable value, it should modify the resourceSteps to reproduce
Create an agent pool with a
gpu_instance_profile
Then modify this to remove the GPU instance profile:
Output of
pulumi about
N/A
Additional context
No response
Contributing
Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).