haitch opened this issue 2 weeks ago
AKS is deprecating version 1.27 and now offers it only through the LTS program: https://learn.microsoft.com/en-us/azure/aks/long-term-support
Customers with clusters on LTS can no longer add a new node pool; the operation is blocked by the provider's client-side version validation (terraform-provider-azurerm/internal/services/containers/kubernetes_cluster_validate.go, clusterControlPlaneMustBeUpgradedError).
We are currently using the following provider config:
Terraform = 1.6.6
source = "hashicorp/azurerm"
version = "3.106.1"
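Expressed as a required_providers block, that pin amounts to roughly the following (a sketch for context, not copied verbatim from our module):

terraform {
  required_version = "1.6.6"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.106.1"
    }
  }
}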
╷
│ Error:
│ The Kubernetes/Orchestrator Version "1.27" is not available for Node Pool "blueuser".
│
│ Please confirm that this version is supported by the Kubernetes Cluster "a241281-p01-musea2-aks"
│ (Resource Group "a241281-p01-musea2-rg") - which may need to be upgraded first.
│
│ The Kubernetes Cluster is running version "1.27.16".
│
│ The supported Orchestrator Versions for this Node Pool/supported by this Kubernetes Cluster are:
│
│
│ Node Pools cannot use a version of Kubernetes that is not supported on the Control Plane. More
│ details can be found at https://aka.ms/version-skew-policy.
│
│
│ with module.aks.azurerm_kubernetes_cluster_node_pool.blue_pool,
│ on ../../../modules/azurerm_kubernetes_service/main.tf line 216, in resource "azurerm_kubernetes_cluster_node_pool" "blue_pool":
│ 216: resource "azurerm_kubernetes_cluster_node_pool" "blue_pool" {
│
Terraform Config
# create aks cluster
resource "azurerm_kubernetes_cluster" "this" {
name = module.tagging.kubernetes_service_id
resource_group_name = var.resource_group_name
location = var.location
node_resource_group = "${var.resource_group_name}-managed"
dns_prefix = var.dns_prefix
private_cluster_enabled = true
private_dns_zone_id = "None"
private_cluster_public_fqdn_enabled = true
azure_active_directory_role_based_access_control {
managed = true
tenant_id = data.azurerm_client_config.current.tenant_id
azure_rbac_enabled = true
}
azure_policy_enabled = true
default_node_pool {
name = "bluecrit"
vm_size = local.crit_node_pool_configs.vm_size
enable_auto_scaling = false
node_count = local.crit_node_pool_configs.node_count
max_pods = 110
only_critical_addons_enabled = true
os_disk_type = var.system_pool_disk_type
orchestrator_version = local.crit_node_pool_configs.kubernetes_version
# Required when using Azure CNI
vnet_subnet_id = var.node_pool_subnet_id
temporary_name_for_rotation = "bluecrittemp"
tags = module.aks_tagging.tags
zones = local.availability_zones
upgrade_settings {
max_surge = var.system_node_upgrade_max_surge
}
}
identity {
type = "UserAssigned"
identity_ids = [
azurerm_user_assigned_identity.this.id
]
}
key_vault_secrets_provider {
secret_rotation_enabled = true
}
kubelet_identity {
client_id = azurerm_user_assigned_identity.this.client_id
object_id = azurerm_user_assigned_identity.this.principal_id
user_assigned_identity_id = azurerm_user_assigned_identity.this.id
}
kubernetes_version = var.kubernetes_version
# Reference https://learn.microsoft.com/en-us/azure/aks/managed-aad#disable-local-accounts
local_account_disabled = false
network_profile {
network_plugin = "azure"
network_policy = "azure"
network_plugin_mode = "overlay"
# http://aka.ms/aks/outboundtype
outbound_type = var.kubernetes_outbound_type
pod_cidr = "10.244.0.0/14"
service_cidr = "172.25.0.0/16"
dns_service_ip = "172.25.0.10"
}
# Required for workload identity
oidc_issuer_enabled = true
workload_autoscaler_profile {
keda_enabled = local.keda_enabled
}
# Container insights
dynamic "oms_agent" {
for_each = var.log_analytics_workspace_id != "" ? ["oms_agent"] : []
content {
log_analytics_workspace_id = var.log_analytics_workspace_id
msi_auth_for_monitoring_enabled = true
}
}
maintenance_window_auto_upgrade {
duration = var.aks_maintenance_window_auto_upgrade.duration
frequency = var.aks_maintenance_window_auto_upgrade.frequency
interval = var.aks_maintenance_window_auto_upgrade.interval
day_of_week = var.aks_maintenance_window_auto_upgrade.day_of_week
start_time = var.aks_maintenance_window_auto_upgrade.start_time
utc_offset = "+00:00"
}
maintenance_window_node_os {
frequency = var.aks_node_patch_window.frequency
interval = var.aks_node_patch_window.interval
duration = var.aks_node_patch_window.duration
day_of_week = var.aks_node_patch_window.day_of_week
start_time = var.aks_node_patch_window.start_time
utc_offset = "+00:00"
}
automatic_channel_upgrade = "patch"
node_os_channel_upgrade = "SecurityPatch"
workload_identity_enabled = true
support_plan = var.kubernetes_support_plan
sku_tier = var.kubernetes_sku_tier
tags = module.aks_tagging.tags
depends_on = [
azurerm_role_assignment.aks_to_itself,
azurerm_role_assignment.aks_network_contributor_subnet,
azurerm_role_assignment.aks_network_contributor_route_table,
]
lifecycle {
ignore_changes = [default_node_pool[0].orchestrator_version, kubernetes_version]
}
}
# create blue user node pool
resource "azurerm_kubernetes_cluster_node_pool" "blue_pool" {
name = "blueuser"
kubernetes_cluster_id = azurerm_kubernetes_cluster.this.id
vm_size = local.blue_node_pool_configs.vm_size
enable_auto_scaling = true
max_count = local.blue_node_pool_configs.max_count
min_count = local.blue_node_pool_configs.min_count
node_count = local.blue_node_pool_configs.node_count
max_pods = 110
mode = "User"
orchestrator_version = local.blue_node_pool_configs.kubernetes_version
os_disk_type = var.user_pool_disk_type
tags = module.aks_tagging.tags
zones = local.availability_zones
upgrade_settings {
max_surge = var.user_node_upgrade_max_surge
}
vnet_subnet_id = var.node_pool_subnet_id
lifecycle {
ignore_changes = [node_count, orchestrator_version]
}
}
kubernetes_support_plan = "AKSLongTermSupport"
kubernetes_sku_tier     = "Premium"
orchestrator_version    = "1.27"
kubernetes_version      = "1.27"
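In other words, with those input values the LTS-relevant arguments of the resources above effectively resolve to (simplified sketch, other attributes omitted):

resource "azurerm_kubernetes_cluster" "this" {
  # ...
  kubernetes_version = "1.27"
  support_plan       = "AKSLongTermSupport"
  sku_tier           = "Premium"
}

resource "azurerm_kubernetes_cluster_node_pool" "blue_pool" {
  # ...
  orchestrator_version = "1.27"
}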
The azurerm provider uses the "availableAgentPoolVersions" API for client-side validation, but it seems this API fails to return the available versions:
az rest -m GET -u 'https://management.azure.com/subscriptions/****/resourceGroups/acctestRG-aks-henglu/providers/Microsoft.ContainerService/managedClusters/acctestakhenglu/availableAgentPoolVersions?api-version=2024-05-01'
{
"id": "/subscriptions/*****/resourcegroups/acctestRG-aks-henglu/providers/Microsoft.ContainerService/managedClusters/acctestakhenglu/availableagentpoolversions",
"name": "default",
"properties": {
"agentPoolVersions": []
},
"type": "Microsoft.ContainerService/managedClusters/availableAgentpoolVersions"
}
Once this API is fixed, the Terraform azurerm provider will be unblocked.
And I have a workaround: use the azapi provider, which is also a Terraform provider but without any client-side validation. Here's an example that creates an agent pool.
resource "azapi_resource" "agentPool" {
type = "Microsoft.ContainerService/managedClusters/agentPools@2024-05-01"
parent_id = azurerm_kubernetes_cluster.test.id
name = "internal"
body = {
properties = {
count = 1
mode = "User"
vmSize = "Standard_DS2_v2"
orchestratorVersion = "1.27"
}
}
}
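If you go this route, the azapi provider needs to be declared alongside azurerm. A minimal sketch (the version constraints are my assumption; pick an azapi release recent enough to accept an HCL object for body):

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.106.1"
    }
    azapi = {
      source = "Azure/azapi"
      # assumption: any recent release that accepts an HCL object for `body`
      version = ">= 1.13.0"
    }
  }
}

# azapi authenticates with the same Azure credentials as azurerm (az login, service principal, etc.)
provider "azapi" {}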
@ms-henglu would you happen to have any ETA on this API fix? We have 96 clusters and we need to provide support for these existing clusters. At this time I would not want to change my Terraform templates to swap out AzureRM for AzAPI. For new clusters we are looking to jump to the new LTS version, which is 1.30.
Also, would this API fix be included in a 3.x patch version, or only in 4.x? Thanks for your help.
Hi @will-iam-gm, the root cause is on the API side; there's no action needed on the client side.
Please confirm with @haitch about the fix on the API side.
Hi @will-iam-gm,
I'd like to share another workaround:
I disabled the version validation in the azurerm provider and pushed the changes to my fork. You can compile the fork and use it locally.
Branches:
Based on v3.116.0: https://github.com/ms-henglu/terraform-provider-azurerm/tree/issue-27245-mitigation-v3.116.0
Based on v4.0.1: https://github.com/ms-henglu/terraform-provider-azurerm/tree/issue-27245-mitigation-v4.0.1
If you want to use other versions, you can cherry-pick this commit: https://github.com/hashicorp/terraform-provider-azurerm/compare/main...ms-henglu:terraform-provider-azurerm:issue-27245-mitigation-v3.116.0
How to use a locally compiled provider: https://github.com/hashicorp/terraform-provider-azurerm/blob/main/DEVELOPER.md#developer-using-the-locally-compiled-azure-provider-binary
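In short, pointing Terraform at the locally compiled binary is done with a dev_overrides block in the CLI configuration file; the path below is only a placeholder:

# ~/.terraformrc (terraform.rc on Windows)
provider_installation {
  dev_overrides {
    # directory containing the locally built terraform-provider-azurerm binary
    "hashicorp/azurerm" = "/home/you/go/bin"
  }
  # all other providers are still installed from the registry as usual
  direct {}
}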
But again, this is just a workaround and a temporary fix; it cannot be included in the public release.
Thanks @ms-henglu
I will be going over our options with my team to see which one best fits our platform.
Is there anything tracking the API side of this? Is there an Azure ticket or something?
@rovangju I asked Microsoft that question on the following post on Microsoft Learn, https://learn.microsoft.com/en-us/answers/questions/2028718/unable-to-deploy-aks-lts-1-27-in-multiple-regions. Here was the response.
I am actually from the API team. The API fix is indeed rolling out, but it will take some time. So the suggested solution is to use the az cli to add a new node pool.

Thanks for the follow-up - it's unfortunate that this happened in such a manner. I have closed-loop production environments that are all under Terraform control, so I'm trying to figure out if I can just sit tight or need to start pushing for an out-of-band workaround.
@haitch At the same time we were debugging this issue with AKS 1.27 LTS, we noticed that 1.27 was removed as an option from eastus and eastus2, and then a few days later from westus2. AKS 1.27 LTS is still missing from these regions. We are getting by with westus and westus3 for now, but we are based on the east coast of the United States. Is all of this related somehow? Will 1.27 be returning to these regions?
e.g.
$ az aks get-versions --location eastus2 --output table
KubernetesVersion    Upgrades
1.30.3               None available
1.30.2               1.30.3
1.30.1               1.30.2, 1.30.3
1.30.0               1.30.1, 1.30.2, 1.30.3
1.29.7               1.30.0, 1.30.1, 1.30.2, 1.30.3
1.29.6               1.29.7, 1.30.0, 1.30.1, 1.30.2, 1.30.3
1.29.5               1.29.6, 1.29.7, 1.30.0, 1.30.1, 1.30.2, 1.30.3
1.29.4               1.29.5, 1.29.6, 1.29.7, 1.30.0, 1.30.1, 1.30.2, 1.30.3
1.29.2               1.29.4, 1.29.5, 1.29.6, 1.29.7, 1.30.0, 1.30.1, 1.30.2, 1.30.3
1.29.0               1.29.2, 1.29.4, 1.29.5, 1.29.6, 1.29.7, 1.30.0, 1.30.1, 1.30.2, 1.30.3
1.28.12              1.29.0, 1.29.2, 1.29.4, 1.29.5, 1.29.6, 1.29.7
1.28.11              1.28.12, 1.29.0, 1.29.2, 1.29.4, 1.29.5, 1.29.6, 1.29.7
1.28.10              1.28.11, 1.28.12, 1.29.0, 1.29.2, 1.29.4, 1.29.5, 1.29.6, 1.29.7
1.28.9               1.28.10, 1.28.11, 1.28.12, 1.29.0, 1.29.2, 1.29.4, 1.29.5, 1.29.6, 1.29.7
1.28.5               1.28.9, 1.28.10, 1.28.11, 1.28.12, 1.29.0, 1.29.2, 1.29.4, 1.29.5, 1.29.6, 1.29.7
1.28.3               1.28.5, 1.28.9, 1.28.10, 1.28.11, 1.28.12, 1.29.0, 1.29.2, 1.29.4, 1.29.5, 1.29.6, 1.29.7
1.28.0               1.28.3, 1.28.5, 1.28.9, 1.28.10, 1.28.11, 1.28.12, 1.29.0, 1.29.2, 1.29.4, 1.29.5, 1.29.6, 1.29.7
@hahewlet Microsoft is deprecating 1.27 unless you are using the LTS support plan. I don't think you can pick the version from the portal, only through the az cli or IaC.
@ms-henglu @haitch With the API fix rollout, today I was able to deploy a node pool on version 1.27 using Terraform, with no changes to the provider. Thanks for your help here.
@will-iam-gm your screenshot gave me the clue I needed. My output for eastus2 did not include 1.27 because my az cli was too old. Once I upgraded that, I can now see the 1.27 versions listed.