hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

azurerm_kubernetes_cluster - workload_autoscaler_profile "keda" could not be (un)set #22360

Open matthiasritter opened 1 year ago

matthiasritter commented 1 year ago

Description

We used the feature directly, but unfortunately could not configure all the settings we needed there, so we removed the workload_autoscaler_profile block again. However, if KEDA was ever enabled on a cluster, the attribute apparently has to stay explicitly set to false or true: if we remove the block, Terraform wants to set it to null, but on the Azure side it is silently set back to false. So on the next run Terraform sets it back to null, Azure sets it back to false, and so on.

On new AKS clusters the same thing happens, just with the values reversed: if KEDA was never enabled and we set it to false via Terraform, Azure silently resets it to null. On the next Terraform run we set it to false again, and Azure (again) resets it to null.
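For reference, a minimal sketch of the block in question (the attribute values are only illustrative):

workload_autoscaler_profile {
  keda_enabled                    = false
  vertical_pod_autoscaler_enabled = false
}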

Terraform Version

1.5.2

AzureRM Provider Version

3.63.0

Affected Resource(s)/Data Source(s)

azurerm_kubernetes_cluster

Terraform Configuration Files

resource "azurerm_kubernetes_cluster" "aks" {
  name                    = "aks-${random_id.instance_id.hex}"
  location                = var.az_region
  kubernetes_version      = data.azurerm_kubernetes_service_versions.current.latest_version
  resource_group_name     = azurerm_resource_group.aks.name
  node_resource_group     = "aks-${random_id.instance_id.hex}-${var.customer}-${var.customer_stage}_node"
  dns_prefix              = "aks-${random_id.instance_id.hex}"
  private_cluster_enabled = true
  sku_tier                = var.uptime_sla

  automatic_channel_upgrade = "node-image"

  oidc_issuer_enabled       = var.workload_identity_enabled
  workload_identity_enabled = var.workload_identity_enabled

  maintenance_window {
    allowed {
      day   = "Monday"
      hours = ["0", "1", "2", "3", "4", "5"]
    }
    allowed {
      day   = "Tuesday"
      hours = ["0", "1", "2", "3", "4", "5"]
    }
    allowed {
      day   = "Wednesday"
      hours = ["0", "1", "2", "3", "4", "5"]
    }
    allowed {
      day   = "Thursday"
      hours = ["0", "1", "2", "3", "4", "5"]
    }
    allowed {
      day   = "Friday"
      hours = ["0", "1", "2", "3", "4", "5"]
    }
  }

  local_account_disabled = false

  role_based_access_control_enabled = true

  azure_active_directory_role_based_access_control {
    managed                = true
    admin_group_object_ids = ["1234"]
  }

  default_node_pool {
    name                         = "snplx"
    temporary_name_for_rotation  = "sysu"
    node_count                   = var.system_nodepool_node_count
    vm_size                      = var.system_nodepool_node_size
    vnet_subnet_id               = azurerm_subnet.aks.id
    max_pods                     = 50
    orchestrator_version         = data.azurerm_kubernetes_service_versions.current.latest_version
    enable_auto_scaling          = true
    min_count                    = var.system_nodepool_node_count_min
    max_count                    = var.system_nodepool_node_count_max
    zones                        = ["1", "2", "3"]
    only_critical_addons_enabled = true
    enable_host_encryption       = true
    upgrade_settings {
      max_surge = "1"
    }
  }

  identity {
    type = "SystemAssigned"
  }

  network_profile {
    load_balancer_sku  = "standard"
    dns_service_ip     = "100.64.0.10"
    network_plugin     = "azure"
    network_policy     = "azure"
    outbound_type      = "loadBalancer"
    service_cidr       = "100.64.0.0/12"
    docker_bridge_cidr = "100.96.0.1/24"
  }

  http_application_routing_enabled = false

  azure_policy_enabled = var.azure_policy_enabled

  dynamic "microsoft_defender" {
    for_each = var.microsoft_defender_enabled == true ? [1] : []

    content {
      log_analytics_workspace_id = var.sentinel_log_analytics_workspace_id
    }
  }

  storage_profile {
    blob_driver_enabled = var.blob_driver_enabled
  }

  lifecycle {
    prevent_destroy = true
  }
}

Debug Output/Panic Output

# azurerm_kubernetes_cluster.aks will be updated in-place
  ~ resource "azurerm_kubernetes_cluster" "aks" {
        id                                  = "/subscriptions/1234/resourceGroups/aks-1234/providers/Microsoft.ContainerService/managedClusters/cluster1234"
        name                                = "cluster1234"
        # (32 unchanged attributes hidden)

      - workload_autoscaler_profile {
          - keda_enabled                    = false -> null
          - vertical_pod_autoscaler_enabled = false -> null
        }

Expected Behaviour

Terraform should not report any changes; there should be no perpetual diff on workload_autoscaler_profile.

Actual Behaviour

No response

Steps to Reproduce

No response

Important Factoids

No response

References

No response

SOFSPEEL commented 1 year ago

I think I found the magic sauce to make this problem go away:

https://learn.microsoft.com/en-us/azure/aks/vertical-pod-autoscaler#:~:text=about%20the%20cluster.-,Optionally,-%2C%20to%20disable%20VPA

  workload_autoscaler_profile {
    keda_enabled                    = false 
    vertical_pod_autoscaler_enabled = false 
  }
TheKangaroo commented 11 months ago

(coworker of @matthiasritter here 👋) We came back to this because we also needed VPA, and we found the reason for this behaviour as well as a workaround.

This is because the Azure API returns

workloadAutoScalerProfile": {}

if keda or vpa has never been enabled on this cluster before. But once you enable one of the features and disable it later, the Azure API returns

workloadAutoScalerProfile": {
  "keda": {
    "enabled": false
  }
}

instead of an empty block.

If you omit the workload_autoscaler_profile block on a cluster that previously had an autoscaler enabled, the Azure API reports "enabled": false, but Terraform will try to set it to null. Azure keeps it at false, so the same diff shows up again on every plan.

On the other hand, if you set it to false in your code on a cluster that has never had an autoscaler enabled, the Azure API state is empty, but Terraform will try to set it to false. The API stays empty, so here as well the diff reappears on every plan.

You can work around this with the following code:

resource "azurerm_kubernetes_cluster" "aks" {
  [...]
  dynamic "workload_autoscaler_profile" {
    for_each = var.vpa_enable != null || var.keda_enable != null ? [1] : []
    content {
      keda_enabled                    = var.keda_enable
      vertical_pod_autoscaler_enabled = var.vpa_enable
    }
  }
  [...]
}

variable "keda_enable" {
  type    = bool
  default = null
}
variable "vpa_enable" {
  type    = bool
  default = null
}

(both variables default to null).

With this code you can leave both variables unset (null) to omit the block entirely, or set either of them explicitly when you actually want to manage KEDA or VPA; in both cases the perpetual diff goes away.
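For example, a hypothetical terraform.tfvars for the variables defined above (the values are only for illustration):

# Cluster where KEDA was enabled once and later disabled again:
# set the value explicitly so Terraform matches the "enabled": false
# that the Azure API now returns.
keda_enable = false

# vpa_enable stays unset (null). If both variables were unset, the
# workload_autoscaler_profile block would be omitted entirely, matching
# the empty workloadAutoScalerProfile on clusters that never had an
# autoscaler enabled.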

Hope this helps.