hashicorp/terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

azurerm_kubernetes_cluster | network_profile | load_balancer_sku - sets SKU to standard. Azure complains it is not "Standard" and hence treats it incorrectly #25616

Closed SA-accesso closed 5 months ago

SA-accesso commented 5 months ago

Terraform Version

1.6.1

AzureRM Provider Version

3.99.0

Affected Resource(s)/Data Source(s)

azurerm_kubernetes_cluster

Terraform Configuration Files

resource "azurerm_kubernetes_cluster" "aks" {
  name                             = var.name
  location                         = var.location
  resource_group_name              = var.resource_group_name
  dns_prefix                       = "${var.environment}-aks"
  kubernetes_version               = var.kubernetes_version
  node_resource_group              = "${var.environment}-aksresources-rg"
  private_cluster_enabled          = var.private_cluster_enabled
  local_account_disabled           = var.local_account_disabled
  http_application_routing_enabled = var.http_application_routing_enabled
  sku_tier                         = var.sku_tier
  azure_policy_enabled             = var.azure_policy_enabled
  # automatic_channel_upgrade        = var.automatic_channel_upgrade

  api_server_access_profile {
    authorized_ip_ranges = var.authorized_ip_ranges
  }

  default_node_pool {
    name                  = var.default_pool.name
    vm_size               = var.default_pool.vm_size
    orchestrator_version  = var.kubernetes_version
    type                  = var.default_pool.type
    os_sku                = var.default_pool.os_sku
    zones                 = var.default_pool.zones
    min_count             = var.default_pool.min_count
    max_count             = var.default_pool.max_count
    os_disk_size_gb       = var.default_pool.os_disk_size_gb
    os_disk_type          = var.default_pool.os_disk_type
    enable_auto_scaling   = var.default_pool.enable_auto_scaling
    max_pods              = var.default_pool.max_pods
    node_labels           = var.default_pool.node_labels

    vnet_subnet_id         = var.subnet.id
    enable_node_public_ip  = var.default_pool.enable_node_public_ip
    enable_host_encryption = var.default_pool.enable_host_encryption

    tags = var.default_pool.tags
  }

  network_profile {
    network_plugin    = "azure"
    network_policy    = "azure"
    load_balancer_sku = "standard"

    load_balancer_profile {
      outbound_ip_address_ids = [var.azurerm_public_ip_id]
    }
  }

  # ... (remainder of the configuration omitted)
}

Debug Output/Panic Output

+ network_profile {
          + dns_service_ip     = (known after apply)
          + docker_bridge_cidr = (known after apply)
          + ip_versions        = (known after apply)
          + load_balancer_sku  = "standard"
          + network_mode       = (known after apply)
          + network_plugin     = "azure"
          + network_policy     = "azure"
          + outbound_type      = "loadBalancer"
          + pod_cidr           = (known after apply)
          + pod_cidrs          = (known after apply)
          + service_cidr       = (known after apply)
          + service_cidrs      = (known after apply)

Expected Behaviour

We should be able to edit the cluster's authorized IP ranges after applying it with load_balancer_sku = "standard".

Actual Behaviour

(screenshot omitted)

As the screenshot shows, Azure is misinterpreting the SKU.

Steps to Reproduce

Apply a generic AKS cluster with the standard SKU.

The result is the same whether or not the line below is specified, since Terraform defaults the load balancer SKU to "standard":

load_balancer_sku = "standard"
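
For illustration, the two network_profile blocks below should therefore produce the same plan; this is a minimal sketch, not taken from the affected configuration:

# Explicit SKU, as in the affected configuration
network_profile {
  network_plugin    = "azure"
  network_policy    = "azure"
  load_balancer_sku = "standard"
}

# SKU omitted - the provider defaults load_balancer_sku to "standard",
# so the planned value is the same
network_profile {
  network_plugin = "azure"
  network_policy = "azure"
}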

Important Factoids

No response

References

No response

ms-henglu commented 5 months ago

Hi @SA-accesso,

Thank you for taking the time to report this issue.

Here's a complete example of how to configure the authorized IP ranges with Terraform:


resource "azurerm_resource_group" "test" {
  name     = "acctestRG-akshenglu"
  location = "westeurope"
}

resource "azurerm_virtual_network" "test" {
  name                = "acctestvirtnethenglu"
  address_space       = ["10.1.0.0/16"]
  location            = azurerm_resource_group.test.location
  resource_group_name = azurerm_resource_group.test.name
}

resource "azurerm_subnet" "test" {
  name                 = "acctestsubnethenglu"
  resource_group_name  = azurerm_resource_group.test.name
  virtual_network_name = azurerm_virtual_network.test.name
  address_prefixes     = ["10.1.0.0/24"]
}

resource "azurerm_kubernetes_cluster" "test" {
  name                = "acctestakshenglu"
  location            = azurerm_resource_group.test.location
  resource_group_name = azurerm_resource_group.test.name
  dns_prefix          = "acctestakshenglu"

  default_node_pool {
    name           = "default"
    node_count     = 1
    vm_size        = "Standard_DS2_v2"
    vnet_subnet_id = azurerm_subnet.test.id
    upgrade_settings {
      max_surge = "10%"
    }
  }

  identity {
    type = "SystemAssigned"
  }

  network_profile {
    network_plugin    = "azure"
    load_balancer_sku = "standard"
  }

  api_server_access_profile {
    authorized_ip_ranges = [
      "8.8.8.8/32",
      "8.8.4.4/32",
      "8.8.2.0/24",
    ]
  }
}

I tried to reproduce the issue you described, but I could still see the edit button with load_balancer_sku = "standard":

(screenshot omitted)

SA-accesso commented 5 months ago

Sorry @ms-henglu, I only included a snippet. Here is the full resource configuration:

resource "azurerm_kubernetes_cluster" "aks" {
  name                             = var.name
  location                         = var.location
  resource_group_name              = var.resource_group_name
  dns_prefix                       = "${var.environment}-aks"
  kubernetes_version               = var.kubernetes_version
  node_resource_group              = "${var.environment}-aksresources-rg"
  private_cluster_enabled          = var.private_cluster_enabled
  local_account_disabled           = var.local_account_disabled
  http_application_routing_enabled = var.http_application_routing_enabled
  sku_tier                         = var.sku_tier
  azure_policy_enabled             = var.azure_policy_enabled
  # automatic_channel_upgrade        = var.automatic_channel_upgrade

  api_server_access_profile {
    authorized_ip_ranges = var.authorized_ip_ranges
  }

  default_node_pool {
    name                  = var.default_pool.name
    vm_size               = var.default_pool.vm_size
    orchestrator_version  = var.kubernetes_version
    type                  = var.default_pool.type
    os_sku                = var.default_pool.os_sku
    zones                 = var.default_pool.zones
    min_count             = var.default_pool.min_count
    max_count             = var.default_pool.max_count
    os_disk_size_gb       = var.default_pool.os_disk_size_gb
    os_disk_type          = var.default_pool.os_disk_type
    enable_auto_scaling   = var.default_pool.enable_auto_scaling
    max_pods              = var.default_pool.max_pods
    node_labels           = var.default_pool.node_labels

    vnet_subnet_id         = var.subnet.id
    enable_node_public_ip  = var.default_pool.enable_node_public_ip
    enable_host_encryption = var.default_pool.enable_host_encryption

    upgrade_settings {
      max_surge = "33%"
    }

    tags = var.default_pool.tags
  }

  azure_active_directory_role_based_access_control {
    managed                = true
    azure_rbac_enabled     = true
    admin_group_object_ids = [data.azuread_group.aksadmins.id]
  }

  network_profile {
    network_plugin    = "azure"
    network_policy    = "azure"
    load_balancer_sku = "standard"

    load_balancer_profile {
      outbound_ip_address_ids = [var.azurerm_public_ip_id]
    }
  }

  identity {
    type = "SystemAssigned"
  }

  auto_scaler_profile {
    balance_similar_node_groups      = false
    expander                         = "least-waste" # default random
    max_graceful_termination_sec     = 600
    max_node_provisioning_time       = "10m" # default 15m
    max_unready_nodes                = 3
    max_unready_percentage           = 45
    new_pod_scale_up_delay           = "0s"
    scale_down_delay_after_add       = "10m"
    scale_down_delay_after_delete    = "10s"
    scale_down_delay_after_failure   = "3m"
    scan_interval                    = "10s"
    scale_down_unneeded              = "10m"
    scale_down_unready               = "20m"
    scale_down_utilization_threshold = 0.65 # default 0.5
    empty_bulk_delete_max            = 10
    skip_nodes_with_local_storage    = false
    skip_nodes_with_system_pods      = false
  }

  oms_agent {
    log_analytics_workspace_id      = var.log_analytics_workspace_id
    msi_auth_for_monitoring_enabled = true
  }

  microsoft_defender {
    log_analytics_workspace_id = var.log_analytics_workspace_id
  }

  lifecycle {
    ignore_changes = [api_server_access_profile]
  }

  tags = var.tags
}
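
For reference, here is a minimal sketch of how the two inputs most relevant to this issue might be declared; the variable names come from the configuration above, but the types, descriptions, and default are assumptions:

# Hypothetical declarations for the inputs referenced above
variable "authorized_ip_ranges" {
  description = "CIDR ranges allowed to reach the AKS API server"
  type        = list(string)
  default     = []
}

variable "azurerm_public_ip_id" {
  description = "ID of the public IP used for outbound traffic from the load balancer"
  type        = string
}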

We only started running into this issue in the last few weeks, with no changes to our Terraform code. Do you have any idea what may be causing it?

shivam-sood89 commented 5 months ago

We are having a similar issue. I feel this is more of a frontend Azure portal issue than a backend issue.

SA-accesso commented 5 months ago

We are having a similar issue. I feel this is more of a frontend Azure portal issue than a backend issue.

Agreed, we're still facing the issue. The workaround we currently have is to run an az CLI command to adjust the whitelisting, which is less than ideal:

az aks update --resource-group <resource-group> --name <cluster-name> --api-server-authorized-ip-ranges "<ip-ranges>"

shivam-sood89 commented 5 months ago

I think Azure has resolved this on their end. The fix should be available in all regions by 24th April.

SA-accesso commented 5 months ago

This seems to have been resolved by Azure now. Closing.

github-actions[bot] commented 4 months ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.