hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

AKS enabling cilium support does not work due to not being able to set network_policy to cilium #23339

Closed · derek-andrews-work closed this issue 1 year ago

derek-andrews-work commented 1 year ago

Terraform Version

v1.1.9

AzureRM Provider Version

3.73.0

Affected Resource(s)/Data Source(s)

azurerm_kubernetes_cluster

Terraform Configuration Files

# Module
## main.tf
data "azurerm_user_assigned_identity" "this" {
  name                = var.umi_name    # Required
  resource_group_name = var.umi_rg_name # Required
}

resource "azurerm_kubernetes_cluster" "this" {
  name                              = var.cluster_name                          # Required
  location                          = var.location                              # Required
  resource_group_name               = var.resource_group_name                   # Required
  sku_tier                          = var.sku_tier                              # Optional, defaults to Paid
  dns_prefix                        = var.cluster_name                          # Required
  kubernetes_version                = var.kubernetes_version                    # Required
  private_cluster_enabled           = var.private_cluster_enabled               # Optional, defaults to true
  private_dns_zone_id               = var.private_dns_zone_id
  public_network_access_enabled     = var.public_network_access_enabled         # Optional, defaults to false
  local_account_disabled            = var.local_account_disabled                # Optional, defaults to true
  automatic_channel_upgrade         = var.automatic_channel_upgrade             # Optional, defaults to patch
  oidc_issuer_enabled               = var.oidc_issuer_enabled                   # Optional, defaults to true
  workload_identity_enabled         = var.workload_identity_enabled             # Optional, defaults to true
  azure_policy_enabled              = var.azure_policy_enabled                  # Optional, defaults to true
  http_application_routing_enabled  = var.http_application_routing_enabled      # Optional, defaults to false
  role_based_access_control_enabled = var.role_based_access_control_enabled     # Optional, defaults to true
  disk_encryption_set_id            = var.disk_encryption_set_id                # Optional, defaults to null
  tags                              = var.tags                                  # Required

  maintenance_window {
    allowed {
      day   = var.maintenance_window_day   # Optional, defaults to Monday
      hours = var.maintenance_window_hours # Optional, defaults to 1,3
    }
  }

  oms_agent {
    log_analytics_workspace_id = var.log_analytics_workspace_id
  }

  key_vault_secrets_provider {
    secret_rotation_enabled  = var.secret_rotation_enabled # Optional, defaults to true
    secret_rotation_interval = var.secret_rotation_interval # Optional, defaults to 2m
  }

  storage_profile {
    blob_driver_enabled         = var.blob_driver_enabled         # Optional, defaults to false
    disk_driver_enabled         = var.disk_driver_enabled         # Optional, defaults to true
    disk_driver_version         = var.disk_driver_version         # Optional, defaults to v1
    file_driver_enabled         = var.file_driver_enabled         # Optional, defaults to false
    snapshot_controller_enabled = var.snapshot_controller_enabled # Optional, defaults to true
  }

  network_profile {
    network_plugin     = var.network_plugin     # Optional, defaults to azure
    network_policy     = var.network_policy     # Optional, defaults to calico
    service_cidr       = var.service_cidr       # Required
    dns_service_ip     = var.dns_service_ip     # Required (Might be able to set this)
    docker_bridge_cidr = var.docker_bridge_cidr # Optional, defaults to "172.17.0.1/16"
    load_balancer_sku  = var.load_balancer_sku  # Optional, defaults to standard
    outbound_type      = var.outbound_type      # Optional, defaults to userDefinedRouting
    ebpf_data_plane    = var.ebpf_data_plane
  }

  default_node_pool {
    name                          = var.systempool_name                          # Required
    type                          = var.systempool_type                          # Optional, defaults to VirtualMachineScaleSets
    capacity_reservation_group_id = var.systempool_capacity_reservation_group_id # Optional, defaults to null
    node_labels                   = var.systempool_node_labels                   # Optional, defaults to null
    node_taints                   = var.systempool_node_taints                   # Optional, defaults to null
    vm_size                       = var.systempool_vm_size                       # Optional, defaults to Standard_DS4_v2
    vnet_subnet_id                = var.node_subnet_id       # Required but pulled from data block
    zones                         = var.systempool_availability_zones            # Optional, defaults to 1,2,3
    enable_auto_scaling           = var.systempool_enable_auto_scaling           # Optional, defaults to true
    max_count                     = var.systempool_max_count                     # Optional, defaults to 2
    min_count                     = var.systempool_min_count                     # Optional, defaults to 1
    os_disk_type                  = var.systempool_os_disk_type                  # Optional, defaults to Ephemeral
    os_disk_size_gb               = var.systempool_os_disk_size_gb               # Optional, defaults to 128
    max_pods                      = var.systempool_max_pods                      # Optional, defaults to 30
    enable_node_public_ip         = var.systempool_enable_node_public_ip         # Optional, defaults to false
    pod_subnet_id                 = var.pod_subnet_id       # Required but pulled from data block
    only_critical_addons_enabled  = var.systempool_only_critical_addons_enabled  # Optional, defaults to true
    tags                          = var.tags

    upgrade_settings {
      max_surge = var.max_surge
    }
  }

  auto_scaler_profile {
    balance_similar_node_groups      = var.balance_similar_node_groups      # Optional, defaults to true
    expander                         = var.expander                         # Optional, defaults to random
    max_graceful_termination_sec     = var.max_graceful_termination_sec     # Optional, defaults to 600
    max_node_provisioning_time       = var.max_node_provisioning_time       # Optional, defaults to 15m
    max_unready_nodes                = var.max_unready_nodes                # Optional, defaults to 3
    max_unready_percentage           = var.max_unready_percentage           # Optional, defaults to 45
    new_pod_scale_up_delay           = var.new_pod_scale_up_delay           # Optional, defaults to 10s
    scale_down_delay_after_add       = var.scale_down_delay_after_add       # Optional, defaults to 10m
    scale_down_delay_after_delete    = var.scale_down_delay_after_delete    # Optional, defaults to 10s
    scale_down_delay_after_failure   = var.scale_down_delay_after_failure   # Optional, defaults to 3m
    scan_interval                    = var.scan_interval                    # Optional, defaults to 10s
    scale_down_unneeded              = var.scale_down_unneeded              # Optional, defaults to 10m
    scale_down_unready               = var.scale_down_unready               # Optional, defaults to 20m
    scale_down_utilization_threshold = var.scale_down_utilization_threshold # Optional, defaults to 0.5
    empty_bulk_delete_max            = var.empty_bulk_delete_max            # Optional, defaults to 10
    skip_nodes_with_local_storage    = var.skip_nodes_with_local_storage    # Optional, defaults to false
    skip_nodes_with_system_pods      = var.skip_nodes_with_system_pods      # Optional, defaults to false
  }

  workload_autoscaler_profile {
    keda_enabled = var.keda_enabled
  }

  identity {
    type         = var.identity_type                                  # Optional, defaults to UserAssigned
    identity_ids = ["${data.azurerm_user_assigned_identity.this.id}"] # Required but pulled from data block
  }

  kubelet_identity {
    client_id                 = data.azurerm_user_assigned_identity.this.client_id    # Required but pulled from data block
    object_id                 = data.azurerm_user_assigned_identity.this.principal_id # Required but pulled from data block
    user_assigned_identity_id = data.azurerm_user_assigned_identity.this.id           # Required but pulled from data block
  }

  # key_management_service {
  #   key_vault_key_id         = var.key_vault_key_id
  #   key_vault_network_access = var.key_vault_network_access
  # }

  azure_active_directory_role_based_access_control {
    managed                = var.aad_managed            # Optional, defaults to true
    admin_group_object_ids = var.admin_group_object_ids # Required
    azure_rbac_enabled     = var.aad_rbac_enabled       # Optional, defaults to true
  }

  lifecycle {
    ignore_changes = [
      kubernetes_version,
    ]
  }
}
## variables.tf
# required variables
variable cluster_name {}
variable resource_group_name {}
variable location {}
variable log_analytics_workspace_id {}
variable admin_group_object_ids {}
variable kubernetes_version {}
variable private_dns_zone_id {}
variable umi_name {}
variable umi_rg_name {}
variable node_subnet_id {}
variable pod_subnet_id {}
variable "tags" {
  type        = map(string)
  default     = null
}

# These variables have defaults

variable service_cidr {
  default = "cidr"
}
variable dns_service_ip {
  default = "cidr"
}

## access
variable public_network_access_enabled {
  default = true
}
variable role_based_access_control_enabled {
  default = true
}
variable aad_managed {
  default = true
}
variable aad_rbac_enabled {
  default = true
}
variable identity_type {
  default = "UserAssigned"
}

# storage
variable blob_driver_enabled {
  default = true
}
variable disk_driver_enabled {
  default = true
}
variable disk_driver_version {
  default = "v1"
}
variable file_driver_enabled {
  default = true
}
variable snapshot_controller_enabled {
  default = true
}

## systempool
variable systempool_capacity_reservation_group_id {
  default = null
}
variable systempool_node_labels {
  default = null
}
variable systempool_node_taints {
  default = null
}
variable systempool_enable_auto_scaling {
  default = true
}
variable systempool_max_count {
  default = 9
}
variable systempool_min_count {
  default = 3
}
variable systempool_os_disk_type {
  default = "Ephemeral"
}
variable systempool_os_disk_size_gb {
  default = 128
}
variable systempool_only_critical_addons_enabled {
  default = true
}
variable systempool_enable_node_public_ip {
  default = false
}
variable systempool_max_pods {
  default = 110
}
variable systempool_availability_zones {
  default = ["1", "2", "3"] 
}
variable systempool_name {
  default = "systempool"
}
variable systempool_type {
  default = "VirtualMachineScaleSets"
}
variable systempool_vm_size {
  default = "Standard_D8a_v4"
}

## auto-scaler
variable balance_similar_node_groups {
  default = true
}
variable expander {
  default = "random"
}
variable max_graceful_termination_sec {
  default = 600
}
variable max_node_provisioning_time {
  default = "15m"
}
variable max_unready_nodes {
  default = 3
}
variable max_unready_percentage {
  default = 45
}
variable new_pod_scale_up_delay {
  default = "10s"
}
variable scale_down_delay_after_add {
  default = "10m"
}
variable scale_down_delay_after_delete {
  default = "10s"
}
variable scale_down_delay_after_failure {
  default = "3m"
}
variable scan_interval {
  default = "10s"
}
variable scale_down_unneeded {
  default = "10m"
}
variable scale_down_unready {
  default = "20m"
}
variable scale_down_utilization_threshold {
  default = 0.5
}
variable empty_bulk_delete_max {
  default = 10
}
variable skip_nodes_with_local_storage {
  default = false
}
variable skip_nodes_with_system_pods {
  default = false
}
variable keda_enabled {
  default = true
}

## network
variable outbound_type {
  default = "userDefinedRouting"
}
variable docker_bridge_cidr {
  default = "100.68.152.0/21"
}
variable load_balancer_sku {
  default = "standard"
}
variable network_plugin {
  default = "azure"
}
variable network_policy {
  default = "calico"
}
variable http_application_routing_enabled {
  default = false
}
variable ebpf_data_plane {
  default = null
}

## secrets
variable secret_rotation_enabled {
  default = true
}
variable secret_rotation_interval {
  default = "2m"
}

## upgrades
variable automatic_channel_upgrade {
  default = "patch"
}
variable maintenance_window_enabled {
  default = true
}
variable max_surge {
  default = 1
}

variable workload_identity_enabled {
  default = true
}

## maintenance
variable maintenance_window_day {
  default = "Tuesday"
}
variable maintenance_window_hours {
  default = [1,4]
}

## other
variable sku_tier {
  default = "Standard"
}
variable azure_policy_enabled {
  default = true
}
variable private_cluster_enabled {
  default = true
}
variable local_account_disabled {
  default = true
}
variable oidc_issuer_enabled {
  default = true
}
variable disk_encryption_set_id {
  default = null
}

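# Root configuration that calls the module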
module aks {
  source                     = "gitlab"
  cluster_name               = var.cluster_name
  resource_group_name        = azurerm_resource_group.aks.name
  location                   = azurerm_resource_group.aks.location
  umi_name                   = "umi-${var.cluster_name}"
  umi_rg_name                = lookup(local.app_context_map[var.cluster_name], "rg_name", null)  
  log_analytics_workspace_id = data.azurerm_log_analytics_workspace.this.id    
  disk_encryption_set_id     = data.azurerm_disk_encryption_set.this.id   
  kubernetes_version         = "1.26"
  private_dns_zone_id        = data.azurerm_private_dns_zone.aks.id
  admin_group_object_ids     = var.admin_group_object_ids
  # key_vault_key_id           = data.azurerm_key_vault_key.this.id
  tags                       = local.tags
  ebpf_data_plane            = "cilium"
  network_policy             = "cilium"

  # systempool
  node_subnet_id         = data.azurerm_subnet.node.id
  pod_subnet_id          = data.azurerm_subnet.pod.id
  systempool_os_disk_type = "Managed"
  systempool_node_labels = {}
  systempool_node_taints = []
  systempool_min_count   = 3
  systempool_max_count   = 9
  systempool_vm_size     = "Standard_D8a_v4"
}

Debug Output/Panic Output

Error: expected network_profile.0.network_policy to be one of ["calico" "azure"], got cilium
│ 
│   with module.aks.azurerm_kubernetes_cluster.this,
│   on .terraform/modules/aks/main.tf line 52, in resource "azurerm_kubernetes_cluster" "this":
│   52:     network_policy     = var.network_policy     # Optional, defaults to calico

Error: updating Kubernetes Cluster (Subscription: "sub_id"
│ Resource Group Name: "rg_name"
│ Kubernetes Cluster Name: "cluster-name"): managedclusters.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="BadRequest" Message="Cilium dataplane requires network policy cilium." Target="networkProfile.networkPolicy"

Expected Behaviour

The Cilium data plane should be set up.

Actual Behaviour

The documentation indicates that support for Cilium was added: https://github.com/hashicorp/terraform-provider-azurerm/pull/22952

However, when I set ebpf_data_plane = "cilium", the apply errors out saying that network_policy needs to be cilium, but the provider won't accept cilium as a value for network_policy. I tried null as well, with the same error.

Steps to Reproduce

Build the cluster without ebpf_data_plane set; the cluster builds fine. Add ebpf_data_plane = "cilium" and run apply again, and you get the error that network_policy needs to be set to cilium. Set network_policy = "cilium" and you get the validation error that it only accepts azure or calico. A trimmed reproduction sketch is below.
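
For reference, a minimal sketch that hits the plan-time validation error, trimmed from the module above. The resource names, location, and VM size here are placeholders, and most optional arguments are omitted:

resource "azurerm_kubernetes_cluster" "repro" {
  name                = "aks-cilium-repro" # placeholder
  location            = "australiaeast"    # placeholder
  resource_group_name = "rg-aks-cilium"    # placeholder
  dns_prefix          = "aks-cilium-repro"

  default_node_pool {
    name       = "system"
    node_count = 1
    vm_size    = "Standard_D4s_v5"
  }

  identity {
    type = "SystemAssigned"
  }

  network_profile {
    network_plugin  = "azure"
    ebpf_data_plane = "cilium" # accepted by the provider schema in 3.73.0
    network_policy  = "cilium" # rejected: validation in 3.73.0 only allows "calico" or "azure"
  }
}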

Important Factoids

No response

References

No response

derek-andrews-work commented 1 year ago

Just adding that this is part of the API spec: https://learn.microsoft.com/en-us/rest/api/aks/managed-clusters/create-or-update?tabs=HTTP#networkpolicy

derek-andrews-work commented 1 year ago

Another note: in my example I'm trying to update an existing cluster, and that may or may not be supported. But even to build a new cluster, I need to pass cilium as the network_policy.

rcskosir commented 1 year ago

Thank you for taking the time to open this issue. Please subscribe to PR #23342, created by @ms-henglu, which addresses this issue.
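
For those tracking the fix: once a provider release includes that PR, the expectation is that cilium becomes an accepted value for network_policy, so a network_profile along these lines should pass validation. This is a sketch of the intended end state only, not confirmed against a released provider version:

network_profile {
  network_plugin  = "azure"
  ebpf_data_plane = "cilium"
  network_policy  = "cilium" # should be accepted once the validation list includes "cilium"
}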

github-actions[bot] commented 4 months ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.