hashicorp / terraform-provider-kubernetes

Terraform Kubernetes provider
https://www.terraform.io/docs/providers/kubernetes/
Mozilla Public License 2.0

Refresh of resources inside AKS cluster stops working after initial deployment - connection to localhost? #2127

Closed: mkambeck closed this issue 4 months ago

mkambeck commented 1 year ago

Hi all! I've searched for hours and hours and tried a lot of stuff but unfortunately without success. Maybe someone here has the same problem and can help me :)

Terraform Version, Provider Version and Kubernetes Version

Terraform version: 1.4.6
Kubernetes provider version: 2.20.0
Kubernetes version: 1.25.5

Affected Resource(s)

kubernetes_storage_class_v1 (other resources managed through the provider fail to refresh as well)

Terraform Configuration Files

I've deployed an Azure Kubernetes Service cluster via Terraform and initialized the Kubernetes provider at the same level. Using that provider, I'm creating multiple resources inside the cluster itself, and I'm additionally passing the provider down to a module that contains a FluxCD deployment. The provider is only initialized at the level where the AKS cluster is created. The AKS cluster itself resides in a module (which is called three times), but above that level no Kubernetes provider or anything similar is created or initialized.

The Terraform deployment runs inside a GitLab Terraform container image using the gitlab-terraform wrapper script.

The initial deployment works without problems, but afterwards I'm no longer able to establish a connection to one of the clusters, and I get multiple error messages when refreshing its resources. The resources in the other two clusters can be refreshed without errors or warnings.

Below you can see how I initialize the provider, create a resource, and pass the provider down to the module. I've already tried the -parallelism=1 and -refresh=false flags, but I definitely need the refresh, so that isn't a solution.

provider "kubernetes" {
  alias                  = "admin"
  host                   = azurerm_kubernetes_cluster.aks.kube_admin_config.0.host
  username               = azurerm_kubernetes_cluster.aks.kube_admin_config.0.username
  password               = azurerm_kubernetes_cluster.aks.kube_admin_config.0.password
  client_certificate     = base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.client_certificate)
  client_key             = base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.client_key)
  cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.cluster_ca_certificate)
}

resource "kubernetes_storage_class_v1" "test" {
  provider = kubernetes.admin

  metadata {
    name = "test"
  }
  storage_provisioner    = "disk.csi.azure.com"
  reclaim_policy         = "Retain"
  volume_binding_mode    = "WaitForFirstConsumer"
  allow_volume_expansion = true
  parameters = {
    location    = "westeurope"
    skuName     = "StandardSSD_LRS"
    cachingMode = "ReadOnly"
  }
}

module "bootstrap" {
  source = "../flux"
  providers = {
    kubernetes = kubernetes.admin
  }

  aks_name               = azurerm_kubernetes_cluster.aks.name
  host                   = azurerm_kubernetes_cluster.aks.kube_admin_config.0.host
  cluster_ca_certificate = azurerm_kubernetes_cluster.aks.kube_admin_config.0.cluster_ca_certificate
  token                  = azurerm_kubernetes_cluster.aks.kube_admin_config.0.password
  gitlab_token_flux      = var.gitlab_token_flux
  gitlab_project_id_flux = var.gitlab_project_id_flux
  gitlab_repo_url_flux   = var.gitlab_repo_url_flux
  ingress_public_ip      = var.ingress_public_ip
}
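
For reference, a minimal sketch of what the ../flux module would declare so the aliased provider can be handed in under the default kubernetes name. The variable names are taken from the module call above; everything else here is an assumption about the module's contents:

terraform {
  required_providers {
    kubernetes = {
      source = "hashicorp/kubernetes"
    }
  }
}

# Variables matching the arguments passed in the module call above;
# the remaining ones are omitted for brevity.
variable "aks_name" {
  type = string
}
variable "host" {
  type = string
}

Because the provider is passed under the default name (no configuration_aliases needed on the receiving side), resources inside the module pick it up without an explicit provider argument.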

Debug Output

N/A

Panic Output

N/A

Steps to Reproduce

  1. terraform plan --> Plan is created
  2. terraform apply --> Resources are created
  3. Change something inside the Terraform definition (see the sketch below)
  4. terraform plan --> Plan fails with errors
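
For illustration, a hypothetical step-3 change: any edit that forces Terraform to refresh the kubernetes_* resources reproduces the failure, for example flipping a single attribute on the storage class shown earlier:

resource "kubernetes_storage_class_v1" "test" {
  provider = kubernetes.admin

  metadata {
    name = "test"
  }
  storage_provisioner    = "disk.csi.azure.com"
  reclaim_policy         = "Delete" # changed from "Retain" to force a diff on the next plan
  volume_binding_mode    = "WaitForFirstConsumer"
  allow_volume_expansion = true
}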

Expected Behavior

Terraform should refresh cluster resources and update them, if needed.

Actual Behavior

Terraform plan/apply fails because it (probably) tries to authenticate against the wrong endpoint. When I initialize the state locally, it tries to connect to localhost; inside the GitLab runner it apparently falls back to the runner's in-cluster service account, as the error below shows.

│ Error: storageclasses.storage.k8s.io "test" is forbidden: User "system:serviceaccount:runner:default" cannot get resource "storageclasses" in API group "storage.k8s.io" at the cluster scope
│ 
│   with module.aks_webshop.kubernetes_storage_class_v1.elastic,
│   on aks/main.tf line 375, in resource "kubernetes_storage_class_v1" "elastic":
│  375: resource "kubernetes_storage_class_v1" "elastic" {
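
The error suggests the provider fell back to the in-cluster configuration of the GitLab runner (hence system:serviceaccount:runner:default) because the kube_admin_config attributes weren't available at refresh time; locally the same fallback ends at localhost. Below is a hedged sketch of one possible mitigation, assuming an AAD-integrated cluster and the kubelogin binary on the runner's PATH (both assumptions; this is untested, not a confirmed fix): authenticate through the provider's exec plugin instead of static credentials read from the cluster resource. The provider documentation also generally recommends keeping cluster creation and in-cluster resources in separate Terraform states.

provider "kubernetes" {
  alias                  = "admin"
  host                   = azurerm_kubernetes_cluster.aks.kube_admin_config.0.host
  cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.cluster_ca_certificate)

  # Fetch a fresh token at plan/apply time instead of reading static
  # credentials from the cluster resource; requires AAD integration on
  # the cluster and the kubelogin binary in the runner image.
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "kubelogin"
    args = [
      "get-token",
      "--login", "azurecli",
      "--server-id", "6dae42f8-4368-4678-94ff-3960e28e3630" # well-known AKS AAD server app ID
    ]
  }
}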

Important Factoids

Azure Kubernetes Service

References

I don't know for sure whether it is related, but the behavior is almost the same:

github-actions[bot] commented 5 months ago

Marking this issue as stale due to inactivity. If this issue receives no comments in the next 30 days it will automatically be closed. If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. This helps our maintainers find and focus on the active issues. Maintainers may also remove the stale label at their discretion. Thank you!