hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Cycle during apply on GKE cluster recreation with kubernetes provider #24511

Closed · pselden closed this issue 4 years ago

pselden commented 4 years ago

I cannot reproduce this yet on a smaller example... but maybe someone else out there can shed some light on what's happening.

I am encountering a Terraform error during apply, but not during plan. The change recreates our GKE cluster and also destroys some Kubernetes objects that were created in that cluster.

Terraform Version

v0.12.24

Terraform Configuration Files

All the code is inside large modules, so I'm not sure what would be useful to put here...

provider "kubernetes" {
  host                   = module.terraform-gcp-gke.gke_host
  token                  = module.terraform-gcp-gke.gke_token
  cluster_ca_certificate = module.terraform-gcp-gke.gke_ca_certificate
  load_config_file       = false
}

module "terraform-gcp-gke" {
  source  = "app.terraform.io/openx/gke/module"

  cluster_name               = "kubeflow"
  instance_number            = var.instance_number
  min_master_version         = var.min_master_version
}

These were removed:

locals {
  kf_sa_name             = lookup(kubernetes_service_account.kubeflow-installer.metadata[0], "name")
}

resource "kubernetes_service_account" "kubeflow-installer" {
  # ... snip

  subject {
    kind      = "ServiceAccount"
    name      = local.kf_sa_name

  # ... snip
}

resource "kubernetes_cluster_role_binding" "kubeflow-installer" {
  # snip
}

resource "kubernetes_job" "kubeflow-installer" {
  # ... snip
      spec {
        automount_service_account_token = true
        service_account_name            = local.kf_sa_name
 # ... snip
}

Expected Behavior

The cluster should be able to be recreated without a cycle error.

Actual Behavior

There is a cycle error:

 module.terraform-gcp-gke.local.gke_host_endpoint, 
  module.terraform-gcp-gke.output.gke_host, 
  module.terraform-gcp-gke.module.bootstrapper.kubernetes_job.main (destroy),
  kubernetes_service_account.kubeflow-installer (destroy),
  kubernetes_cluster_role_binding.kubeflow-installer (destroy),
  kubernetes_job.kubeflow-installer (destroy),
  module.terraform-gcp-gke.google_container_cluster.gke (destroy),
  module.terraform-gcp-gke.google_container_cluster.gke,
  module.terraform-gcp-gke.local.gke_cert,
  module.terraform-gcp-gke.output.gke_ca_certificate,
  provider.kubernetes

Additional Context

References

Possibly related: #24510

danieldreier commented 4 years ago

@pselden I'd really like to help, but without more context it's really hard to dig into this. Until you're able to provide a reproduction case, I think that posting in the Terraform section of the HashiCorp community forum is likely to get you better results.

There may well be a bug in here that we can help with, but I have to be able to reproduce it in order to work on it. I'm going to close this for now, but if you can put together a reproduction case I'd be happy to open this back up.

pselden commented 4 years ago

Thanks for the link to the forum. I suspect it has something to do with us using the outputs of the google_container_cluster resource as inputs to the kubernetes provider -- so when Terraform has to destroy and recreate the cluster, the provider has to be re-initialized against the new cluster for things to work correctly. Is that something that is supported?

ghost commented 4 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.