hashicorp / terraform-provider-kubernetes

Terraform Kubernetes provider
https://www.terraform.io/docs/providers/kubernetes/
Mozilla Public License 2.0

Call to http://localhost/version with configured host and credentials #708

Closed dploeger closed 4 years ago

dploeger commented 4 years ago

Terraform Version

Terraform v0.12.17

Affected Resource(s)

kubernetes_persistent_volume

Terraform Configuration Files

provider "kubernetes" {
  version                = "~> 1.10.0"
  host                   = module.azurekubernetes.host
  username               = module.azurekubernetes.username
  password               = module.azurekubernetes.password
  client_certificate     = base64decode(module.azurekubernetes.client_certificate)
  client_key             = base64decode(module.azurekubernetes.client_key)
  cluster_ca_certificate = base64decode(module.azurekubernetes.cluster_ca_certificate)
}

resource "kubernetes_persistent_volume" "factfinder-pv" {
  metadata {
    name = "ff-nfs-client"
    labels = {
      type          = "factfinder"
      sub_type      = "nfs"
      instance_type = "pv"
    }
  }
  spec {
    access_modes = ["ReadWriteMany"]
    capacity = map("storage", "${var.shared_storage_size}Gi")

    persistent_volume_source {
      nfs {
        path   = "/"
        server = var.nfs_service_ip
      }
    }
    storage_class_name = "nfs"
  }
}

Debug Output

(The debug output is huge and I just pasted a relevant section of it. If you need more, I'll create a gist)

2019/12/13 09:45:42 [DEBUG] ReferenceTransformer: "module.factfinder.kubernetes_service.factfinder-fffui-service" references: []
2019/12/13 09:45:42 [DEBUG] ReferenceTransformer: "module.loadbalancer.kubernetes_config_map.tcp-services" references: []
2019/12/13 09:45:42 [DEBUG] ReferenceTransformer: "module.factfinder.kubernetes_deployment.factfinder-sftp" references: []
2019-12-13T09:45:42.985Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: ---[ REQUEST ]---------------------------------------
2019-12-13T09:45:42.985Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: GET /version?timeout=32s HTTP/1.1
2019-12-13T09:45:42.985Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: Host: localhost
2019-12-13T09:45:42.985Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: User-Agent: HashiCorp/1.0 Terraform/0.12.17
2019-12-13T09:45:42.985Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: Accept: application/json, */*
2019-12-13T09:45:42.985Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: Accept-Encoding: gzip
2019-12-13T09:45:42.985Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4:
2019-12-13T09:45:42.986Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4:
2019-12-13T09:45:42.986Z [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: -----------------------------------------------------

Expected Behavior

When running terraform in the hashicorp/terraform container, terraform plan should run properly.

Actual Behavior

The plan errors out with the following error:

Error: Get http://localhost/version?timeout=32s: dial tcp 127.0.0.1:80: connect: connection refused

  on ../modules/factfinder/factfinder-nfs-client-pv.tf line 6, in resource "kubernetes_persistent_volume" "factfinder-pv":
   6: resource "kubernetes_persistent_volume" "factfinder-pv" {

This only happens when running terraform in the container. When run locally, everything is fine (even when the local .kube directory is removed).

Steps to Reproduce


  1. terraform plan or terraform apply


dploeger commented 4 years ago

Hm. Now I've somehow even gotten my local environment into a state where this happens. πŸ€·β€β™‚οΈ

mtekel commented 4 years ago

Happens for me as well. I changed a kubernetes secret's metadata name from a literal string to an interpolated value... which resolves to the same string. The original has no issue; the interpolated one connects to localhost...

resource "kubernetes_secret" "vault-gcp" {
  metadata {
    name = "${var.deployment_name}-gcp"
  }
...
}

When the name is "vault-gcp", it's fine. In a new branch with the above code and the deployment name set to "vault" (so the resulting interpolation is still "vault-gcp"), this fails with a connection to localhost.

Seems like TF/the provider thinks this is some new/different instance of the resource, which somehow does not belong to the configured kubernetes cluster, so it probably falls back to the default "localhost" address.

dploeger commented 4 years ago

I have no interpolated values in the metadata, only in the spec. But I have that in all my kubernetes resources, and only the resource mentioned above has the problem (or it is just the first one terraform comes across before it stops; that could be it as well).

Quite the phenomenon, really. Any core developer around? πŸ˜‰

mtekel commented 4 years ago

Tried a workaround with conditional:

name = var.legacy == "yes" ? "vault-gcp" : "${var.deployment_name}-gcp"

This way I wanted it to use the non-interpolated string directly in some cases, but this ended up with the same issue. My TF version is 0.12.18. I have the kubernetes provider configured with host and config:

provider "kubernetes" {
  host  = google_container_cluster.vault.endpoint
  token = data.google_client_config.current.access_token
  cluster_ca_certificate = base64decode(
    ....
  )
  load_config_file = false
}

Then I have tried another workaround, with defining 2 resources, one with interpolation, one with string and then controlling which resource actually gets deployed with

count =  var.legacy == "yes" ? 1 : 0

But this ended up with a new resource address ([0] index) even for the legacy deployment (where it is already deployed and I am trying to achieve 0 changes on TF apply).
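
For context, a minimal sketch of what that two-resource workaround might have looked like (resource names here are illustrative, not the actual configuration):

resource "kubernetes_secret" "vault-gcp-legacy" {
  # Deployed only for legacy stacks, keeping the literal name.
  count = var.legacy == "yes" ? 1 : 0

  metadata {
    name = "vault-gcp"
  }
  # ...
}

resource "kubernetes_secret" "vault-gcp-new" {
  # Deployed otherwise, using the interpolated name.
  count = var.legacy == "yes" ? 0 : 1

  metadata {
    name = "${var.deployment_name}-gcp"
  }
  # ...
}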

So I would say the issue is that the existing kubernetes provider config is somehow not respected for new resources...

kubernetes_secret.vault-gcp[0]: Refreshing state... [id=default/vault-gcp]
...

Error: Get http://localhost/api/v1/namespaces/default/secrets/vault-gcp: dial tcp [::1]:80: connect: connection refused
dploeger commented 4 years ago

I think it's interesting that it even tries to call via HTTP and not HTTPS, which I'd expect to be the default.

mtekel commented 4 years ago

So it turns out that in my case, I was also pointing at a wrong location in the bucket, where there was no tfstate. As most resources in GCP have the same ID as their name, terraform was able to find and refresh my whole stack even without state, except for the kubernetes secrets, where it was connecting to localhost, as it had no state about where the cluster was...

In EC2, that would probably blow up sooner, as resource IDs are quite different from resource names, and if you lose state you have a lot of trouble finding where everything is...

dploeger commented 4 years ago

Okay, I found the problem for my case. This line here:

https://github.com/terraform-providers/terraform-provider-kubernetes/blob/45d910a26f17f7b03d684221428b86f2f02b5be2/kubernetes/resource_kubernetes_persistent_volume.go#L40

If you remove the whole CustomizeDiff part, everything works fine. So I guess the correct server isn't carried through to that point. I'll try to dig deeper there.

dploeger commented 4 years ago

@alexsomesan @pdecat You added that line there while refactoring the whole client handling. Can you think of any implications that could cause this behaviour? It seems as if the MainClientset isn't correctly configured by the time it reaches the CustomizeDiff function.

pdecat commented 4 years ago

~Hi @dploeger, I believe the initialization here occurs too early. The CustomizeDiff probably needs to be replaced by a CustomizeDiffFunc.~

dploeger commented 4 years ago

@pdecat You probably know how to do this. I just stumbled through the code. πŸ˜† Are you able to provide a PR for that?

dploeger commented 4 years ago

Or can you point me at how to implement that? Just replacing CustomizeDiff with CustomizeDiffFunc didn't work, at least. :)

pdecat commented 4 years ago

Never mind, it won't work, CustomizeDiffFunc is the type of the CustomizeDiff field.

Let me think of something else.

alexsomesan commented 4 years ago

@dploeger Are you building the AKS resources from module.azurekubernetes in the same apply run as the kubernetes_persistent_volume ?

dploeger commented 4 years ago

Yes, I am. And that all worked until 12-9. I can’t really grasp what has changed then, because we didn’t update or change anything there.

pdecat commented 4 years ago

@dploeger Are you building the AKS resources from module.azurekubernetes in the same apply run as the kubernetes_persistent_volume ?

Good point, that's the most frequent issue when localhost is involved. The configuration is not available at the time the kubernetes provider is initialized. The point about removing CustomizeDiff fixing the issue made me think of something else, but it turns out the kubernetes client is only initialized once by the provider.

alexsomesan commented 4 years ago

Further question: is this happening when running TF in a Pod on the cluster?

dploeger commented 4 years ago

Ummmm... I haven't tried that. Is that important? I'd have to set that up. I just tried locally. It also happens outside the container now.

jakexks commented 4 years ago

I'm experiencing this with a module that nests other modules; sometimes the child modules lose provider configuration and the terraform config becomes un-applyable, but also un-destroyable!

The parent creates a DigitalOcean Kubernetes cluster inside a module, then uses the output of the module to get a data source which configures the provider e.g.

module "e2etest_k8s" {
  source = "./infrastructure/kubernetes/do"
  providers = {
    digitalocean = digitalocean.e2etest
  }
}

data "digitalocean_kubernetes_cluster" "e2etest" {
  provider = digitalocean.e2etest
  name     = module.e2etest_k8s.cluster_name
}

provider "kubernetes" {
  alias            = "e2etest"
  load_config_file = false
  host             = data.digitalocean_kubernetes_cluster.e2etest.endpoint
  token            = data.digitalocean_kubernetes_cluster.e2etest.kube_config[0].token
  cluster_ca_certificate = base64decode(
    data.digitalocean_kubernetes_cluster.e2etest.kube_config[0].cluster_ca_certificate
  )
}

// This also contains submodules
module "<rest of infra>" {
  source = "./<folders>"
  providers = {
    kubernetes = kubernetes.e2etest
  }
}

This provider is then used for a bunch of modules (which also contain modules) that then exhibit the localhost behavior (sometimes, but it seems deterministic between runs).

nothingofuse commented 4 years ago

Any updates on this? I'm trying to upgrade from 7.0.1 to 8.2.0 of the EKS terraform module (https://github.com/terraform-aws-modules/terraform-aws-eks). I'm able to get through the initial import of the aws-auth configmap by using a local kubeconfig the first time (overriding load_config_file to true for the import), but subsequent plans always fail with a call to localhost.

my provider config looks like

provider "kubernetes" {
  load_config_file       = var.load_config 
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  version                = "1.10.0" # Stable version??
}

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}
Error: Get http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth: dial tcp [::1]:80: connect: connection refused
module.eks.kubernetes_config_map.aws_auth[0]: Refreshing state... [id=kube-system/aws-auth]
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: 2020/02/11 10:16:22 [INFO] Checking config map aws-auth
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: 2020/02/11 10:16:22 [DEBUG] Kubernetes API Request Details:
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: ---[ REQUEST ]---------------------------------------
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: GET /api/v1/namespaces/kube-system/configmaps/aws-auth HTTP/1.1
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: Host: localhost
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: User-Agent: HashiCorp/1.0 Terraform/0.12.20
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: Accept: application/json, */*
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: Accept-Encoding: gzip
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4:
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4:
2020-02-11T10:16:22.087-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: -----------------------------------------------------
2020-02-11T10:16:22.089-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.10.0_x4: 2020/02/11 10:16:22 [DEBUG] Received error: &url.Error{Op:"Get", URL:"http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth", Err:(*net.OpError)(0xc000976050)}
2020/02/11 10:16:22 [ERROR] module.eks: eval: *terraform.EvalRefresh, err: Get http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth: dial tcp [::1]:80: connect: connection refused
2020/02/11 10:16:22 [ERROR] module.eks: eval: *terraform.EvalSequence, err: Get http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth: dial tcp [::1]:80: connect: connection refused

I'm happy to provide further information/logs/tests to get this issue resolved ASAP. I have tried provider versions 1.8.1, 1.9.0, 1.10.0 and 1.11.0 (1.11.0 gives me a different error corresponding to issue 759). I'm using terraform 0.12.20

hazcod commented 4 years ago

Having the same issue where I use the scaleway kapsule provider kubeconfig output as input for my kubernetes terraform provider. Using local kubeconfig does not resolve the issue during terraform plan. https://github.com/ironPeakServices/infrastructure/runs/435886375?check_suite_focus=true

brpaz commented 4 years ago

I have the exact same problem as @jakexks and @hazcod. Everything was working when I had everything in the root module, but when I split it into a separate module, it started giving errors saying "Error: invalid configuration: no configuration has been provided" as well as trying to connect using localhost.

hazcod commented 4 years ago

@brpaz: so it works if you run it from the root module? Might be an overall terraform issue, since I had an issue where some terraform variables were not being set for submodules, making me have to set them in the root module too, e.g.: https://github.com/ironPeakServices/infrastructure/blob/master/versions.tf#L20

brpaz commented 4 years ago

@hazcod yes, I had all my Terraform resources in main.tf in the root module. Everything was working. Because the configs were growing, I created a module and split my main.tf into several files inside the module. After that change, when I ran terraform apply, it started giving these errors.

But then I tried a fresh install (clean state and a new cluster provisioned from scratch) and it worked. I think a conflict between what was persisted in the state file and the new terraform declarations somehow resulted in terraform picking a wrong config?

hazcod commented 4 years ago

This might be related to https://github.com/hashicorp/terraform/issues/24131?notification_referrer_id=MDE4Ok5vdGlmaWNhdGlvblRocmVhZDcxODEyOTY3MDo1MjIyNTEy#issuecomment-587144096

hazcod commented 4 years ago

After reaching out to terraform core, the above issue seems to indicate that it's a kubernetes provider issue, where it's not handling unknown variables well.

hazcod commented 4 years ago

I have drilled this down to the following: if the kubernetes provider receives unknown values (because of a dependency), it should go ahead with the plan, because those values would normally be filled in during the apply phase. I think that's a better approach than just erroring out as it does now.

hazcod commented 4 years ago

This is really frustrating. If my scaleway provider cluster is removed, I have to take the following manual steps:

hazcod commented 4 years ago

I circumvented this with:

provider "kubernetes" {
    # fixed to 1.10.0 because of https://github.com/terraform-providers/terraform-provider-kubernetes/issues/759
    version = "1.10.0" 
    # set the variable in the root module or else we have a dependency issue
    token = module.scaleway.token
}
davidschrooten commented 4 years ago

I have this problem as well, running terraform in a container on GCP Cloud Build triggers. Since last month it has been trying to connect to localhost, ignoring the host set in the provider config.

hazcod commented 4 years ago

@davidq2q Have you tried with v1.11.1?

davidschrooten commented 4 years ago

@davidq2q Have you tried with v1.11.1?

Forgot to reply but after setting the version to 1.10.0 yesterday everything seems to work; all builds are green now.

hazcod commented 4 years ago

Yes, but the latest version would supposedly fix that.

hazcod commented 4 years ago

1.11.1 does not fix the issue for me: https://github.com/ironPeakServices/infrastructure/runs/489796449

brpaz commented 4 years ago

I circumvented this with:

provider "kubernetes" {
    # fixed to 1.10.0 because of https://github.com/terraform-providers/terraform-provider-kubernetes/issues/759
    version = "1.10.0" 
    # set the variable in the root module or else we have a dependency issue
    token = module.scaleway.token
}

1.11.1 also didn't fix the issue for me. @hazcod What do you mean by "set the variable in the root module"?

I have this in my providers.tf on root module:

provider "kubernetes" {
  load_config_file = false
  host             = module.k8s_cluster.cluster_host
  token            = module.k8s_cluster.cluster_token
  cluster_ca_certificate = base64decode(
    module.k8s_cluster.cluster_ca_certificate
  )
}

I am thinking of starting to evaluate pulumi as an alternative to terraform if I continue to have issues like this.

pdecat commented 4 years ago

To those still facing this issue with version 1.11.1 of the kubernetes provider, could you please share the output of the terraform providers command?

I came across a similar issue and it was caused by a sub-module redefining a provider without load_config_file = false.
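
To illustrate (a hypothetical sketch, not taken from any configuration in this thread): a sub-module that redefines the provider like this does not inherit the root module's configuration, and with load_config_file left at its default of true it falls back to the local kubeconfig and ultimately to localhost.

provider "kubernetes" {
  # Hypothetical sub-module provider block that triggers the localhost fallback:
  # no host/credentials are set here, and load_config_file is not set to false,
  # so the root module's fully configured provider is not used for this
  # module's resources.
  version = ">= 1.6.2"
}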

liangyungong commented 4 years ago
.
β”œβ”€β”€ provider.aws ~> 2.44.0
β”œβ”€β”€ provider.kubernetes ~> 1.10
β”œβ”€β”€ provider.terraform
└── module.cluster
    β”œβ”€β”€ provider.aws (inherited)
    β”œβ”€β”€ module.alb_ingress_controller_iam_policy
    β”‚   └── provider.aws (inherited)
    β”œβ”€β”€ module.eks
    β”‚   β”œβ”€β”€ provider.aws >= 2.38.0
    β”‚   β”œβ”€β”€ provider.kubernetes >= 1.6.2
    β”‚   β”œβ”€β”€ provider.local >= 1.2
    β”‚   β”œβ”€β”€ provider.null >= 2.1
    β”‚   β”œβ”€β”€ provider.random >= 2.1
    β”‚   β”œβ”€β”€ provider.template >= 2.1
    β”‚   └── module.node_groups
    β”‚       β”œβ”€β”€ provider.aws (inherited)
    β”‚       └── provider.random
    β”œβ”€β”€ module.external_dns_iam_policy
    β”‚   └── provider.aws (inherited)
    β”œβ”€β”€ module.k8s_config
    β”‚   β”œβ”€β”€ provider.aws
    β”‚   β”œβ”€β”€ provider.helm
    β”‚   β”œβ”€β”€ provider.kubernetes (inherited)
    β”‚   └── module.metrics_server
    β”‚       └── provider.kubernetes (inherited)
    └── module.model_bucket
        └── provider.aws (inherited)
pdecat commented 4 years ago

@liangyungong are both of your provider blocks, the one declared in the root module and the one in the eks sub-module, defining load_config_file = false?

liangyungong commented 4 years ago

yes indeed.

rg 'provider.*kubernetes' -w ../../ --hidden --no-ignore --glob='*.tf' -A 5 | grep load_config_file
../../application/prd-0/environment.tf-  load_config_file       = false
../../application/prd-1/environment.tf-  load_config_file       = false
../../application/stg-0/environment.tf-  load_config_file       = false
../../application/prd-0/.terraform/modules/cluster.k8s_config.metrics_server/azure/stacks/aks_cluster/providers.tf-  load_config_file = false
../../application/prd-1/.terraform/modules/cluster.cluster_autoscaler/azure/stacks/aks_cluster/providers.tf-  load_config_file = false
../../application/prd-1/.terraform/modules/cluster.k8s_config.metrics_server/azure/stacks/aks_cluster/providers.tf-  load_config_file = false
pdecat commented 4 years ago

I'm confused, your terraform providers output has eks and this grep output has aks.

hazcod commented 4 years ago

@pdecat: In my case, I encounter this issue when I fire off our kubernetes provider as a dependency on the scaleway provider with a fresh cluster. During plan, the variables from the scaleway provider will be empty (since there is no cluster yet), so kubernetes will dial the default values. More specifically, in my case the kubernetes provider variables are populated with the exported kubeconfig of the scaleway provider.

pdecat commented 4 years ago

@hazcod I've looked into your case, but did not find any explanation yet. Maybe the issue is in the provider's configuration recorded in the existing state.

FWIW, this works in a single apply pass from scratch with v1.11.1 and GKE (I do not have a test scaleway account):

main.tf:

provider "google" {
  version = "3.12.0"
  region  = "us-west1"
  # Other provider settings provided via ENV variables
}

module gke {
  source = "./gke"
}

data "google_client_config" "default" {
}

provider "kubernetes" {
  version = "1.11.1"

  load_config_file       = false
  host                   = module.gke.endpoint
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(module.gke.cluster_ca_certificate)
}

module kubernetes {
  source = "./kubernetes"
}

output "cluster_name" {
  value = module.gke.cluster_name
}

output "location" {
  value = module.gke.location
}

output "endpoint" {
  value = module.gke.endpoint
}

gke/main.tf:

data "google_compute_zones" "available" {
}

resource "google_container_cluster" "primary" {
  name               = "terraform-example-cluster"
  location           = data.google_compute_zones.available.names[0]
  initial_node_count = 1

  min_master_version = "1.15.9-gke.22"
  node_version       = "1.15.9-gke.22"

  master_auth {
    username = ""
    password = ""
  }
}

output "cluster_name" {
  value = google_container_cluster.primary.name
}

output "location" {
  value = google_container_cluster.primary.location
}

output "endpoint" {
  value = google_container_cluster.primary.endpoint
}

output "cluster_ca_certificate" {
  value = google_container_cluster.primary.master_auth[0].cluster_ca_certificate
}

kubernetes/main.tf:

resource "kubernetes_namespace" "example" {
  metadata {
    name = "terraform-example-namespace"
  }
}

Init:

# rm -rf .terraform/
# terraform init
Initializing modules...
- gke in gke
- kubernetes in kubernetes

Initializing the backend...

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "google" (hashicorp/google) 3.12.0...
- Downloading plugin for provider "kubernetes" (hashicorp/kubernetes) 1.11.1...

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
# terraform providers
.
β”œβ”€β”€ provider.google 3.12.0
β”œβ”€β”€ provider.kubernetes 1.11.1
β”œβ”€β”€ module.gke
β”‚   └── provider.google (inherited)
└── module.kubernetes
    └── provider.kubernetes (inherited)

Apply:

# terraform apply -auto-approve
module.gke.data.google_compute_zones.available: Refreshing state...
data.google_client_config.default: Refreshing state...
module.gke.google_container_cluster.primary: Creating...
module.gke.google_container_cluster.primary: Still creating... [10s elapsed]
module.gke.google_container_cluster.primary: Still creating... [20s elapsed]
module.gke.google_container_cluster.primary: Still creating... [30s elapsed]
module.gke.google_container_cluster.primary: Still creating... [40s elapsed]
module.gke.google_container_cluster.primary: Still creating... [50s elapsed]
module.gke.google_container_cluster.primary: Still creating... [1m0s elapsed]
module.gke.google_container_cluster.primary: Still creating... [1m10s elapsed]
module.gke.google_container_cluster.primary: Still creating... [1m20s elapsed]
module.gke.google_container_cluster.primary: Still creating... [1m30s elapsed]
module.gke.google_container_cluster.primary: Still creating... [1m40s elapsed]
module.gke.google_container_cluster.primary: Still creating... [1m50s elapsed]
module.gke.google_container_cluster.primary: Still creating... [2m0s elapsed]
module.gke.google_container_cluster.primary: Still creating... [2m10s elapsed]
module.gke.google_container_cluster.primary: Still creating... [2m20s elapsed]
module.gke.google_container_cluster.primary: Still creating... [2m30s elapsed]
module.gke.google_container_cluster.primary: Creation complete after 2m33s [id=projects/myproject/locations/us-west1-a/clusters/terraform-example-cluster]
module.kubernetes.kubernetes_namespace.example: Creating...
module.kubernetes.kubernetes_namespace.example: Creation complete after 1s [id=terraform-example-namespace]

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

Outputs:

cluster_name = terraform-example-cluster
endpoint = 35.197.114.71
location = us-west1-a
# kubectl --context gke_myproject_us-west1-a_terraform-example-cluster get ns terraform-example-namespace
NAME                          STATUS   AGE
terraform-example-namespace   Active   3m9s

Destroy:

# terraform destroy -auto-approve
data.google_client_config.default: Refreshing state...
module.gke.data.google_compute_zones.available: Refreshing state...
module.gke.google_container_cluster.primary: Refreshing state... [id=projects/myproject/locations/us-west1-a/clusters/terraform-example-cluster]
module.kubernetes.kubernetes_namespace.example: Refreshing state... [id=terraform-example-namespace]
module.kubernetes.kubernetes_namespace.example: Destroying... [id=terraform-example-namespace]
module.kubernetes.kubernetes_namespace.example: Still destroying... [id=terraform-example-namespace, 10s elapsed]
module.kubernetes.kubernetes_namespace.example: Destruction complete after 15s
module.gke.google_container_cluster.primary: Destroying... [id=projects/myproject/locations/us-west1-a/clusters/terraform-example-cluster]
module.gke.google_container_cluster.primary: Still destroying... [id=projects/myproject/locations/...1-a/clusters/terraform-example-cluster, 10s elapsed]
module.gke.google_container_cluster.primary: Still destroying... [id=projects/myproject/locations/...1-a/clusters/terraform-example-cluster, 20s elapsed]
module.gke.google_container_cluster.primary: Still destroying... [id=projects/myproject/locations/...1-a/clusters/terraform-example-cluster, 30s elapsed]
module.gke.google_container_cluster.primary: Still destroying... [id=projects/myproject/locations/...1-a/clusters/terraform-example-cluster, 40s elapsed]
module.gke.google_container_cluster.primary: Still destroying... [id=projects/myproject/locations/...1-a/clusters/terraform-example-cluster, 50s elapsed]
module.gke.google_container_cluster.primary: Still destroying... [id=projects/myproject/locations/...1-a/clusters/terraform-example-cluster, 1m0s elapsed]
module.gke.google_container_cluster.primary: Still destroying... [id=projects/myproject/locations/...1-a/clusters/terraform-example-cluster, 1m10s elapsed]
module.gke.google_container_cluster.primary: Still destroying... [id=projects/myproject/locations/...1-a/clusters/terraform-example-cluster, 1m20s elapsed]
module.gke.google_container_cluster.primary: Still destroying... [id=projects/myproject/locations/...1-a/clusters/terraform-example-cluster, 1m30s elapsed]
module.gke.google_container_cluster.primary: Still destroying... [id=projects/myproject/locations/...1-a/clusters/terraform-example-cluster, 1m40s elapsed]
module.gke.google_container_cluster.primary: Still destroying... [id=projects/myproject/locations/...1-a/clusters/terraform-example-cluster, 1m50s elapsed]
module.gke.google_container_cluster.primary: Still destroying... [id=projects/myproject/locations/...1-a/clusters/terraform-example-cluster, 2m0s elapsed]
module.gke.google_container_cluster.primary: Still destroying... [id=projects/myproject/locations/...1-a/clusters/terraform-example-cluster, 2m10s elapsed]
module.gke.google_container_cluster.primary: Still destroying... [id=projects/myproject/locations/...1-a/clusters/terraform-example-cluster, 2m20s elapsed]
module.gke.google_container_cluster.primary: Still destroying... [id=projects/myproject/locations/...1-a/clusters/terraform-example-cluster, 2m30s elapsed]
module.gke.google_container_cluster.primary: Destruction complete after 2m37s

Destroy complete! Resources: 2 destroyed.
liangyungong commented 4 years ago

I'm confused, your terraform providers output has eks and this grep output has aks.

They're irrelevant files; that's just how the modules are organised in the git repo. :)

pdecat commented 4 years ago

@liangyungong I still do not get how you can have AWS resources in the terraform providers output and Azure resources in the grep output. They do not correspond to each other.

Your terraform providers output explicitly states that there's a kubernetes provider initialized in the AWS eks module that is not inherited from the root module:

.
β”œβ”€β”€ provider.aws ~> 2.44.0
β”œβ”€β”€ provider.kubernetes ~> 1.10
β”œβ”€β”€ provider.terraform
└── module.cluster
    β”œβ”€β”€ provider.aws (inherited)
    β”œβ”€β”€ module.alb_ingress_controller_iam_policy
    β”‚   └── provider.aws (inherited)
    β”œβ”€β”€ module.eks
    β”‚   β”œβ”€β”€ provider.aws >= 2.38.0
    β”‚   β”œβ”€β”€ provider.kubernetes >= 1.6.2 # <-- HERE
[...]

That means there is a provider "kubernetes" block in there.

Can you check the content of that module?

liangyungong commented 4 years ago

That means there is a provider "kubernetes" block in there. Can you check the content of that module?

There're many other modules in the same git repo, and they are irrelevant to the module that I use. Whenever I do terraform init, it clones the whole git repo.

pdecat commented 4 years ago

There're many other modules in the same git repo, and they are irrelevant to the module that I use. Whenever I do terraform init, it clones the whole git repo.

So the module.eks provider block does not have load_config_file = false.

dsymonds commented 4 years ago

I'm hitting this problem, but not with any modules.

$ terraform providers
.
β”œβ”€β”€ provider.google ~> 3.13
β”œβ”€β”€ provider.google-beta ~> 3.13
β”œβ”€β”€ provider.kubernetes.xxx ~> 1.11.1
└── provider.kubernetes.yyy ~> 1.11.1

(two separate kubernetes providers with aliases)

Is there a known workaround that doesn't involve winding back the kubernetes provider to 1.10? I need to be using 1.11 for other reasons.

dsymonds commented 4 years ago

Actually my setup has started working again after forcibly re-fetching credentials, though it was very confusing why it was trying to contact localhost when the creds were bad.

plwhite commented 4 years ago

Not sure if this is the same problem, but just in case, I hit the following.

I had a kubernetes provider block looking a bit like this.

provider "kubernetes" {
  version                = "1.11"
  host                   = var.credentials.host
  username               = var.credentials.username
  password               = var.credentials.password
  client_certificate     = var.credentials.client_certificate
  client_key             = var.credentials.client_key
  cluster_ca_certificate = var.credentials.cluster_ca_certificate
}

This failed in both 1.10 and 1.11. With 1.10, I got an error report explaining that I must set either username and password or a bearer token, not both (fair enough). With 1.11, there was no error and it ignored host, contacting localhost.

If I removed username and password, then it all worked (in both versions). That makes me think that a validation failure in 1.11 might lead to it dropping through with the host still set to localhost.

alexsomesan commented 4 years ago

@plwhite The error you got in 1.10 was not quite right - it was not exhaustive, since client certificates are also an equivalent form of authentication. Better validation was introduced in 1.11; that's why you are not seeing that error anymore. The rule is to have exactly one of: token, user/pass, OR client certificates. Having two of these, like in your example, is not deterministic (which one should be used to authenticate you?) and it looks like that's not being validated - we'll work on fixing that.

However, the reason you're seeing the connection to localhost is likely because Terraform is unable to resolve the value for var.credentials.host at the right time. How is var.credentials being populated in your case?
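
For anyone following along, a minimal sketch of a provider block that respects that rule, using client certificates as the single authentication method (variable names mirror plwhite's example, and load_config_file = false is added per the earlier advice in this thread; this is illustrative, not a confirmed fix):

provider "kubernetes" {
  version          = "1.11"
  load_config_file = false

  # Exactly one authentication method: client certificates.
  # username/password and token are intentionally omitted.
  host                   = var.credentials.host
  client_certificate     = var.credentials.client_certificate
  client_key             = var.credentials.client_key
  cluster_ca_certificate = var.credentials.cluster_ca_certificate
}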

plwhite commented 4 years ago

@alexsomesan I was populating var.credentials through variables set up by the azurerm provider creating an AKS cluster, which from memory did have host configured. I'm moderately sure that was set consistently, but it's possible there was a transient error where it failed at about the same time as I hit this. Since moving to the more recent kubernetes provider I've seen no further issues, so I'm quite happy to consider this fixed.