hashicorp / terraform-provider-kubernetes

Terraform Kubernetes provider
https://www.terraform.io/docs/providers/kubernetes/
Mozilla Public License 2.0

Terraform uses the default kubectl context configuration rather than what is defined in the provider's context #176

Closed johnroach closed 4 years ago

johnroach commented 6 years ago

Hi there,

Came across an interesting problem. It looks like Terraform by default uses the system's current kubectl context for Kubernetes authentication. So even though you define a context in the provider, if your kubectl defaults to something else, Kubernetes-related data lookups will fail during a terraform plan. Is there a way to tell Terraform not to use the system's kubectl context? Or is this a feature request?

Terraform Version

Terraform v0.11.7

Affected Resource(s)

Terraform Configuration Files

provider "kubernetes" "project_kube_dga_k8_context" {
  alias                  = "project_kube_dga_k8_context"
  config_context_cluster = "gke_env-kube-east1-b_prod-gke-dga"
}
resource "google_service_account_key" "env_datastore_user_key" {
  service_account_id = "${google_service_account.env_datastore_datastore_user.name}"
}
resource "kubernetes_secret" "kubernetes_dga_datastore_credentials" {
  provider = "kubernetes.project_kube_dga_k8_context"
  metadata {
    name = "env-datastore-user-key"
  }
  data {
    credentials.json = "${base64decode(google_service_account_key.env_datastore_user_key.private_key)}"
  }
}
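
(The fix discussed later in the thread is to pin the context by name rather than the cluster. A minimal sketch of what that looks like; the kubeconfig path and context name below are placeholders, not values taken from this report:)

provider "kubernetes" {
  alias          = "project_kube_dga_k8_context"

  # Read this specific kubeconfig file and this specific context,
  # instead of whatever context kubectl currently has active.
  config_path    = "${path.module}/kubeconfig"
  config_context = "gke_my-project_us-east1-b_prod-gke-dga"
}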

Debug Output

* kubernetes_secret.kubernetes_dga_datastore_credentials: kubernetes_secret.kubernetes_dga_datastore_credentials: secrets "env-datastore-user-key" is forbidden: User "system:anonymous" cannot get secrets in the namespace "default": Unknown user "system:anonymous"

Panic Output

N/A

Expected Behavior

gke_env-kube-east1-b_prod-gke-dga cluster context should have been used.

Actual Behavior

Default kubectl cluster context gets used

Workaround

Simply change the kubectl context on the system to match the context used by the provider.

Steps to Reproduce

  1. Set a different cluster context manually for kubectl than what is used by your kubernetes provider config.
  2. terraform apply

Important Factoids

References

N/A

pdecat commented 6 years ago

Same issue here.

I believe the provider should error instead of falling back on the system's default configuration if any of the explicitly set configuration parameters is incorrect.

Provider configuration:

provider "kubernetes" {
  version = "1.3.0"

  alias = "cluster_regional_1"

  # Load configuration from the user's kubeconfig file as basic authentication and client certificates are no longer activated
  load_config_file = true

  config_path            = "./myproject-preprod-europe-west1-gke1.kubeconfig"
  config_context         = "gke_myproject-preprod_europe-west1_myproject-preprod-europe-west1-gke1"
  config_context_cluster = "gke_myproject-preprod_europe-west1_myproject-preprod-europe-west1-gke1"

  cluster_ca_certificate = "${base64decode(data.google_container_cluster.cluster1.master_auth.0.cluster_ca_certificate)}"

  host = "https//35.190.XXX.XXX"
}

Excerpt from plan with missing kubernetes configuration file:

2018-10-30T21:09:17.406+0100 [DEBUG] plugin.terraform-provider-kubernetes_v1.3.0_x4: 2018/10/30 21:09:17 [DEBUG] Using custom current context: "gke_myproject-preprod_europe-west1_objenious-preprod-europe-west1-gke1"
2018-10-30T21:09:17.406+0100 [DEBUG] plugin.terraform-provider-kubernetes_v1.3.0_x4: 2018/10/30 21:09:17 [DEBUG] Using overidden context: api.Context{LocationOfOrigin:"", Cluster:"gke_myproject-preprod_europe-west1_myproject-preprod-europe-west1-gke1", AuthInfo:"", Namespace:"", Extensions:map[string]runtime.Object(nil)}
2018-10-30T21:09:17.406+0100 [DEBUG] plugin.terraform-provider-kubernetes_v1.3.0_x4: 2018/10/30 21:09:17 [INFO] Unable to load config file as it doesn't exist at "./myproject-preprod-europe-west1-gke1.kubeconfig"
2018/10/30 21:09:17 [DEBUG] Resource state not found for "module.cluster_regional_1.module.namespaces_system.kubernetes_namespace.namespaces[2]": module.cluster_regional_1.module.namespaces_system.kubernetes_namespace.namespaces[2]
2018/10/30 21:09:17 [DEBUG] Resource state not found for "module.cluster_regional_1.module.namespaces_system.kubernetes_namespace.namespaces[0]": module.cluster_regional_1.module.namespaces_system.kubernetes_namespace.namespaces[0]
2018/10/30 21:09:17 [DEBUG] Resource state not found for "module.cluster_regional_1.module.namespaces_system.kubernetes_namespace.namespaces[1]": module.cluster_regional_1.module.namespaces_system.kubernetes_namespace.namespaces[1]
2018/10/30 21:09:17 [DEBUG] Removing "module.cluster_regional_1.module.namespaces_system.kubernetes_namespace.namespaces[2]", filtered by targeting.
2018/10/30 21:09:17 [DEBUG] Removing "module.cluster_regional_1.module.namespaces_system.kubernetes_namespace.namespaces[1]", filtered by targeting.
2018/10/30 21:09:17 [DEBUG] ReferenceTransformer: "module.cluster_regional_1.module.namespaces_system.kubernetes_namespace.namespaces[0]" references: []

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  + module.cluster_regional_1.module.namespaces_system.kubernetes_namespace.namespaces[0]
      id:                          <computed>
      metadata.#:                  "1"
      metadata.0.generation:       <computed>
      metadata.0.labels.%:         "1"
      metadata.0.labels.name:      "default"
      metadata.0.name:             "default"
      metadata.0.resource_version: <computed>
      metadata.0.self_link:        <computed>
      metadata.0.uid:              <computed>

Plan: 1 to add, 0 to change, 0 to destroy.

Excerpt from apply with missing kubernetes configuration file:

2018-10-30T21:01:28.000+0100 [DEBUG] plugin.terraform-provider-kubernetes_v1.3.0_x4: 2018/10/30 21:01:28 [DEBUG] Using custom current context: "gke_myproject-preprod_europe-west1_myproject-preprod-europe-west1-gke1"
2018-10-30T21:01:28.000+0100 [DEBUG] plugin.terraform-provider-kubernetes_v1.3.0_x4: 2018/10/30 21:01:28 [DEBUG] Using overidden context: api.Context{LocationOfOrigin:"", Cluster:"gke_myproject-preprod_europe-west1_myproject-preprod-europe-west1-gke1", AuthInfo:"", Namespace:"", Extensions:map[string]runtime.Object(nil)}
2018-10-30T21:01:28.000+0100 [DEBUG] plugin.terraform-provider-kubernetes_v1.3.0_x4: 2018/10/30 21:01:28 [INFO] Unable to load config file as it doesn't exist at "./myproject-preprod-europe-west1-gke1.kubeconfig"
module.cluster_regional_1.module.namespaces_system.kubernetes_namespace.namespaces[0]: Creating...
  metadata.#:                  "" => "1"
  metadata.0.generation:       "" => "<computed>"
  metadata.0.labels.%:         "" => "1"
  metadata.0.labels.name:      "" => "default"
  metadata.0.name:             "" => "default"
  metadata.0.resource_version: "" => "<computed>"
  metadata.0.self_link:        "" => "<computed>"
  metadata.0.uid:              "" => "<computed>"
2018-10-30T21:01:28.003+0100 [DEBUG] plugin.terraform-provider-kubernetes_v1.3.0_x4: 2018/10/30 21:01:28 [INFO] Creating new namespace: v1.Namespace{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"default", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"name":"default"}, Annotations:map[string]string{}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Spec:v1.NamespaceSpec{Finalizers:[]v1.FinalizerName(nil)}, Status:v1.NamespaceStatus{Phase:""}}
2018/10/30 21:01:28 [ERROR] root.cluster_regional_1.namespaces_system: eval: *terraform.EvalApplyPost, err: 1 error(s) occurred:

* kubernetes_namespace.namespaces.0: namespaces is forbidden: User "system:anonymous" cannot create namespaces at the cluster scope: Unknown user "system:anonymous"
2018/10/30 21:01:28 [ERROR] root.cluster_regional_1.namespaces_system: eval: *terraform.EvalSequence, err: 1 error(s) occurred:

* kubernetes_namespace.namespaces.0: namespaces is forbidden: User "system:anonymous" cannot create namespaces at the cluster scope: Unknown user "system:anonymous"

2018/10/30 21:01:30 [DEBUG] plugin: waiting for all plugin processes to complete...
Error: Error applying plan:
2018-10-30T21:01:30.652+0100 [DEBUG] plugin.terraform-provider-kubernetes_v1.3.0_x4: 2018/10/30 21:01:30 [ERR] plugin: plugin server: accept unix /tmp/plugin686931577: use of closed network connection

1 error(s) occurred:

* module.cluster_regional_1.module.namespaces_system.kubernetes_namespace.namespaces[0]: 1 error(s) occurred:

* kubernetes_namespace.namespaces.0: namespaces is forbidden: User "system:anonymous" cannot create namespaces at the cluster scope: Unknown user "system:anonymous"

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

Note that it only fails because I explicitly set the master IP address in the host field. Otherwise, it would have created the resource in the system's default configured cluster. I voluntarily set most of the fields to catch this issue.
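
(One way to avoid the silent fallback entirely is to not read a kubeconfig at all and pass every connection detail explicitly. A sketch, assuming a data "google_client_config" "default" {} block is acceptable for fetching an access token; that data source is not part of the configuration above:)

data "google_client_config" "default" {}

provider "kubernetes" {
  version = "1.3.0"
  alias   = "cluster_regional_1"

  # Never load a kubeconfig file, so a missing or wrong file cannot
  # silently swap in the system's default context.
  load_config_file = false

  host                   = "https://${data.google_container_cluster.cluster1.endpoint}"
  token                  = "${data.google_client_config.default.access_token}"
  cluster_ca_certificate = "${base64decode(data.google_container_cluster.cluster1.master_auth.0.cluster_ca_certificate)}"
}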

Edit: posted the wrong log extracts, fixed.

mattfysh commented 5 years ago

I'm not sure if I'm having a similar problem here. I'm trying to use the JSON key file supported by the terraform-provider-google plugin:

data "external" "google_auth" {
  program = ["cat", "${path.module}/google/auth.json"]
}

data "http" "client_cert" {
  url = "${data.external.google_auth.result["client_x509_cert_url"]}"
}

data "external" "client_cert" {
  program = ["echo", "${data.http.client_cert.body}"]
}

locals {
  client_key = "${data.external.google_auth.result["private_key"]}"
  private_key_id = "${data.external.google_auth.result["private_key_id"]}"
  client_certificate = "${data.external.client_cert.result["${local.private_key_id}"]}"
}

provider "kubernetes" {
  host = "https://${google_container_cluster.primary.endpoint}"
  client_certificate = "${local.client_certificate}"
  client_key = "${local.client_key}"
  cluster_ca_certificate = "${base64decode("${google_container_cluster.primary.master_auth.0.cluster_ca_certificate}")}"
}

I'm new to Terraform and can't figure out how to get it to log the variable values (to make sure I'm passing the right config into the kubernetes provider).

I'd prefer to use the static authentication method to remove the dependency on kubectl and its config file (in hopes of making the terraform config more portable)... can anyone help out with the above code?

mattfysh commented 5 years ago

I decided to go with the generated client certificate from the cluster's masterAuth (CN=client). I had to bind this user to the cluster-admin role though; more info: https://github.com/kubernetes/kubernetes/issues/65400
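
(Roughly what that looks like as provider configuration, assuming the cluster resource is named google_container_cluster.primary as in the earlier comment and that client certificate issuance is enabled in its master_auth block; a sketch, not the exact configuration used:)

provider "kubernetes" {
  host = "https://${google_container_cluster.primary.endpoint}"

  # Client certificate issued by GKE for the "client" user (CN=client);
  # the master_auth attributes are base64 encoded, hence the decode.
  client_certificate     = "${base64decode(google_container_cluster.primary.master_auth.0.client_certificate)}"
  client_key             = "${base64decode(google_container_cluster.primary.master_auth.0.client_key)}"
  cluster_ca_certificate = "${base64decode(google_container_cluster.primary.master_auth.0.cluster_ca_certificate)}"
}

That user still needs the cluster-admin binding described in the next comments.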

aeneasr commented 5 years ago

@mattfysh how did you bind that user to the role? Any help is appreciated.

mattfysh commented 5 years ago

@aeneasr using an account that allows you to create cluster role bindings:

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: client-binding
subjects:
- kind: User
  name: client
roleRef:
  kind: ClusterRole
  name: "cluster-admin"
  apiGroup: rbac.authorization.k8s.io

johnroach commented 5 years ago

@mattfysh I don't think this is the same problem. The issue I was facing was that the context definition was not pulling or using the correct user per the kubeconfig setting. It just uses what is set in the session; it doesn't actively look up the configuration.

ctavan commented 5 years ago

@johnroach did you ever find a fix or workaround for this one, other than selecting the correct kubectl context before running terraform apply?

johnroach commented 5 years ago

Sadly no, I simply added a script to check the diff and switch contexts depending on which file was changed. However, I have since moved to another job where we don't use GKE, so there might be an answer for this now.

ctavan commented 5 years ago

Thanks for the reply! I figured out that the particular problem I was having might have been something else: apparently the credentials of my contexts were outdated. After running

gcloud container clusters get-credentials

again for my clusters, Terraform was picking up the correct context irrespective of the current default context.

So I believe that, as @pdecat noted, the problem is really that Terraform picks the default context instead of erroring out if the configured context is not available or broken.

landorg commented 4 years ago

I am facing this problem too. I'm using the provider with our RKE Rancher cluster. Do you know of a way to verify my kubectl config? Manually selecting the correct context works, as does the kubectl CLI.

johnroach commented 4 years ago

@rolandg you could use a tool like kubectx to verify the context. At this point, since we have workarounds defined, I wonder if this ticket should be closed. @alexsomesan should this context switching and definition be part of the provider?

dvcrn commented 4 years ago

I am running Terraform through Terraform Cloud and have been running into the same issues since moving my setup. Did you guys find a good solution for this?

I tried setting up a new configuration for a brand new cluster and allowed Terraform to do everything from setting up the cluster on DigitalOcean to configuring the provider, but the problem still persists.

dvcrn commented 4 years ago

Okay, I followed some of the advice here to create the binding and now it's working:

kubectl create clusterrolebinding cluster-system-anonymous --clusterrole=cluster-admin --user=system:anonymous

But this is still just a temporary solution, because now I can't use Terraform to rebuild the entire cluster from scratch since a manual command is required.

I wonder if Terraform can create the cluster role binding itself without having the binding set up beforehand. Gonna give that a try.
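
(For reference, the provider does have a kubernetes_cluster_role_binding resource, so the binding itself can be expressed in Terraform. A sketch mirroring the YAML earlier in this thread; it still requires the identity the provider authenticates with to already be allowed to create cluster role bindings, so it does not remove the bootstrapping problem by itself:)

resource "kubernetes_cluster_role_binding" "client_binding" {
  metadata {
    name = "client-binding"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }

  subject {
    api_group = "rbac.authorization.k8s.io"
    kind      = "User"
    name      = "client"
  }
}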

rsalmond commented 4 years ago

I encountered this issue a while ago and was originally surprised to find such a glaring problem untouched for so long. I no longer believe this is a bug but rather an underdocumented feature working correctly.

If you take a look in a kubeconfig file you'll find, among other things, three YAML maps: one of clusters, one of users, and one of contexts (user/cluster pairs). The Kubernetes Terraform provider allows you to select arbitrary elements from each of these maps.

config_context_auth_info allows you to specify an element from the users map, config_context_cluster an element from the clusters map, and config_context an element from the contexts map. If a context is not specified but a cluster is, then the user from the user/cluster pair of the active context is used.

tl;dr - I think folks in this thread probably just need to change config_context_cluster to config_context in their provider config and their problems will go away.

I also think the example in the docs ought to be changed as config_context is, in my opinion, much more likely to be what users are looking for.
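
(Concretely, the suggestion is to reference the context entry rather than the cluster entry, for example; the context name below is a placeholder:)

provider "kubernetes" {
  config_context = "gke_my-project_europe-west1_my-cluster"
}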

dak1n1 commented 4 years ago

@rsalmond is correct. We'll take a look at the PR and update the documentation.

ghost commented 4 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!