hashicorp / terraform-provider-helm

Terraform Helm provider
https://www.terraform.io/docs/providers/helm/
Mozilla Public License 2.0

terraform apply crashed with "Kubernetes cluster unreachable" error. #951

Open avnerv opened 2 years ago

avnerv commented 2 years ago

Terraform, Provider, Kubernetes and Helm Versions

Terraform version: 1.0.9
Provider version: 2.4.1
Kubernetes version: 1.21

Affected Resource(s)

  * helm_release

Terraform Configuration Files

provider "helm" {
  alias = "helm_hamc"
  kubernetes {
    host                   = data.aws_eks_cluster.eks_cluster.endpoint
    token                  = data.aws_eks_cluster_auth.eks_cluster_auth.token
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks_cluster.certificate_authority[0].data)
  }
}

resource "helm_release" "test" {
  provider         = helm.helm_hamc
  name             = "test"
  create_namespace = false
  namespace        = kubernetes_namespace.test.metadata[0].name
  chart            = "${path.module}/helm/chart/ftest-${var.chart_version}.tgz"
  values = [
    templatefile("${path.module}/values.yaml",
      {
        environment = var.customer
        region      = var.region
    })
  ]
}

Steps to Reproduce

  1. terraform apply

Expected Behavior

terraform apply should complete successfully.

Actual Behavior

When I run terraform apply, some AWS resources are created before the helm_release starts (there are dependencies between them). After the apply has been running for about 30 minutes, the helm release starts and fails with the following error: Kubernetes cluster unreachable: the server has asked for the client to provide credentials. With no changes, on a second attempt the terraform apply completes without error.

arybolovlev commented 2 years ago

Hi @avnerv,

If I understand you right, you spin up a new Kubernetes cluster and apply a Helm chart to it within the same TF code. If that is so, then the observed behaviour is expected. In short, the root cause here is how Terraform initializes providers: it does it all in one shot at the very beginning. In that case, the first run of the code does not have a valid Kubernetes configuration for the Helm provider, and it fails. Once you run it a second time, the cluster is up and running, Terraform can fetch the configuration for the provider, and the apply will be successful.

In this case, you either need to separate cluster management and resource provisioning, or use Terraform's -target option to first spin up the cluster and then apply the rest of the resources.
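
For illustration, a two-phase apply with -target could look like this (the resource address below is a placeholder; use the address of your own cluster resource or module):

# Phase 1: create only the cluster and its dependencies
terraform apply -target=aws_eks_cluster.this

# Phase 2: apply everything else, including the helm_release resources
terraform apply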

This behaviour is mentioned in the Kubernetes provider's documentation here.

I hope that helps.

Thank you.

avnerv commented 2 years ago

Thanks for the answer, but this is not the case. I created the EKS cluster in another TF state that is not related to the deployment of the helm chart. For example, I have two TF states: one for deploying all the infra resources (VPC, EKS, SG, IAM, etc.) and one for deploying my apps (helm chart). So, back to the original issue: when I run terraform apply, it creates 80% of the related resources (namespaces, MongoDB, Redis, etc.), and then, after the apply has been running for 30 minutes, I receive the error message while deploying the helm chart...

arybolovlev commented 2 years ago

Thank you for the clarification. Then it looks like the token is expiring because it is short-lived. In this case, you can try the exec plugin. Please use api_version: client.authentication.k8s.io/v1beta1.
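
For reference, a minimal sketch of such an exec configuration, reusing the data sources from the configuration above (adjust the names and data sources to your own setup):

provider "helm" {
  alias = "helm_hamc"
  kubernetes {
    host                   = data.aws_eks_cluster.eks_cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks_cluster.certificate_authority[0].data)
    exec {
      # A fresh token is requested from AWS whenever the provider needs one,
      # instead of relying on a static, short-lived token from a data source.
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.eks_cluster.name]
    }
  }
}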

I hope that helps.

Thank you.

domeales-paloit commented 2 years ago

I got the same error, but I am already using the exec plugin with aws eks get-token.

My helm_release failed midway, leaving the release in a failed state and making it difficult to retry a terraform apply, since I am now getting the error Error: cannot re-use a name that is still in use.

I feel like there could be an issue with the provider.

Does the exec plugin refresh the token for every request to the cluster endpoint?

jksjaz commented 1 year ago

I am facing a similar issue. It was working fine 6-7 days ago, but since yesterday it has suddenly stopped working and throws this error that the cluster is unreachable.

I also tried completely removing the Helm release, but it still fails. The cluster is available and accessible, but the provider is not able to connect.

Does this have something to do with the Kubernetes provider getting a recent release 2 days ago? link

AhmedMagdiShera commented 1 year ago

I am facing a similar issue with Helm provider version 2.9.0 using the aws exec / cluster_ca_certificate and token approach, but it works with the kubeconfig approach, which is not a best practice.

Note: the same token and exec configuration work fine with other providers, like kubernetes with kubectl manifest resources.

pliniogsnascimento commented 1 year ago

Hey all! I just ran into this issue (with a slightly different log, actually) and found a problem that might be new and that some of you might be experiencing while using EKS and the exec plugin. I'm posting it here, since this is an already open issue on the subject, in case anyone needs it.

I was upgrading my AWS lab, which was previously working (source code available here), and got the following error:

2023-09-10T12:35:38.677-0300 [ERROR] vertex "helm_release.argocd" error: Kubernetes cluster unreachable: Get "https://3EED65CB939BE8F433B62C22D9F7E2B0.gr7.us-east-1.eks.amazonaws.com/version": getting credentials: decoding stdout: couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string "json:\"apiVersion,omitempty\""; Kind string "json:\"kind,omitempty\"" }

Then I compared the command I was using in the provider with the configuration exported by the aws eks update-kubeconfig command:

users:
- name: arn:aws:eks:us-east-1:499237116720:cluster/gitops-eks
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - --region
      - us-east-1
      - eks
      - get-token
      - --cluster-name
      - gitops-eks
      - --output
      - json
      command: aws
      env: null
      interactiveMode: IfAvailable
      provideClusterInfo: false

I immediately noticed that the awscli was adding the --output json arg. I ran the command aws eks get-token with and without the output arg; here is the output without it:

client.authentication.k8s.io/v1beta1    ExecCredential
STATUS  2023-09-10T17:19:05Z    <token>

And with it:

{
    "kind": "ExecCredential",
    "apiVersion": "client.authentication.k8s.io/v1beta1",
    "spec": {},
    "status": {
        "expirationTimestamp": "2023-09-10T17:19:11Z",
        "token": "<token>"
    }
}

The problem was solved for me by simply adding the output arg to my exec-plugin configuration. It was a bit confusing at first, since I didn't find any issue about it and I was using the code suggested in the documentation here. It seems that AWS changed the default output of the get-token command at some point, though I wasn't able to find when.

Maybe it would be worth adding this arg to the documentation, so people don't face this problem while following it:

provider "helm" {
  kubernetes {
    host                   = var.cluster_endpoint
    cluster_ca_certificate = base64decode(var.cluster_ca_cert)
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      args        = ["eks", "get-token", "--cluster-name", var.cluster_name, "--output", "json"]
      command     = "aws"
    }
  }
}

almirosmanovic commented 6 months ago

Run aws configure and set the default output format to json.

d3vpasha commented 3 months ago

Does someone have a viable solution for this problem? I am still getting this error with the Helm provider.

geek0ps commented 2 months ago

In my own case, I resolved it by adding the principal/role that initiates the command to the cluster's access entries. If you are using Terraform and working locally, you can ensure this by adding the argument enable_cluster_creator_admin_permissions = true to the EKS creation module.
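
For example, a sketch assuming the community terraform-aws-modules/eks module (where this flag is defined); the names and versions below are illustrative:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "my-cluster"
  cluster_version = "1.29"

  # Creates an EKS access entry granting cluster admin to the identity that runs Terraform,
  # so the Helm/Kubernetes providers can authenticate against the cluster API.
  enable_cluster_creator_admin_permissions = true
}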

decipher27 commented 1 week ago

If you're testing it locally and the suggestions above aren't working, set your k8s context (make sure you are on the right cluster) and add the below to provider.tf:

provider "kubernetes" {
  config_path = "~/.kube/config"
}

provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}

This worked for me.