hashicorp / terraform-provider-helm

Terraform Helm provider
https://www.terraform.io/docs/providers/helm/
Mozilla Public License 2.0

helm charts failing deployment via terraform, working when run directly via helm cli #467

Closed utx0 closed 2 years ago

utx0 commented 4 years ago

Community Note

Terraform Version and Provider Version

❯ terraform version 
Terraform v0.12.24
+ provider.azurerm v2.5.0
+ provider.null v2.1.2
+ provider.random v2.2.1
+ provider.tls v2.1.1

Provider Version

Affected Resource(s)

I am seeing this behavior across a few charts, but it's a bit random.

Terraform Configuration Files


Resource in question:

resource "helm_release" "mongodb-sharded" {
  name      = "mongodb-sharded"
  chart     = "mongodb-sharded"
  repository = "https://charts.bitnami.com/bitnami"
  timeout = 600
}

Debug Output

https://gist.github.com/lukekhamilton/8e52b1e403a89557062796a4d25af24d

Panic Output

Expected Behavior

When I run `terraform apply` I expect it to install the helm chart.

Actual Behavior

When I run `terraform apply`, this helm chart and others aren't installing and get stuck. However, when I delete the release and then install it manually, it works without issue.

❯ helm ls -a   
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
mongodb-sharded default         1               2020-04-16 02:46:09.166067202 +0000 UTC pending-install mongodb-sharded-1.1.2   4.2.5   

Steps to Reproduce

  1. terraform apply

Important Factoids

I am running this on a very standard AKS cluster.

References

utx0 commented 4 years ago

Furthermore, a deployment I have running right now is showing me this:

❯ helm ls -a
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
mongodb-sharded default         1               2020-04-16 03:18:01.305438381 +0000 UTC pending-install mongodb-sharded-1.1.2   4.2.5   

And also this:

❯ kubectl get pods
NAME                                           READY   STATUS    RESTARTS   AGE
mongodb-sharded-configsvr-0                    1/1     Running   0          3m38s
mongodb-sharded-mongos-6c8fb46c44-nml8b        0/1     Running   1          3m38s
mongodb-sharded-shard0-data-0                  0/1     Running   0          3m38s
mongodb-sharded-shard1-data-0                  0/1     Running   0          3m38s

arlyon commented 4 years ago

Experiencing this with the prometheus operator (as does this issue potentially https://github.com/helm/charts/issues/21913).

Same symptoms as above. Everything in kubectl is running, but the helm release is stuck on 'pending-install'. As soon as terraform times out, the release goes to 'failed'. Installing manually with helm works without a problem.

utx0 commented 4 years ago

One thing that changed the behavior for me was increasing the VM size for the node pools; after that it worked without issue. However, for the life of me, I can't find any output logs anywhere to help debug what is actually happening...

jrhouston commented 4 years ago

I'm trying to reproduce this one, so far this is working for me:

resource "helm_release" "example" {
  name       = "example"
  repository = "https://kubernetes-charts.storage.googleapis.com"
  chart      = "prometheus-operator"
}

Is there more info you can share about your environments @lukekhamilton @arlyon? A full tf config that reproduces this issue would be super helpful for me. We have test accounts on all the major cloud providers and even a bare-metal cluster we can run on.

arlyon commented 4 years ago

@jrhouston Given enough tries it goes through fine, but it is quite inconsistent. In response to this problem, I have split my terraform configs into two separate modules with isolated states (one for monitoring / plumbing and one for 'apps') for now.

You can find an example here: https://github.com/arlyon/infra-code (pre-split). Note that it has expired cloudflare keys populated in some of the configs, but I don't think they'll cause problems.

lmserrano commented 4 years ago

I'm experiencing a similar issue, but with Bitnami's nginx, as follows:

# Using helm provider ~> 1.1.1

data "helm_repository" "bitnami" {
  name = "bitnami"
  url  = "https://charts.bitnami.com/bitnami"
}

resource "helm_release" "nginx" {
  name       = "my-nginx"
  repository = data.helm_repository.bitnami.metadata[0].name
  chart      = "nginx"
  version    = "5.2.3"

  namespace = "default"

  timeout = 50
}

It keeps printing lines such as `helm_release.nginx: Still creating... [50s elapsed]` until hitting the timeout.

Only once it hits the timeout does the release entry appear in helm:

$ helm ls --namespace my-namespace
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS  CHART           APP VERSION
my-nginx        default   1               2020-04-22 16:16:25.7100555 +0100 BST   failed  nginx-5.2.3     1.17.10

marked as failed.

However, the pods are up, ready and running, and the services have been created and work.

I'm sharing it here because I believe it may be related (it seems tied to how pod readiness is perceived) and it's also a relatively quick scenario to spin up/down as necessary.

Also, a workaround for this use case is to add `wait = false` to the helm_release.
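
A minimal sketch of that workaround applied to the release above, using the repository URL directly instead of the helm_repository data source:

resource "helm_release" "nginx" {
  name       = "my-nginx"
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "nginx"
  version    = "5.2.3"
  namespace  = "default"

  # Don't block until every resource reports ready; the release is marked
  # deployed as soon as the manifests are applied (like `helm install`
  # without --wait).
  wait = false
}

The trade-off is that Terraform reports success even if the pods never become ready.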

alexsomesan commented 4 years ago

We're going to work on this with low priority while we collect more data, since the issue is hard to reproduce on demand.

srinathganesh1 commented 4 years ago

I have a similar problem on GKE

My helm_release has a high timeout of 3000 seconds, and sometimes it fails with the message Kubernetes Cluster Unreachable, BUT most of the time that helm_release does end up deployed in the backend.

carlowouters commented 4 years ago

I've had the issue with kubedb, both on AKS and on a bare-metal cluster. They have this as an open issue: https://github.com/kubedb/project/issues/504

CelsoSantos commented 4 years ago

I'm also experiencing this, deploying the Gloo helm chart via terraform. In my case, I have a tf-modules repo that I import. Gloo deploys just fine, but terraform keeps waiting for it to finish and eventually times out, even though the deployment finished minutes ago...

shicholas commented 3 years ago

I ran into this issue and solved it by deleting resources that can't be "recreated", like Kubernetes Jobs, which were created as part of a helm deployment. Then the Terraform Helm module worked as expected for my use case.

Routhinator commented 3 years ago

I'm experiencing this with AWS EKS and the ingress-nginx helm chart. It installs without issue and Terraform considers the install successful; however, Terraform then reinstalls it because the chart state is not updated.

Success in the create log:

helm_release.ingress-nginx: Creation complete after 5m15s [id=ingress-nginx]

Basic reproduction:

resource "helm_release" "ingress-nginx" {
  name       = "ingress-nginx"
  chart      = "ingress-nginx"
  repository = "https://kubernetes.github.io/ingress-nginx"
  version    = "3.23.0"
  namespace  = kubernetes_namespace.ingress-nginx.metadata.0.name

  values = [
    file("charts/ingress-nginx-values.yaml")
  ]
}

values.yaml:

controller:
  service:
    httpPort:
      targetPort: "http"
    httpsPort:
      targetPort: "https"
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"
      service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"

This doesn't happen with the cert-manager chart, however.

miguelaferreira commented 3 years ago

I'm also experiencing this issue on EKS, installing consul. Pods come up and elect a leader, but the helm release stays pending until the timeout, at which point it is marked as failed. A few days ago provisioning was working fine, and today this issue started happening. Installing manually via helm install ... works.

Terraform v0.15.2
on darwin_amd64
+ provider registry.terraform.io/hashicorp/aws v3.38.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/external v2.1.0
+ provider registry.terraform.io/hashicorp/helm v2.1.2
+ provider registry.terraform.io/hashicorp/http v2.1.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.1.0
+ provider registry.terraform.io/hashicorp/local v2.1.0
+ provider registry.terraform.io/hashicorp/null v3.1.0
+ provider registry.terraform.io/hashicorp/random v3.1.0
+ provider registry.terraform.io/hashicorp/template v2.2.0

SpootyMcSpoot commented 3 years ago

I am experiencing this with the newrelic bundle for EKS. The structure is exactly the same as the helm cli args, but it is not deploying properly.


kubectl create namespace monitoring ; helm install newrelic-bundle newrelic/nri-bundle \
 --set global.licenseKey=********************************* \
 --set global.cluster=********* \
 --namespace=monitoring \
 --set newrelic-infrastructure.privileged=true \
 --set ksm.enabled=true \
 --set prometheus.enabled=true \
 --set kubeEvents.enabled=true \
 --set logging.enabled=true

vs

resource "helm_release" "newrelic-bundle" {
  name       = "newrelic-bundle"
  repository = "https://helm-charts.newrelic.com" 
  chart      = "nri-bundle"
  namespace = "monitoring"

  set {
    name  = "global.licenseKey"
    value = var.new_relic_license_key
    type  = "string"
  }

  set {
    name  = "clusterName"
    value = "*******************"
    type  = "string"
  }

  set {
    name  = "newrelic-infrastructure.privileged"
    value = true
    type  = "auto"
  }

  set {
    name  = "ksm.enabled"
    value = true
    type  = "auto"
  }

  set {
    name  = "prometheus.enabled"
    value = true
    type  = "auto"
  }

  set {
    name  = "kubeEvents.enabled"
    value = true
    type  = "auto"
  }

  set {
    name  = "logging.enabled"
    value = true
    type  = "auto"
  }

  depends_on = [
    kubernetes_namespace.monitoring
  ]
}

moonape1226 commented 3 years ago

Experiencing the same issue with a self-hosted minikube K8S cluster. I am not sure if my K8S version is too old for Helm v3, but it worked well with the Helm cli and failed with the Terraform Helm provider.

The version of my environment:

K8S: v1.10.0
Helm: v3.6.3
Terraform: v1.0.3
Terraform Helm provider: 2.2.0
Bitnami mysql Helm chart: 4.5.2

Below is my configuration:

terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "= 2.3.2"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "2.2.0"
    }
  }
}

resource "helm_release" "mysql" {
  name       = "mysql"
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "mysql"
  version    = "4.5.2"
  namespace  = kubernetes_namespace.mysql.metadata.0.name

  values = [
    file("helm-release/mysql/values.yml")
  ]
}

Startouf commented 3 years ago

Also experiencing this problem on AWS EKS with terragrunt

EDIT: installing the fluent-bit chart from https://fluent.github.io/helm-charts, chart version 0.10.0

dreverri commented 3 years ago

I see this issue when I set a namespace on the helm_release resource.

jimsnab commented 2 years ago

Helm provider v2.4.1

I experienced this just now with the hashicorp vault chart. Vault gets installed, and the pods are Running (though in an error state, as expected, because vault is initially sealed). The helm provider times out after 5 minutes even though the helm cli, run manually, completes without error after 10-20 seconds.

`export TF_LOG=trace` showed me hints.

The Terraform helm provider appears to wait for Kubernetes to process everything in the chart, while the helm cli does not. In my case a load balancer was failing to start.
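
That behavior matches the provider's default of wait = true, which corresponds to helm install --wait; the bare CLI does not wait unless told to. A hypothetical sketch of two ways to deal with it (the chart name and repository URL here are illustrative, not taken from the report above):

resource "helm_release" "vault" {
  name       = "vault"
  repository = "https://helm.releases.hashicorp.com"
  chart      = "vault"

  # Option 1: don't wait for resources to become ready, matching a plain
  # `helm install` without --wait.
  wait = false

  # Option 2 (instead of wait = false): keep waiting, but allow more than
  # the default 300 seconds for slow resources such as load balancers.
  # timeout = 900
}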

ivoros commented 2 years ago

The same goes for ingress-nginx and filebeat for me too.

blue928 commented 2 years ago

Happening for me with ExternalDNS. The warning/error message says to use the helm command to investigate. What does that even mean? How do we use the helm command to investigate?

Warning: Helm release "external-dns" was created but has a failed status. Use the `helm` command to investigate the error, correct it, then run Terraform again.
│ 
│   with helm_release.external_dns,
│   on 07-helm-external-dns.tf line 1, in resource "helm_release" "external_dns":
│    1: resource "helm_release" "external_dns" {
│ 
╵

Is there a way to see more deeply what's happening behind the scenes, instead of just seeing the timed-out error message?

resource "helm_release" "external_dns" {
  name       = "external-dns"
  repository = "https://kubernetes-sigs.github.io/external-dns/"
  chart      = "external-dns"

  namespace = "external-dns"
  #create_namespace = true
  #timeout = 1000
  #atomic  = true

  set {
    name  = "provider"
    value = "azure"
  }

  set {
    name  = "txtPrefix"
    value = "externaldns-"
  }

  #depends_on = [
   # kubernetes_secret.azure_config_file,
  #]
}
jrhouston commented 2 years ago

@blue928 You see that error when the chart has landed in a failed status: it means the release was partially deployed but one of its resources has failed for some reason. If you do `helm ls --all` you'll see a failed status next to the release, and you should be able to use `helm status` to see what resources are in the chart and then investigate further with `kubectl` to see what's failing.

The helm provider doesn't do any management of the resources inside the chart; it just sees that helm has returned a failed status for the release and bubbles it up.

iBrandyJackson commented 2 years ago

We are closing this issue as it has become a catch-all for a multitude of different problems. If you run into further related issues, please open a new issue and we will review it.

pradeepnnv commented 2 years ago

Adding a comment in case it might help others.

I faced this issue when installing the bitnami/nginx helm chart.

Output:

NAME: nginx
LAST DEPLOYED: Mon Sep 26 21:57:24 2022
NAMESPACE: default
STATUS: pending-install
REVISION: 1
TEST SUITE: None
NOTES:
CHART NAME: nginx
CHART VERSION: 13.1.6
APP VERSION: 1.23.1

I observed that the service created with the default helm values is of type LoadBalancer. The service was stuck in a pending state, and I'm guessing that caused the failure in terraform.

Please try `kubectl get all` to identify whether any other resources might be stuck in a pending state.

Overriding the service type to ClusterIP fixed the issue, as shown in the sketch below.
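
For anyone applying that override from Terraform, a minimal sketch (assuming the chart exposes the usual service.type value; check the chart's values.yaml for the exact key):

resource "helm_release" "nginx" {
  name       = "nginx"
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "nginx"

  # Override the default LoadBalancer service with a ClusterIP service so
  # the release isn't stuck waiting for an external IP to be assigned.
  set {
    name  = "service.type"
    value = "ClusterIP"
  }
}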

Posting here hoping that this will help others facing similar issues.

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.