Closed: utx0 closed this issue 2 years ago.
Furthermore, a deployment I have running right now is showing me this:
```
❯ helm ls -a
NAME             NAMESPACE  REVISION  UPDATED                                  STATUS           CHART                  APP VERSION
mongodb-sharded  default    1         2020-04-16 03:18:01.305438381 +0000 UTC  pending-install  mongodb-sharded-1.1.2  4.2.5
```
And also this:
```
❯ kubectl get pods
NAME                                      READY  STATUS   RESTARTS  AGE
mongodb-sharded-configsvr-0               1/1    Running  0         3m38s
mongodb-sharded-mongos-6c8fb46c44-nml8b   0/1    Running  1         3m38s
mongodb-sharded-shard0-data-0             0/1    Running  0         3m38s
mongodb-sharded-shard1-data-0             0/1    Running  0         3m38s
```
Experiencing this with the prometheus operator as well (this issue is potentially related: https://github.com/helm/charts/issues/21913).
Same symptoms as above: everything in kubectl is running, but the helm chart is stuck on 'pending-install'. As soon as terraform times out, the helm chart goes to 'failed'. Installing with helm manually works no problem.
One thing that changed the behavior for me was increasing the VM size for the node pools; after that it worked without issue. However, for the life of me, I can't find any output logs anywhere to help debug what is actually happening...
I'm trying to reproduce this one, so far this is working for me:
```hcl
resource "helm_release" "example" {
  name       = "example"
  repository = "https://kubernetes-charts.storage.googleapis.com"
  chart      = "prometheus-operator"
}
```
Is there more info you can share about your environments @lukekhamilton @arlyon ? A full tf config that reproduces this issue would be super helpful for me. We have test accounts on all the major cloud providers and even a bare metal cluster we can run on.
@jrhouston Given enough tries, it goes through fine, but it is quite inconsistent. In response to this problem, I have split my terraform configs into two separate modules with isolated states (one for monitoring/plumbing and one for 'apps') for now.
You can find an example here: https://github.com/arlyon/infra-code (pre split). Note that it has expired cloudflare keys populated in some of the configs, but I don't think that will cause problems.
I'm experiencing a similar issue, but with Bitnami's nginx, as follows:
```hcl
# Using helm provider ~> 1.1.1
data "helm_repository" "bitnami" {
  name = "bitnami"
  url  = "https://charts.bitnami.com/bitnami"
}

resource "helm_release" "nginx" {
  name       = "my-nginx"
  repository = data.helm_repository.bitnami.metadata[0].name
  chart      = "nginx"
  version    = "5.2.3"
  namespace  = "default"
  timeout    = 50
}
```
It keeps printing lines such as `helm_release.nginx: Still creating... [50s elapsed]` until hitting the timeout.
Once it hits the timeout, the release entry appears in helm, marked as failed:
```
$ helm ls --namespace my-namespace
NAME      NAMESPACE  REVISION  UPDATED                                STATUS  CHART        APP VERSION
my-nginx  default    1         2020-04-22 16:16:25.7100555 +0100 BST  failed  nginx-5.2.3  1.17.10
```
However, the pods are up, ready and running, and the services have been created and work.
I'm sharing it here because I believe it may be related (seems to relate with the perception of pod readiness) and it's also a relatively quick scenario to spin up/down as necessary.
Also, a workaround for this use case is to add `wait = false` to the `helm_release`.
We're going to work on this with low priority as we collect more data since the issue is hard to reproduce on demand.
I have a similar problem on GKE. My helm_release has a high timeout of 3000, and sometimes it fails with the message:
Kubernetes Cluster Unreachable
But most of the time the helm_release does get deployed in the background.
I've had the issue with kubedb, both on AKS and on a bare-metal cluster. They have this as an open issue: https://github.com/kubedb/project/issues/504
I'm also experiencing this, deploying the Gloo helm chart via terraform. In my case, I have a tf-modules repo that I import. Gloo deploys just fine, but terraform keeps waiting for it to finish and eventually times out, even though the deployment actually finished minutes earlier...
I ran into this issue and solved it by deleting resources that can't be recreated, such as Kubernetes Jobs that were created as part of a helm deployment. After that, the Terraform Helm provider worked as expected for my use case.
I'm experiencing this with AWS EKS and the ingress-nginx helm chart. It installs without issue and Terraform considers the install successful, but Terraform then reinstalls it because the chart state is not updated.
Success in the create log:
```
helm_release.ingress-nginx: Creation complete after 5m15s [id=ingress-nginx]
```
Basic reproduction:
```hcl
resource "helm_release" "ingress-nginx" {
  name       = "ingress-nginx"
  chart      = "ingress-nginx"
  repository = "https://kubernetes.github.io/ingress-nginx"
  version    = "3.23.0"
  namespace  = kubernetes_namespace.ingress-nginx.metadata.0.name

  values = [
    file("charts/ingress-nginx-values.yaml")
  ]
}
```
values.yaml:
```yaml
controller:
  service:
    httpPort:
      targetPort: "http"
    httpsPort:
      targetPort: "https"
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"
      service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
```
This doesn't happen with the cert-manager chart however.
I'm also experiencing this issue on EKS installing consul. Pods come up and elect a leader, but the helm release stays pending until the timeout, at which point it is marked as failed. A few days ago provisioning was working fine, and today this issue started to happen. Installing manually via `helm install ...` works.
```
Terraform v0.15.2
on darwin_amd64
+ provider registry.terraform.io/hashicorp/aws v3.38.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/external v2.1.0
+ provider registry.terraform.io/hashicorp/helm v2.1.2
+ provider registry.terraform.io/hashicorp/http v2.1.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.1.0
+ provider registry.terraform.io/hashicorp/local v2.1.0
+ provider registry.terraform.io/hashicorp/null v3.1.0
+ provider registry.terraform.io/hashicorp/random v3.1.0
+ provider registry.terraform.io/hashicorp/template v2.2.0
```
I am experiencing this with the newrelic bundle for EKS. The Terraform configuration mirrors the helm CLI arguments exactly, but it does not deploy properly.
```shell
kubectl create namespace monitoring ; helm install newrelic-bundle newrelic/nri-bundle \
  --set global.licenseKey=********************************* \
  --set global.cluster=********* \
  --namespace=monitoring \
  --set newrelic-infrastructure.privileged=true \
  --set ksm.enabled=true \
  --set prometheus.enabled=true \
  --set kubeEvents.enabled=true \
  --set logging.enabled=true
```
vs
```hcl
resource "helm_release" "newrelic-bundle" {
  name       = "newrelic-bundle"
  repository = "https://helm-charts.newrelic.com"
  chart      = "nri-bundle"
  namespace  = "monitoring"

  set {
    name  = "global.licenseKey"
    value = var.new_relic_license_key
    type  = "string"
  }

  set {
    name  = "clusterName"
    value = "*******************"
    type  = "string"
  }

  set {
    name  = "newrelic-infrastructure.privileged"
    value = true
    type  = "auto"
  }

  set {
    name  = "ksm.enabled"
    value = true
    type  = "auto"
  }

  set {
    name  = "prometheus.enabled"
    value = true
    type  = "auto"
  }

  set {
    name  = "kubeEvents.enabled"
    value = true
    type  = "auto"
  }

  set {
    name  = "logging.enabled"
    value = true
    type  = "auto"
  }

  depends_on = [
    kubernetes_namespace.monitoring
  ]
}
```
I experience the same issue with a self-hosted minikube cluster. I am not sure if my Kubernetes version is too old for Helm v3, but it worked well with the Helm CLI and failed with the Terraform Helm provider.
The versions in my environment:
K8S: v1.10.0
Helm: v3.6.3
Terraform: v1.0.3
Terraform Helm provider: 2.2.0
Bitnami mysql Helm chart: 4.5.2
Below is my configuration:
```hcl
terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "= 2.3.2"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "2.2.0"
    }
  }
}

resource "helm_release" "mysql" {
  name       = "mysql"
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "mysql"
  version    = "4.5.2"
  namespace  = kubernetes_namespace.mysql.metadata.0.name

  values = [
    file("helm-release/mysql/values.yml")
  ]
}
```
Also experiencing this problem on AWS EKS with terragrunt.
EDIT: installing the fluent-bit chart from https://fluent.github.io/helm-charts, chart version 0.10.0.
I see this issue when I set a namespace on the helm_release resource.
Helm provider v2.4.1
I experienced this just now with the hashicorp vault chart. Vault gets installed and the pods are Running (though in an error state, as expected, because vault is initially sealed). The helm provider times out after 5 minutes even though the helm CLI, run manually, completes without error in 10-20 seconds.
Running `export TF_LOG=trace` showed me hints.
The Terraform helm provider appears to wait for Kubernetes to finish processing everything in the chart, while the helm CLI does not. In my case a load balancer was failing to start.
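This matches the provider's defaults: `helm_release` waits for all resources in the chart to become ready (`wait` defaults to true) and marks the release failed after `timeout` seconds (default 300), whereas `helm install` only behaves that way when passed `--wait`. A hedged sketch of the relevant knobs (chart and repository taken from the vault example above):

```hcl
resource "helm_release" "vault" {
  name       = "vault"
  repository = "https://helm.releases.hashicorp.com"
  chart      = "vault"

  wait    = true  # default: block until pods, services, and PVCs are ready
  timeout = 600   # seconds before the release is marked failed (default 300)
  # atomic = true # optionally roll back automatically on failure
}
```

If a resource such as a LoadBalancer service can never become ready in your environment, raising `timeout` will not help; either fix the resource or set `wait = false`.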
The same goes for ingress-nginx and filebeat for me.
Happening for me with ExternalDNS. The warning/error message says to use the `helm` command to investigate. What does this even mean? How do we use the `helm` command to investigate?
```
│ Warning: Helm release "external-dns" was created but has a failed status. Use the `helm` command to investigate the error, correct it, then run Terraform again.
│
│   with helm_release.external_dns,
│   on 07-helm-external-dns.tf line 1, in resource "helm_release" "external_dns":
│    1: resource "helm_release" "external_dns" {
│
╵
```
Is there a way to see more deeply what's happening behind the scenes, instead of just seeing the timed-out error message?
```hcl
resource "helm_release" "external_dns" {
  name       = "external-dns"
  repository = "https://kubernetes-sigs.github.io/external-dns/"
  chart      = "external-dns"
  namespace  = "external-dns"
  #create_namespace = true
  #timeout          = 1000
  #atomic           = true

  set {
    name  = "provider"
    value = "azure"
  }

  set {
    name  = "txtPrefix"
    value = "externaldns-"
  }

  #depends_on = [
  #  kubernetes_secret.azure_config_file,
  #]
}
```
@blue928 You see that error when the chart has landed in a failed status. It means the release was partially deployed but one of its resources failed for some reason. If you run `helm ls --all` you'll see a failed status next to the release, and you should be able to use `helm status` to see what resources are in the chart, then investigate further with `kubectl` to see what's failing.
The helm provider doesn't do any managing of the resources inside the chart; it just sees that helm has returned a failed status for the release and bubbles it up.
We are closing this issue as this is something that has become a catch-all for a multitude of different issues. We ask that if you run into further related problems, please open a new issue and we will review it.
Adding a comment in case this might help others. I faced this issue when installing the bitnami/nginx helm chart.
Output:
```
NAME: nginx
LAST DEPLOYED: Mon Sep 26 21:57:24 2022
NAMESPACE: default
STATUS: pending-install
REVISION: 1
TEST SUITE: None
NOTES:
CHART NAME: nginx
CHART VERSION: 13.1.6
APP VERSION: 1.23.1
```
I observed that the service created with the default helm values is of type LoadBalancer. The service was stuck in a pending state, and my guess is that this caused the failure in terraform.
Try `kubectl get all` to identify whether any other resources are stuck in a pending state.
Overriding the service type to ClusterIP fixed the issue.
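For reference, that override can be expressed directly in the `helm_release` resource (the `service.type` value path follows the Bitnami nginx chart; the resource name is illustrative):

```hcl
resource "helm_release" "nginx" {
  name       = "nginx"
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "nginx"

  set {
    name  = "service.type"
    # The chart default is LoadBalancer, which stays pending forever on
    # clusters without a load balancer controller, so the provider's
    # wait never completes.
    value = "ClusterIP"
  }
}
```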
Posting here hoping that this will help others facing similar issues.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Community Note
Terraform Version and Provider Version
Provider Version
Affected Resource(s)
I am seeing this behavior across a few charts, but it's a bit random.
Terraform Configuration Files
The resource in question:
Debug Output
https://gist.github.com/lukekhamilton/8e52b1e403a89557062796a4d25af24d
Panic Output
Expected Behavior
When I run
terraform apply
I expect it to install the helm chart.
Actual Behavior
When I run
terraform apply
this helm chart and others aren't installing and get stuck. However, when I delete the release and then install it manually, it works without issue.
Steps to Reproduce
terraform apply
Important Factoids
I am running this on a very standard AKS cluster.
References