Open pranavnateri opened 1 year ago
Please suggest a workaround at least, if there is no solution as of yet.
Regards, Pranav
Thanks for opening this @pranavnateri. Refreshing the state can take some time if you have many helm_release resources. Have you tried using resource targeting (terraform apply -target ...) to target only the helm release you are actually changing?
Hi @jrhouston, that's exactly what I mentioned in the steps to reproduce section. I am already doing a targeted apply, which I do not want to do. Also, the issue is the same with or without -refresh=false when applying. Increasing the timeout value of the helm provider resource to 600 does not help either.
Taking this long when there are multiple helm releases is not workable, because the apply times out after a while, which is a bug.
Please suggest any alternatives, at least until this is fixed. (PS: I am already doing a targeted apply because of this issue.)
Regards, Pranav
Hi @jrhouston, can you please give an update on this? It times out even with only 10 helm releases :(
Why is this an issue? Was this not tested?
Regards, Pranav
@jrhouston ??
@jrhouston @pranavnateri are there any updates on this? I am hitting the same issue.
Yes, I still have the issue and no one is replying. There is no proper support. @oscardalmau-r3
I also noticed that this changed at some version; it used to be fast. I can also see that for the first 30-40 seconds it does not even create the namespace. Hard to say what is causing it: it is not installing any CRDs, and the Kubernetes cluster is on the local network (AWS EKS). Currently using version 2.10.1.
I did some testing: Kubernetes version 1.26, Terraform version v1.5.2.
How I called the resource:
resource "helm_release" "<redacted>" {
name = <redacted>
chart = "localpathtochart"
namespace = <redacted>
create_namespace = true
values = [
templatefile("${path.module}/values.yaml", {
<redacted>
})
]
}
The helm_release uses a helm chart that is in the local filesystem (no chart registry download). Also the package does not have any CRD installation. The chart does have chart dependencies.
The times described below for resources appearing are taken from the resource creation messages Terraform prints. For example:
module.test.helm_release.<redacted>: Still creating... [20s elapsed]
So it does not include any TF planning or provider download/init time.
The time shown as total is the total time for terraform (init, apply, destroy...). It included a few AWS resources (which took 1-2 seconds) and the time needed to download the providers.
Provider 2.1.2: ~20s for namespace+pods to appear (total 153s apply+destroy)
Provider 2.2.0: ~20s for namespace+pods to appear (total 159s apply+destroy)
Provider 2.4.1: ~20s for namespace+pods to appear (total 163s apply+destroy)
Provider 2.5.0: ~2 MINUTES for namespace+pods to appear (total 358s apply+destroy, several attempts)
Provider 2.5.1: ~2 MINUTES for namespace+pods to appear (total 388s apply+destroy, several attempts)
Provider 2.6.0: ~2 MINUTES for namespace+pods to appear (total 343s apply+destroy, several attempts)
For the following versions I did not check when resources started to appear, only the total apply+destroy time:
Provider 2.7.1: total 341s apply+destroy
Provider 2.8.0: total 339s apply+destroy
Provider 2.10.1: total 343s apply+destroy
Provider 2.11.0: total 341s apply+destroy
Since version 2.5.0 it went from ~20 seconds to ~2 minutes before any resources appear in Kubernetes. The issue also affects destroy: since 2.5.0 it does nothing for a couple of minutes.
As a workaround I am now using 2.4.1. If you have already upgraded to a higher version and can't delete the resource, I am not sure how to downgrade.
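Pinning the provider to that version looks roughly like this (a minimal sketch, assuming the standard hashicorp/helm source address):

terraform {
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "2.4.1" # last version before the slowdown observed above
    }
  }
}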
I hope finding the version where it broke helps to locate the issue. For sure there is some sort of problem with this.
Massive +1 for this issue, with more background on why reverting to 2.4.1 is painful or impossible: the Helm client bundled there is ancient and makes modern charts incompatible (e.g. Traefik has required Helm > 3.9.0 since November 2022), yet such an old Helm also does not seem to work with modern Kubernetes versions for some reason. I use just a couple of charts, but my cluster is very remote, so the slowness makes it time out very often.
I'd be happy to help with the issue, but the diff between 2.4.1 and 2.5.0 is so large that I have no idea where to start. If needed, I can try to tweak my deployment to get rid of everything that requires a newer Helm and verify that the problem lies with this particular version.
For the moment, I can confirm that disable_openapi_validation = false does not solve the issue, as I had thought it might after debugging another issue (ref: https://github.com/hashicorp/terraform-provider-helm/issues/513).
BTW, the Traefik Helm chart v24.0.0 seems to be a good test candidate, as it creates a large number of CRDs.
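A rough sketch of such a test case (the repository URL and namespace below are my assumptions, not taken from this thread):

resource "helm_release" "traefik" {
  name             = "traefik"
  repository       = "https://traefik.github.io/charts" # assumed official Traefik chart repository
  chart            = "traefik"
  version          = "24.0.0"
  namespace        = "traefik"
  create_namespace = true
}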
+1
@pranavnateri have you seen/tried the option exposed in this PR? It specifically mentions slowness due to excessive CRDs from Crossplane, but may be worth trying even if it's not your specific issue. I'll be trying this option out as well.
Thank you! This solved it for us. We do have Crossplane and many other CRDs. Tested with provider 2.11.0.
provider "helm" {
burst_limit = 300
kubernetes {
...
}
}
It worked for me too! I set burst_limit to 900 for a remote cluster, with Traefik as the major user of CRDs.
The default value of 100, which comes from Helm itself, seems comically low for any real deployment to me. However, it looks like the Terraform provider cannot interpret valid throttling messages from the Helm library and crashes instead of providing useful information (or attempting to retry the operation).
+1 for this issue, I'm getting timeouts with only 2 helm releases being created for the first time.
│ Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: unexpected error when reading response body. Please retry. Original error: net/http: request canceled (Client.Timeout or context cancellation while reading body)
+1, in my case the behaviour seems quite random: sometimes it takes just a few seconds, other times minutes.
This seemed to be caused by a large number of CRDs on the server.
I notice that kubectl get --raw /openapi/v3 takes over 30 seconds to respond (and this behaviour is inconsistent), which causes helm_release (which seems to be configured with a static 30-second client timeout) to fail with an error like:
╷
│ Error: unable to build kubernetes object for pre-delete hook kyverno/templates/hooks/pre-delete.yaml: error validating "": error validating data: the server was unable to return a response in the time allotted, but may still be processing the request
│
│
╵
We are running into this issue in vcluster, so I am linking it to this issue: https://github.com/loft-sh/vcluster/issues/1588
A configurable timeout parameter would be nice to address cases where the /openapi/v3 endpoint takes more than 30 seconds to respond: https://github.com/hashicorp/terraform-provider-helm/issues/463
Edit: unfortunately, this does not seem to be possible with the underlying Helm library: https://github.com/helm/helm/issues/9805
My current workaround for this issue is to use terragrunt as a wrapper and rely on its auto-retry feature: https://terragrunt.gruntwork.io/docs/features/auto-retry/. Usually on the second pass the /openapi/v3 endpoint responds quickly.
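For reference, a minimal sketch of that retry configuration in terragrunt.hcl (the error patterns are only examples matching the messages quoted above):

retry_max_attempts       = 3
retry_sleep_interval_sec = 30

retryable_errors = [
  "(?s).*unable to build kubernetes objects from release manifest.*",
  "(?s).*the server was unable to return a response in the time allotted.*",
]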
This is becoming more of a major issue even with fewer than 80 CRDs in the cluster, and setting burst_limit doesn't help here. Any workarounds?
Terraform Configuration Files
See below for an example; I have posted only a few resources, but put around 40 helm releases in and try it. With a single variable change in tfvars to apply, the provider just takes its own time to refresh, eventually timing out.
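(The original configuration was not included here; the following is only an illustrative sketch of the shape described above. The variable, chart path, and namespace are hypothetical.)

variable "app_image_tag" {
  type = string
}

# Roughly 40 similar releases; a single change to var.app_image_tag in tfvars
# causes every helm_release to be refreshed on the next apply.
resource "helm_release" "app" {
  count     = 40
  name      = "app-${count.index}"
  chart     = "./charts/app"
  namespace = "apps"

  set {
    name  = "image.tag"
    value = var.app_image_tag
  }
}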