avolution opened this issue 4 years ago
Does kubectl work? Can you create/update/delete resources that way?
"Does kubectl work? Can you create/update/delete resources that way?"
Yes, this is working.
Here is the output of helm --debug with more details:
Error: list: failed to list: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 3; INTERNAL_ERROR
helm.go:84: [debug] stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 3; INTERNAL_ERROR
list: failed to list
helm.sh/helm/v3/pkg/storage/driver.(*Secrets).List
/private/tmp/helm--615sa8/src/helm.sh/helm/pkg/storage/driver/secrets.go:87
helm.sh/helm/v3/pkg/action.(*List).Run
/private/tmp/helm-/src/helm.sh/helm/pkg/action/list.go:154
main.newListCmd.func1
/private/tmp/helm-/src/helm.sh/helm/cmd/helm/list.go:80
github.com/spf13/cobra.(*Command).execute
/private/tmp/helm-/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842
github.com/spf13/cobra.(*Command).ExecuteC
/private/tmp/helm--/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950
github.com/spf13/cobra.(*Command).Execute
/private/tmp/helm--/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
main.main
/private/tmp/helm-/src/helm.sh/helm/cmd/helm/helm.go:83
runtime.main
/usr/local/Cellar/go@1.13/1.13.10_1/libexec/src/runtime/proc.go:203
runtime.goexit
/usr/local/Cellar/go@1.13/1.13.10_1/libexec/src/runtime/asm_amd64.s:1357
The network traffic for that process is around 30Kb per second until it fails.
Are you using the default namespace or a separate one for the Helm deployment? I faced the same issue with the same error log, and I recreated the namespace when I observed that Helm commands work with other namespaces. That resolved the connectivity issue. Unfortunately I did not dig further to identify the root cause.
I recently saw this behavior on a Kubernetes cluster where one of the Kubernetes API server proxies was misbehaving, causing Helm to wait for long periods of time before finally giving up on the network connection. I could replicate it using kubectl commands that required longer connection times.
Other tests you can try: run a kubectl get query that uses a label selector and see if you can reproduce the error (a minimal example follows below). There is a very high probability that the problem here has to do with either a proxy or the Kubernetes API server itself.
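For example, one way to exercise roughly the same LIST that helm list issues (Helm stores release data in Secrets labelled owner=helm) is:

# -v=6 prints the request URL and timing, which helps spot a slow hop or proxy.
$ kubectl get secrets --all-namespaces -l owner=helm -v=6

If that call also stalls or fails, the problem is almost certainly in front of or inside the API server rather than in Helm.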
@technosophos how did you fix that?
Yeah, I also think it is a problem with the Kubernetes server setup. I use Google Kubernetes Engine, so I don't have that deep an insight into the logs of my cluster (or do I?). Is there a way to monitor that in GCP? Or a way to "reset" the cluster API/proxy settings?
I also tried to install something under another namespace and got the same error.
This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.
I have similar issue with GKE and helm
> helm ls
Error: list: failed to list: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 3; INTERNAL_ERROR
any updates?
We have no updates on our side, as we don't believe this is a Helm error so much as a Kubernetes control plane error. Right now, all of the complaints that I know of are specific to GCP. You might have better luck asking someone there about the issue.
To my knowledge EKS has never had this problem. I experienced it on AKS a year ago, and it has since been fixed. I know of no cases involving on-prem versions of Kubernetes.
So at this point, we believe the error to be specific to GKE's internal control plane implementation.
@shay-berman Can you share the output of helm version, kubectl version, and which OS you are running? I'm part of the Kubernetes team at Google and want to see if we can figure out what is going on here.
LIST operations that take more than 60 seconds hit the global timeout and are terminated by the server. The error message, combined with the "network traffic [being] 30Kb per second until it fails", makes me suspect that is what is happening, with the likely cause being a slow internet connection between the user and the control plane. A prior commenter suggested running the command from a pod in the cluster; I would try that.
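As a rough check, assuming kubectl points at the same cluster and context, you can time the equivalent LIST yourself and compare it against that 60-second budget:

# If this takes anywhere near 60s from your workstation, Helm's list will be
# cut off by the apiserver's global request timeout; repeating it from a pod
# inside the cluster separates network latency from server-side cost.
$ time kubectl get secrets --all-namespaces -l owner=helm -o name | wc -l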
Same thing here. We need, at least, a workaround. What's the best known one?
🜚 helm --debug list -A
Error: list: failed to list: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 3; INTERNAL_ERROR
helm.go:94: [debug] stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 3; INTERNAL_ERROR
list: failed to list
helm.sh/helm/v3/pkg/storage/driver.(*Secrets).List
/private/tmp/helm-20200923-64956-rldbbk/pkg/storage/driver/secrets.go:87
helm.sh/helm/v3/pkg/action.(*List).Run
/private/tmp/helm-20200923-64956-rldbbk/pkg/action/list.go:154
main.newListCmd.func1
/private/tmp/helm-20200923-64956-rldbbk/cmd/helm/list.go:79
github.com/spf13/cobra.(*Command).execute
/Users/brew/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842
github.com/spf13/cobra.(*Command).ExecuteC
/Users/brew/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950
github.com/spf13/cobra.(*Command).Execute
/Users/brew/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
main.main
/private/tmp/helm-20200923-64956-rldbbk/cmd/helm/helm.go:93
runtime.main
/usr/local/Cellar/go/1.15.2/libexec/src/runtime/proc.go:204
runtime.goexit
/usr/local/Cellar/go/1.15.2/libexec/src/runtime/asm_amd64.s:1374
🜚 helm version
version.BuildInfo{Version:"v3.3.4", GitCommit:"a61ce5633af99708171414353ed49547cf05013d", GitTreeState:"dirty", GoVersion:"go1.15.2"}
🜚 kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T18:49:28Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-gke.801", GitCommit:"3a26ac58e2a1ce0170c304c4134149ce3526eb8a", GitTreeState:"clean", BuildDate:"2020-09-28T17:32:58Z", GoVersion:"go1.13.15b4", Compiler:"gc", Platform:"linux/amd64"}
for the record, my cluster is on GCP.
> We have no updates on our side, as we don't believe this is a Helm error so much as a Kubernetes control plane error. Right now, all of the complaints that I know of are specific to GCP. You might have better luck asking someone there about the issue.
> To my knowledge EKS has never had this problem. I experienced it on AKS a year ago, and it has since been fixed. I know of no cases involving on-prem versions of Kubernetes.
> So at this point, we believe the error to be specific to GKE's internal control plane implementation.
Hey, I can confirm it happens to me on EKS, on multiple clusters, with Helm version 3.3.0. When it happens, after a few minutes it just starts working again without my doing anything; it has happened a few times in the last month. I must say it didn't look to me like a Helm or an EKS problem, but rather a WSL (Windows Subsystem for Linux) or VPN problem. I didn't try to debug it because it goes away after a few minutes, but I will try to investigate the next time it happens.
The quick workaround is to delete previous release versions. I had the same issue with the prometheus-stack chart, so I listed all the secrets where Helm 3 saves data about releases:
kubectl get secrets --all-namespaces
found sh.helm.release.v1.kube-prometheus-stack.v7, deleted all previous versions:
kubectl delete secrets -n monitoring sh.helm.release.v1.kube-prometheus-stack.v1 ...
and helm ls started to work.
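A slightly more general cleanup sketch along the same lines (the release name, namespace, and number of revisions to keep are placeholders; it assumes the name=<release> label that Helm 3 puts on release secrets, and head -n -N / xargs -r assume GNU coreutils/findutils):

RELEASE=kube-prometheus-stack
NAMESPACE=monitoring
KEEP=5
# Release secrets carry the labels owner=helm and name=<release>; sort them by
# creation time and delete everything except the newest $KEEP revisions.
kubectl get secrets -n "$NAMESPACE" -l "owner=helm,name=$RELEASE" \
  --sort-by='{.metadata.creationTimestamp}' -o name \
  | head -n -"$KEEP" \
  | xargs -r kubectl delete -n "$NAMESPACE"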
I have a similar problem, but helm ls works, as does k get secret --all-namespaces:
time helm install squid ./ --values values.yaml --timeout 15m0s --wait --v 6 --debug
install.go:172: [debug] Original chart version: ""
install.go:189: [debug] CHART PATH: /LOCAL_PATH_TO_CHART/squid
Error: create: failed to create: context deadline exceeded
helm.go:81: [debug] context deadline exceeded
create: failed to create
helm.sh/helm/v3/pkg/storage/driver.(*Secrets).Create
/private/tmp/helm-20201111-97167-dwh5s1/pkg/storage/driver/secrets.go:164
helm.sh/helm/v3/pkg/storage.(*Storage).Create
/private/tmp/helm-20201111-97167-dwh5s1/pkg/storage/storage.go:66
helm.sh/helm/v3/pkg/action.(*Install).Run
/private/tmp/helm-20201111-97167-dwh5s1/pkg/action/install.go:320
main.runInstall
/private/tmp/helm-20201111-97167-dwh5s1/cmd/helm/install.go:241
main.newInstallCmd.func2
/private/tmp/helm-20201111-97167-dwh5s1/cmd/helm/install.go:120
github.com/spf13/cobra.(*Command).execute
/Users/brew/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842
github.com/spf13/cobra.(*Command).ExecuteC
/Users/brew/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950
github.com/spf13/cobra.(*Command).Execute
/Users/brew/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
main.main
/private/tmp/helm-20201111-97167-dwh5s1/cmd/helm/helm.go:80
runtime.main
/usr/local/Cellar/go/1.15.4/libexec/src/runtime/proc.go:204
runtime.goexit
/usr/local/Cellar/go/1.15.4/libexec/src/runtime/asm_amd64.s:1374
real 1m6.193s
user 0m0.777s
sys 0m0.897s
time helm ls
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
real 0m5.349s
user 0m0.058s
sys 0m0.015s
time k get secret --all-namespaces -l "owner=helm" | wc -l
51
real 0m5.053s
user 0m0.130s
sys 0m0.051s
UPD: decreasing the number of secrets doesn't help.
k get secret --all-namespaces -l "owner=helm" | wc -l
24
helm uninstall squid ; helm install squid ./ --values values.yaml --timeout 15m0s --wait --v 6 --debug
Error: uninstall: Release not loaded: squid: release: not found
Error: create: failed to create: context deadline exceeded
helm.go:81: [debug] context deadline exceeded
create: failed to create
helm.sh/helm/v3/pkg/storage/driver.(*Secrets).Create
/private/tmp/helm-20201111-97167-dwh5s1/pkg/storage/driver/secrets.go:164
helm.sh/helm/v3/pkg/storage.(*Storage).Create
/private/tmp/helm-20201111-97167-dwh5s1/pkg/storage/storage.go:66
helm.sh/helm/v3/pkg/action.(*Install).Run
/private/tmp/helm-20201111-97167-dwh5s1/pkg/action/install.go:320
main.runInstall
/private/tmp/helm-20201111-97167-dwh5s1/cmd/helm/install.go:241
main.newInstallCmd.func2
/private/tmp/helm-20201111-97167-dwh5s1/cmd/helm/install.go:120
github.com/spf13/cobra.(*Command).execute
/Users/brew/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842
github.com/spf13/cobra.(*Command).ExecuteC
/Users/brew/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950
github.com/spf13/cobra.(*Command).Execute
/Users/brew/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
main.main
/private/tmp/helm-20201111-97167-dwh5s1/cmd/helm/helm.go:80
runtime.main
/usr/local/Cellar/go/1.15.4/libexec/src/runtime/proc.go:204
runtime.goexit
/usr/local/Cellar/go/1.15.4/libexec/src/runtime/asm_amd64.s:1374
Depending on the encryption at rest implementation, listing all secrets could be a very expensive operation. It could be optimized, but no one has done so yet.
Also note that label selector queries don't make the request less resource intensive; the server has to read (and decrypt!) every secret to see whether it matches or not.
So for now, I strongly recommend confining secret lists to single namespaces.
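In practice that just means scoping the list instead of reaching for --all-namespaces, e.g. (the namespace name is a placeholder):

# Scoped: the apiserver only has to read the Helm secrets in one namespace.
$ helm list -n monitoring
# Cluster-wide: every Helm secret in the cluster has to be read (and decrypted).
$ helm list -A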
I can confirm I see this on AWS EKS, with ~50 releases in a namespace, when I use the command:
$ helm3 --kube-context XXX --namespace XXX list -a
Error: list: failed to list: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 3; INTERNAL_ERROR
$ helm3 version
version.BuildInfo{Version:"v3.3.4", GitCommit:"a61ce5633af99708171414353ed49547cf05013d", GitTreeState:"clean", GoVersion:"go1.14.9"}
EDIT:
I can confirm that this is a kube-apiserver issue; traces from the audit log in my cluster:
E0309 20:59:28.389140 1 wrap.go:32] apiserver panic'd on GET /api/v1/namespaces/XXX/secrets?labelSelector=owner%3Dhelm
I0309 20:59:28.389202 1 log.go:172] http2: panic serving 173.38.220.51:22316: killing connection/stream because serving request timed out and response had been started
goroutine 15138551 [running]:
k8s.io/kubernetes/vendor/golang.org/x/net/http2.(*serverConn).runHandler.func1(0xc014f272d0, 0xc01f15bfaf, 0xc06f5a8780)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/golang.org/x/net/http2/server.go:2118 +0x16b
panic(0x3bee6c0, 0xc000391590)
/usr/local/go/src/runtime/panic.go:522 +0x1b5
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0xc01f15bce0, 0x1, 0x1)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x105
panic(0x3bee6c0, 0xc000391590)
/usr/local/go/src/runtime/panic.go:522 +0x1b5
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.(*baseTimeoutWriter).timeout(0xc038f573e0, 0xc07c576780)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:256 +0x1a9
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP(0xc035fa4840, 0x76c6500, 0xc03bbf7340, 0xc0239ab400)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:140 +0x2d7
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.WithWaitGroup.func1(0x76c6500, 0xc03bbf7340, 0xc0239ab300)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/waitgroup.go:47 +0xf3
net/http.HandlerFunc.ServeHTTP(0xc035f71950, 0x76c6500, 0xc03bbf7340, 0xc0239ab300)
/usr/local/go/src/net/http/server.go:1995 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithRequestInfo.func1(0x76c6500, 0xc03bbf7340, 0xc09b33b200)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters/requestinfo.go:39 +0x2b8
net/http.HandlerFunc.ServeHTTP(0xc035f71980, 0x76c6500, 0xc03bbf7340, 0xc09b33b200)
/usr/local/go/src/net/http/server.go:1995 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.withPanicRecovery.func1(0x76c6500, 0xc03bbf7340, 0xc09b33b200)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/wrap.go:46 +0x127
net/http.HandlerFunc.ServeHTTP(0xc035fa4860, 0x76badc0, 0xc014f272d0, 0xc09b33b200)
/usr/local/go/src/net/http/server.go:1995 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server.(*APIServerHandler).ServeHTTP(0xc035f719b0, 0x76badc0, 0xc014f272d0, 0xc09b33b200)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/handler.go:189 +0x51
net/http.serverHandler.ServeHTTP(0xc00285af70, 0x76badc0, 0xc014f272d0, 0xc09b33b200)
/usr/local/go/src/net/http/server.go:2774 +0xa8
net/http.initNPNRequest.ServeHTTP(0xc01bf80380, 0xc00285af70, 0x76badc0, 0xc014f272d0, 0xc09b33b200)
/usr/local/go/src/net/http/server.go:3323 +0x8d
k8s.io/kubernetes/vendor/golang.org/x/net/http2.(*serverConn).runHandler(0xc06f5a8780, 0xc014f272d0, 0xc09b33b200, 0xc07bf68000)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/golang.org/x/net/http2/server.go:2125 +0x89
created by k8s.io/kubernetes/vendor/golang.org/x/net/http2.(*serverConn).processHeaders
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/golang.org/x/net/http2/server.go:1859 +0x4f4
As per the comment in https://github.com/jetstack/cert-manager/issues/3229#issuecomment-772600164, this seems to be due to a suboptimal query being run by Helm. Is there any way we can optimize this?
We have just upgraded from Helm2 to Helm3 and have faced this within 3 days of the upgrade. :/
> We have no updates on our side, as we don't believe this is a Helm error so much as a Kubernetes control plane error. Right now, all of the complaints that I know of are specific to GCP. You might have better luck asking someone there about the issue.
> To my knowledge EKS has never had this problem. I experienced it on AKS a year ago, and it has since been fixed. I know of no cases involving on-prem versions of Kubernetes.
> So at this point, we believe the error to be specific to GKE's internal control plane implementation.
This is something we experience in some EKS clusters. We use the Helm Terraform provider, and we have this kind of issue especially when the resource state grows. The error looks like:
Error: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 11; INTERNAL_ERROR
As a workaround, we moved from Secrets to ConfigMaps as the storage backend. We still experience the issue, but less often.
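For reference, the storage backend is selected per invocation through the HELM_DRIVER environment variable (secret is the default; configmap and sql are the alternatives), so the switch is just:

# Release state is now read from and written to ConfigMaps instead of Secrets.
# Note that existing Secret-based release records are not migrated automatically.
export HELM_DRIVER=configmap
helm list -A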
I traced the problem to kube-apiserver. Maybe it is "apiserver was unable to write a fallback JSON response: http: Handler timeout":
I0323 11:54:40.268733 1 trace.go:116] Trace[1575763432]: "List etcd3" key:/secrets/testing00,resourceVersion:,limit:0,continue: (started: 2021-03-23 11:54:39.130390916 +0800 CST m=+2313349.748756297) (total time: 1.138303823s):
Trace[1575763432]: [1.138303823s] [1.138303823s] END
E0323 11:55:39.132838 1 writers.go:118] apiserver was unable to write a fallback JSON response: http: Handler timeout
I0323 11:55:39.133987 1 trace.go:116] Trace[1802854454]: "List" url:/api/v1/namespaces/testing00/secrets (started: 2021-03-23 11:54:39.130320896 +0800 CST m=+2313349.748686231) (total time: 1m0.00363728s):
Trace[1802854454]: [1.138447997s] [1.13838491s] Listing from storage done
Trace[1802854454]: [1m0.003636268s] [58.865188271s] Writing http response done count:1154
We are encountering a similar behavior where helm hangs and no helm updates work either.
Error: unable to build kubernetes objects from release manifest: unexpected error when reading response body. Please retry. Original error: net/http: request canceled (Client.Timeout exceeded while reading body
--- observed with 4 EKS clusters today ----
E0323 09:07:32.801092 1 runtime.go:78] Observed a panic: &errors.errorString{s:"killing connection/stream because serving request timed out and response had been started"} (killing connection/stream because serving request timed out and response had been started)
goroutine 2585666292 [running]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic(0x3cbaa60, 0xc000408b70)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0xc014e2dc90, 0x1, 0x1)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x82
panic(0x3cbaa60, 0xc000408b70)
/usr/local/go/src/runtime/panic.go:679 +0x1b2
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.(*baseTimeoutWriter).timeout(0xc0482a2fa0, 0xc07f874aa0)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:257 +0x1cf
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP(0xc00d481b40, 0x5230bc0, 0xc059dcf9d0, 0xc03ee51900)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:141 +0x310
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.WithWaitGroup.func1(0x5230bc0, 0xc059dcf9d0, 0xc03ee51800)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/waitgroup.go:59 +0x121
net/http.HandlerFunc.ServeHTTP(0xc00d021ce0, 0x5230bc0, 0xc059dcf9d0, 0xc03ee51800)
/usr/local/go/src/net/http/server.go:2036 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithRequestInfo.func1(0x5230bc0, 0xc059dcf9d0, 0xc03ee51700)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters/requestinfo.go:39 +0x274
net/http.HandlerFunc.ServeHTTP(0xc00d021d70, 0x5230bc0, 0xc059dcf9d0, 0xc03ee51700)
/usr/local/go/src/net/http/server.go:2036 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithCacheControl.func1(0x5230bc0, 0xc059dcf9d0, 0xc03ee51700)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters/cachecontrol.go:31 +0xa8
net/http.HandlerFunc.ServeHTTP(0xc00d481b60, 0x5230bc0, 0xc059dcf9d0, 0xc03ee51700)
/usr/local/go/src/net/http/server.go:2036 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/httplog.WithLogging.func1(0x52239c0, 0xc03b8afaf8, 0xc03cb65500)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/httplog/httplog.go:89 +0x2ca
net/http.HandlerFunc.ServeHTTP(0xc00d481b80, 0x52239c0, 0xc03b8afaf8, 0xc03cb65500)
/usr/local/go/src/net/http/server.go:2036 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.withPanicRecovery.func1(0x52239c0, 0xc03b8afaf8, 0xc03cb65500)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/wrap.go:51 +0x13e
net/http.HandlerFunc.ServeHTTP(0xc00d481ba0, 0x52239c0, 0xc03b8afaf8, 0xc03cb65500)
/usr/local/go/src/net/http/server.go:2036 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server.(*APIServerHandler).ServeHTTP(0xc00d021e00, 0x52239c0, 0xc03b8afaf8, 0xc03cb65500)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/handler.go:189 +0x51
net/http.serverHandler.ServeHTTP(0xc00d31ac40, 0x52239c0, 0xc03b8afaf8, 0xc03cb65500)
/usr/local/go/src/net/http/server.go:2831 +0xa4
net/http.initNPNRequest.ServeHTTP(0x523e0c0, 0xc060ba6b70, 0xc01bd24000, 0xc00d31ac40, 0x52239c0, 0xc03b8afaf8, 0xc03cb65500)
/usr/local/go/src/net/http/server.go:3395 +0x8d
k8s.io/kubernetes/vendor/golang.org/x/net/http2.(*serverConn).runHandler(0xc0205eb980, 0xc03b8afaf8, 0xc03cb65500, 0xc04d0b29e0)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/golang.org/x/net/http2/server.go:2149 +0x9f
created by k8s.io/kubernetes/vendor/golang.org/x/net/http2.(*serverConn).processHeaders
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/golang.org/x/net/http2/server.go:1883 +0x4eb
"killing connection/stream because serving request timed out and response had been started"
This is proof that you're hitting the 60s global time out I mentioned previously.
The client needs a faster network connection or you have to list less data. There are no other solutions that don't boil down to making it faster or doing less work.
Most AWS-to-AWS traffic goes over 10GbE these days. How much faster do you expect the connection to be? :-)
We have also seen this happen when an intermediate proxy (often in the cloud provider's control plane) is timing out. And this isn't always because of network speed, but because of changing network topology, security configurations, and a host of other reasons. The trick is to find out what on the network is causing the timeouts. In some cases, you may need to file issues with your upstream cloud provider. Testing with kubectl and curl is also a good way of trying to find the culprit.
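For instance, you can replay the exact request that showed up in the apiserver log earlier in this thread with kubectl's raw mode, which goes through the same kubeconfig, auth, and proxies (the namespace is a placeholder):

# Mirrors GET /api/v1/namespaces/<ns>/secrets?labelSelector=owner%3Dhelm
$ time kubectl get --raw '/api/v1/namespaces/XXX/secrets?labelSelector=owner%3Dhelm' > /dev/null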
In our experience, Helm often works better on a reasonable network (average throughput/latency) than on the fancy cloud-side network. This makes me think that we are not dealing with some timeout here, but with a genuine bug in the Helm implementation.
Or, possibly, a bug on the k8s api endpoint side.
For our EKS setup, helm list can't handle more than ~3000 versions/secrets.
Cleaning up old versions/secrets solved the issue (we had ~13,000). kubectl get secrets took only 10 seconds to list more than 13,000 secrets, so I believe the issue is on the Helm side.
Really, large Helm installations should use the database backend. Kubernetes/Etcd servers are not capable of delivering high numbers of release records in a short amount of time.
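A minimal sketch of switching to the SQL backend (PostgreSQL is the only supported dialect; the host, credentials, and database name below are placeholders):

# With the sql driver, release records live in Postgres, so listing releases no
# longer issues a large Secret LIST against the Kubernetes API server.
export HELM_DRIVER=sql
export HELM_DRIVER_SQL_CONNECTION_STRING="postgresql://helm:changeme@postgres.example.com:5432/helm?sslmode=disable"
helm list -A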
What helped me fix the issue was deleting the secrets that Helm creates, using a simple script:
NAMESPACE=monitor
kubectl get secrets -n $NAMESPACE --no-headers=true | awk '/sh.helm.release/{print $1}' | xargs kubectl delete -n $NAMESPACE secrets
We faced the same issue in EKS with cert-manager. We tried to clean up the cluster by deleting some old Helm releases, which also removes their old secrets. Ref: https://github.com/jetstack/cert-manager/issues/3229
If your Ansible playbook has an option helm_kubectl_context_is_admin, try changing it from true to false.
Check your internet cable connection. Restarting my Wi-Fi helped me out.
I am facing this issue when installing the Prometheus Helm chart. Did anyone resolve the issue?
Please paste the logs, @UmairHassanKhan.
@UmairHassanKhan Yes, just set "--history-max" to some low value if you're using the Helm CLI, or "max_history" if you're using the Terraform Helm provider. Setting this value to "10" helped me. You can also delete old history records manually with kubectl.
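For example (the release and chart names are placeholders):

# Keep at most 5 release secrets for this release; older revisions are pruned
# on upgrade. The default has been 10 since Helm v3.4.0.
helm upgrade my-release ./my-chart --history-max 5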
I'm seeing this as well; eventually, after running a helm upgrade 20+ times, it succeeded. In my case there isn't anything I can do about the network speed, it's over a satellite link. Is there any hope of getting a timeout option added for situations where there are a large number of secrets/versions or slow network links? It doesn't look like the existing --timeout option covers that case.
"killing connection/stream because serving request timed out and response had been started"
This is proof that you're hitting the 60s global time out I mentioned previously.
The client needs a faster network connection or you have to list less data. There's no other solutions that don't boil down to making it faster or doing less work.
Where is the 60s global time out set? Is there any way to change it?
kube-apiserver has a --request-timeout flag. Not sure if that is what @lavalamp means.
--request-timeout duration     Default: 1m0s
An optional field indicating the duration a handler must keep a request open before timing it out. This is the default request timeout for requests but may be overridden by flags such as --min-request-timeout for specific types of requests.
Yes, that's the global time out, and I didn't mention changing it as an option because the vast majority of users don't have access to change flags on their cluster's kube-apiserver.
This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.
Any solution about this issue? We are also facing it.
facing the same issue in multiple clusters.
Increase the timeout (--request-timeout=1m0s to 2m0s) in /etc/kubernetes/manifests/kube-apiserver.yaml and everything will work!
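On a self-managed control plane that boils down to something like the sketch below (it assumes the flag is already set explicitly in the manifest; managed offerings such as GKE/EKS/AKS do not expose this file, and the kubelet restarts kube-apiserver automatically when it changes):

# Bump the global request timeout from the 1m default to 2m.
sudo sed -i 's/--request-timeout=1m0s/--request-timeout=2m0s/' \
  /etc/kubernetes/manifests/kube-apiserver.yaml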
This is due to how the api-server uses etcd. When there are a large number of any resource, secrets in this case, the api-server query retrieves ALL secrets from etcd. When you have more than a few hundred, this can cause some pretty severe latency. We were able to completely bring down a control plane with only 200 secrets. This isn't directly a helm problem, though it was exacerbated when the history-max was unlimited.
Now history-max defaults to 10 which should be sufficient, in most cases, to prevent this specific issue. If you're experiencing this issue and haven't upgraded since the default was changed, either delete your old release secrets or do an upgrade with a version since v3.4.0 when that default was changed.
> This is due to how the api-server uses etcd.

I am skeptical of this statement, as kubectl get secrets and kubectl describe secrets work for 2000+ Secrets in my cluster, but a helm list would fail.
The apiserver offers pagination; you do not have to read all the objects in a single request, which is very hard on the system. We are likely to incentivize the use of pagination more over time by limiting unpaginated requests in various ways.
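For comparison, kubectl already paginates its list calls (--chunk-size defaults to 500 on reasonably recent versions), which is part of why it copes with thousands of Secrets:

# Fetch the Helm release secrets in pages of 300 instead of one giant LIST.
$ kubectl get secrets --all-namespaces -l owner=helm --chunk-size=300 -o name | wc -l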
I have spent about 4 hours so far fixing this issue. Here're the details:
$ helm version
version.BuildInfo{Version:"v3.3.4", GitCommit:"a61ce5633af99708171414353ed49547cf05013d", GitTreeState:"clean", GoVersion:"go1.14.9"}
$ helm --kube-context ctx list --all --deployed --failed --date -n ns --max 1000
Error: list: failed to list: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 3; INTERNAL_ERROR
$ gd
diff --git a/pkg/storage/driver/secrets.go b/pkg/storage/driver/secrets.go
index 2e8530d0..f3694cfc 100644
--- a/pkg/storage/driver/secrets.go
+++ b/pkg/storage/driver/secrets.go
@@ -35,8 +35,12 @@ import (
var _ Driver = (*Secrets)(nil)
-// SecretsDriverName is the string name of the driver.
-const SecretsDriverName = "Secret"
+const (
+ // SecretsDriverName is the string name of the driver.
+ SecretsDriverName = "Secret"
+ // ListPaginationLimit is the number of Secrets we fetch in a single API call.
+ ListPaginationLimit = int64(300)
+)
// Secrets is a wrapper around an implementation of a kubernetes
// SecretsInterface.
@@ -78,15 +82,36 @@ func (secrets *Secrets) Get(key string) (*rspb.Release, error) {
// List fetches all releases and returns the list releases such
// that filter(release) == true. An error is returned if the
// secret fails to retrieve the releases.
+// We read `ListPaginationLimit` Secrets at a time so as not to overwhelm the
+// `api-server` in a cluster with many releases; fixes
+// https://github.com/helm/helm/issues/7997
func (secrets *Secrets) List(filter func(*rspb.Release) bool) ([]*rspb.Release, error) {
lsel := kblabels.Set{"owner": "helm"}.AsSelector()
- opts := metav1.ListOptions{LabelSelector: lsel.String()}
+ opts := metav1.ListOptions{LabelSelector: lsel.String(), Limit: ListPaginationLimit}
+ // Perform an initial list
list, err := secrets.impl.List(context.Background(), opts)
if err != nil {
return nil, errors.Wrap(err, "list: failed to list")
}
+ // Fetch more results from the server by making recursive paginated calls
+ isContinue := list.Continue
+ for isContinue != "" {
+ secrets.Log("list: fetched %d secrets, more to fetch..\n", ListPaginationLimit)
+ opts = metav1.ListOptions{LabelSelector: lsel.String(), Limit: ListPaginationLimit, Continue: isContinue}
+ batch, err := secrets.impl.List(context.Background(), opts)
+ if err != nil {
+ return nil, errors.Wrap(err, "list: failed to perform paginated listing")
+ }
+
+ // Append the results to the initial list
+ list.Items = append(list.Items, batch.Items...)
+
+ isContinue = batch.Continue
+ }
+ secrets.Log("list: fetched %d releases\n", len(list.Items))
+
var results []*rspb.Release
// iterate over the secrets object list
$ make && stat bin/helm
$ ./bin/helm version
version.BuildInfo{Version:"v3.8+unreleased", GitCommit:"65d8e72504652e624948f74acbba71c51ac2e342", GitTreeState:"dirty", GoVersion:"go1.17.2"}
$ ./bin/helm --debug --kube-context ctx list --all --deployed --failed --date -n ns --max 1000
secrets.go:101: [debug] list: fetched 300 secrets, more to fetch..
secrets.go:101: [debug] list: fetched 300 secrets, more to fetch..
secrets.go:101: [debug] list: fetched 300 secrets, more to fetch..
secrets.go:101: [debug] list: fetched 300 secrets, more to fetch..
secrets.go:101: [debug] list: fetched 300 secrets, more to fetch..
secrets.go:113: [debug] list: fetched 1621 releases
...
...
<list of releases in namespace `ns`>
Note: the built-in unit tests are currently failing; I have yet to modify them. I will fix them, or if someone can help me fix them ASAP, I can open a PR and get this ready for merge.
EDIT: The UTs are good now, PR out.
This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.
Is there any update?
Any update on that? I faced the issue with Helm version 3.9.4, so the issue still exists.
Output of helm version:
version.BuildInfo{Version:"v3.1.2", GitCommit:"d878d4d45863e42fd5cff6743294a11d28a9abce", GitTreeState:"clean", GoVersion:"go1.13.8"}
Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-gke.9"}
Cloud Provider/Platform (AKS, GKE, Minikube etc.): GKE
On helm list I get:
Error: list: failed to list: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 3; INTERNAL_ERROR
On helm install of a chart I get:
request.go:924] Unexpected error when reading response body: net/http: request canceled (Client.Timeout exceeded while reading body) Error: unable to build kubernetes objects from release manifest: unexpected error when reading response body. Please retry. Original error: net/http: request canceled (Client.Timeout exceeded while reading body
"Helm delete" is working. Was able to uninstall a release
Additional notes:
there is network activity on "helm list" over one minute or so. (maybe timeout trigered?)
The setup was running for months without any problems
I did the update to Helm v3.1.2 in the this current debugging process for this issue
There was a Nodeupdate on the Kubernetes side recently - (maybe relevant)
Created also a new Cluster on GKE for testing and there "Helm list" .."Helm install" are working.