dshackith opened this issue 3 years ago
I have this issue but with some differences
It works if I exclude secrets:
$ glooctl check -x secrets
Checking deployments... OK
Checking pods... OK
Checking upstreams... OK
Checking upstream groups... OK
Checking auth configs... OK
Checking rate limit configs... OK
Checking virtual services... OK
Checking gateways... OK
Checking proxies... OK
No problems detected.
Skipping Gloo Instance check -- Gloo Federation not detected
Same here with gloo-edge 1.6.17.
Still an issue with 1.6.37!
This issue seems to pop up mainly in configurations where we have Gloo set up for many environments (around 60 virtual services and 60 secrets (SSL certs)). It does not (usually) appear in namespaces where we run the same Gloo version with fewer virtual services (around 10 virtual services and 10 secrets). It also seems to depend on the load of the API server / cluster: sometimes we see many of these errors, sometimes fewer. In general, the secrets verification step can be very slow (minutes rather than seconds).
Checking secrets... E1112 13:15:08.482036 54982 request.go:1001] Unexpected error when reading response body: net/http: request canceled (Client.Timeout or context cancellation while reading body)
E1112 13:15:08.482033 54982 request.go:1001] Unexpected error when reading response body: net/http: request canceled (Client.Timeout or context cancellation while reading body)
E1112 13:15:08.482032 54982 request.go:1001] Unexpected error when reading response body: context deadline exceeded (Client.Timeout or context cancellation while reading body)
E1112 13:15:08.482036 54982 request.go:1001] Unexpected error when reading response body: net/http: request canceled (Client.Timeout or context cancellation while reading body)
E1112 13:15:08.482619 54982 request.go:1001] Unexpected error when reading response body: net/http: request canceled (Client.Timeout or context cancellation while reading body)
E1112 13:15:08.482716 54982 reflector.go:127] pkg/mod/k8s.io/client-go@v0.19.6/tools/cache/reflector.go:156: Failed to watch *v1.Secret: failed to list *v1.Secret: unexpected error when reading response body. Please retry. Original error: net/http: request canceled (Client.Timeout or context cancellation while reading body)
When we run glooctl check --exclude secrets, it (obviously) does not report the issue described above.
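The reflector error above explicitly says "Please retry", which suggests the list call fails transiently and can simply be retried. This is not glooctl code, just a generic sketch of the retry-with-backoff pattern (the function and variable names here are hypothetical, for illustration only):

```python
import time

def retry_with_backoff(fn, attempts=4, base_delay=0.5):
    """Call fn(), retrying on exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * (2 ** attempt))

# A stand-in for the flaky secrets list call: fails twice, then succeeds.
calls = {"n": 0}
def flaky_list_secrets():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("context deadline exceeded")
    return ["secret-a", "secret-b"]
```

Calling `retry_with_backoff(flaky_list_secrets)` would absorb the first two timeouts and return the list on the third attempt, which is roughly what the "Please retry" hint asks the caller to do.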
I just took the latest glooctl, v1.10.0-beta8, and got something similar. This issue seems to have become worse after we moved to AWS IAM authentication for kube. It almost feels like there are too many IAM auth requests per second, but that's just a gut feeling.
❯ ./glooctl-linux-amd64.2 check
Checking deployments... OK
Checking pods... OK
Checking upstreams... OK
Checking upstream groups... OK
Checking auth configs... OK
Checking rate limit configs... OK
Checking VirtualHostOptions... WARN: VirtualHostOption CRD has not been registered
Checking RouteOptions... WARN: RouteOption CRD has not been registered
Checking secrets... W1125 22:59:25.411878 32643 transport.go:260] Unable to cancel request for *exec.roundTripper
E1125 22:59:25.411958 32643 request.go:1011] Unexpected error when reading response body: net/http: request canceled (Client.Timeout or context cancellation while reading body)
W1125 22:59:25.411960 32643 transport.go:260] Unable to cancel request for *exec.roundTripper
E1125 22:59:25.412090 32643 request.go:1011] Unexpected error when reading response body: context deadline exceeded (Client.Timeout or context cancellation while reading body)
E1125 22:59:25.412092 32643 reflector.go:138] pkg/mod/k8s.io/client-go@v0.20.9/tools/cache/reflector.go:167: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: unexpected error when reading response body. Please retry. Original error: net/http: request canceled (Client.Timeout or context cancellation while reading body)
E1125 22:59:25.412194 32643 reflector.go:138] pkg/mod/k8s.io/client-go@v0.20.9/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: unexpected error when reading response body. Please retry. Original error: context deadline exceeded (Client.Timeout or context cancellation while reading body)
W1125 22:59:25.412209 32643 transport.go:260] Unable to cancel request for *exec.roundTripper
E1125 22:59:25.412241 32643 request.go:1011] Unexpected error when reading response body: context deadline exceeded (Client.Timeout or context cancellation while reading body)
E1125 22:59:25.412290 32643 reflector.go:138] pkg/mod/k8s.io/client-go@v0.20.9/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: unexpected error when reading response body. Please retry. Original error: context deadline exceeded (Client.Timeout or context cancellation while reading body)
W1125 22:59:25.414128 32643 transport.go:260] Unable to cancel request for *exec.roundTripper
E1125 22:59:25.414171 32643 request.go:1011] Unexpected error when reading response body: context deadline exceeded (Client.Timeout or context cancellation while reading body)
...
We don't have that many services, though
❯ k get secrets -A |grep tls |wc -l
13
❯ k get virtualservices.gateway.solo.io -A | wc -l
33
but a decent number of secrets
❯ k get secrets -A |wc -l
1154
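The `Unable to cancel request for *exec.roundTripper` warnings come from client-go's exec-credential transport, which means the kubeconfig is authenticating through an exec plugin (for EKS, typically aws eks get-token or aws-iam-authenticator). If the plugin were re-run for every request, its latency would multiply across all the list/watch calls glooctl makes, which fits the AWS IAM observation above. Newer client-go versions avoid this by caching the credential until it expires; the idea, sketched in hypothetical Python (not glooctl or client-go code), is simply:

```python
import time

class TokenCache:
    """Illustrative sketch: cache an exec-plugin token until its TTL elapses,
    instead of re-running the credential plugin on every API request."""

    def __init__(self, fetch, ttl_seconds):
        self.fetch = fetch            # function that runs the credential plugin
        self.ttl = ttl_seconds
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = time.monotonic()
        if self._token is None or now >= self._expires_at:
            # Only pay the plugin's latency when the cached token has expired.
            self._token = self.fetch()
            self._expires_at = now + self.ttl
        return self._token
```

With a 60-second TTL, a burst of list/watch requests would invoke the plugin once rather than once per request, which is why upgrading the bundled k8s libraries (as suggested below) can matter here.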
OK, we will fix this. Prioritizing now. We'll make this part of the next release iteration (Jan - March).
This issue sounds similar to https://github.com/solo-io/gloo/issues/5061. Our initial findings indicated that https://github.com/kubernetes/kubernetes/issues/91913 was the source. We upgraded our k8s libraries to a version containing a fix, and this is available since Gloo Edge OSS 1.10.0-beta12. @dshackith @NelsonJeppesen could you try using a later version of glooctl to verify whether our updates resolved this particular issue, and comment here with the outcome?
@sam-heilbron nope
❯ ./glooctl-linux-amd64.1 version
Client: {"version":"1.11.0-beta3"}
❯ ./glooctl-linux-amd64.1 check
Checking deployments... OK
Checking pods... OK
Checking upstreams... OK
Checking upstream groups... OK
Checking auth configs... OK
Checking rate limit configs... OK
Checking VirtualHostOptions... OK
Checking RouteOptions... OK
Checking secrets... W0108 20:48:15.990200 7364 transport.go:288] Unable to cancel request for *exec.roundTripper
W0108 20:48:15.990243 7364 transport.go:288] Unable to cancel request for *exec.roundTripper
W0108 20:48:15.990283 7364 transport.go:288] Unable to cancel request for *exec.roundTripper
E0108 20:48:15.990290 7364 request.go:1085] Unexpected error when reading response body: net/http: request canceled (Client.Timeout or context cancellation while reading body)
E0108 20:48:15.990314 7364 request.go:1085] Unexpected error when reading response body: net/http: request canceled (Client.Timeout or context cancellation while reading body)
W0108 20:48:15.990354 7364 transport.go:288] Unable to cancel request for *exec.roundTripper
W0108 20:48:15.990371 7364 transport.go:288] Unable to cancel request for *exec.roundTripper
E0108 20:48:15.990381 7364 reflector.go:138] pkg/mod/k8s.io/client-go@v0.22.4/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: unexpected error when reading response body. Please retry. Original error: net/http: request canceled (Client.Timeout or context cancellation while reading body)
E0108 20:48:15.990390 7364 reflector.go:138] pkg/mod/k8s.io/client-go@v0.22.4/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: unexpected error when reading response body. Please retry. Original error: net/http: request canceled (Client.Timeout or context cancellation while reading body)
W0108 20:48:15.990404 7364 transport.go:288] Unable to cancel request for *exec.roundTripper
W0108 20:48:15.990416 7364 transport.go:288] Unable to cancel request for *exec.roundTripper
...
Tried with 1.10.0-beta13 as well, just in case the fix was only merged into 1.10.0-beta12+.
This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.
The secrets part works for me now on Gloo EE version 1.15.8, Gloo OSS 1.15.17, Kubernetes 1.26.9.
$ glooctl check
Checking deployments... OK
Checking pods... OK
Checking upstreams... OK
Checking upstream groups... OK
Checking auth configs... OK
Checking rate limit configs... OK
Checking VirtualHostOptions... OK
Checking RouteOptions... OK
Checking secrets... OK
Checking virtual services... OK
Checking gateways... OK
Checking proxies...
However, it hangs on "Checking proxies..." even though the server version matches the glooctl version. Looks like this is another issue.
This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.
Describe the bug
Running glooctl check -n custom-namespace hangs at "Checking secrets".

To Reproduce
Steps to reproduce the behavior:
glooctl check -n my-namespace

Expected behavior
I expect glooctl check to complete or fail, not hang.

Additional context
Gloo was originally installed at 1.3.26 and was upgraded to 1.4.15. It is not clear whether using a custom namespace is a factor, but that is what we are using.