Closed abizake closed 4 years ago
Check that your ControlPlane can reach your DataPlane on 443 (I had to modify SG for both to allow this and it worked)
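As a quick sanity check of that connectivity, here is a minimal sketch (not metrics-server code; the address in the comment is a placeholder for your metrics-server ClusterIP) that tests whether a TCP connection on 443 succeeds from a control-plane host:

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False

# Example (placeholder address -- substitute your metrics-server ClusterIP):
# can_reach("100.64.0.1", 443)
```

If this returns False from the control plane but True from a worker node, the security group (or firewall) between the two is the likely culprit.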
We are also seeing this issue on Kubernetes version 1.10.11, metrics-server v0.3.1.
The error doesn't occur all the time, but seemingly randomly.
HPA is also not working:
Warning FailedGetResourceMetric 12s (x200 over 100m) horizontal-pod-autoscaler unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Seeing a lot of these errors in the metrics-server logs:
I1206 21:52:20.330969 1 round_trippers.go:386] curl -k -v -XPOST -H "User-Agent: metrics-server/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Accept: application/json, */*" -H "Content-Type: application/json" -H "Authorization: Bearer 8493204" 'https://100.64.0.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews'
I1206 21:52:20.336659 1 round_trippers.go:405] POST https://100.64.0.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews 201 Created in 5 milliseconds
I1206 21:52:20.336730 1 round_trippers.go:411] Response Headers:
I1206 21:52:20.336753 1 round_trippers.go:414] Content-Type: application/json
I1206 21:52:20.336823 1 round_trippers.go:414] Content-Length: 260
I1206 21:52:20.336850 1 round_trippers.go:414] Date: Thu, 06 Dec 2018 21:52:20 GMT
I1206 21:52:20.336924 1 request.go:897] Response Body: {"kind":"SubjectAccessReview","apiVersion":"authorization.k8s.io/v1beta1","metadata":{"creationTimestamp":null},"spec":{"nonResourceAttributes":{"path":"/","verb":"get"},"user":"system:anonymous","group":["system:unauthenticated"]},"status":{"allowed":false}}
I1206 21:52:20.337051 1 authorization.go:73] Forbidden: "/", Reason: ""
I1206 21:52:20.337169 1 wrap.go:42] GET /: (6.608284ms) 403 [[Go-http-client/2.0] 10.150.238.46:34472]
I1206 21:52:20.342685 1 authorization.go:73] Forbidden: "/", Reason: ""
I1206 21:52:20.342881 1 wrap.go:42] GET /: (283.678µs) 403 [[Go-http-client/2.0] 100.103.86.128:53412]
I1206 21:52:20.348443 1 authorization.go:73] Forbidden: "/", Reason: ""
I1206 21:52:20.348594 1 wrap.go:42] GET /: (211.166µs) 403 [[Go-http-client/2.0] 10.150.238.46:34472]
I1206 21:52:20.353395 1 authorization.go:73] Forbidden: "/", Reason: ""
I1206 21:52:20.353521 1 wrap.go:42] GET /: (225.997µs) 403 [[Go-http-client/2.0] 10.150.238.46:34472]
And about the time the 'unable to handle request' error gets thrown, we see this in the API server logs:
{"timestamp":1544123089981,"log":"E1206 19:04:49.196449 1 available_controller.go:295] v1beta1.metrics.k8s.io failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io \"v1beta1.metrics.k8s.io\": the object has been modified; please apply your changes to the latest version and try again","stream":"stdout","time":"2018-12-06T19:04:49.196573682Z","docker":{"container_id":"193297c980e6dd2380e420d23c023d20c80422ef8fa1ca1d26d23c64c13cbc42"},"kubernetes":{"container_name":"kube-apiserver","namespace_name":"kube-system","pod_name":"kube-apiserver-ip-.ec2.internal","pod_id":"4d0c7c67-f971-11e8-87f5-0edeec0b08fa","labels":{"k8s-app":"kube-apiserver"},"host":"","master_url":"https://100.64.0.1:443/api","namespace_id":"1598f027-deb5-11e8-8c15-020b58ac630a"}}
I had this problem. In my case I am using kops 1.10 with a gossip-based cluster. I added two lines to my deploy/1.8+/metrics-server-deployment.yaml file:
containers:
- name: metrics-server
  image: k8s.gcr.io/metrics-server-amd64:v0.3.1
  imagePullPolicy: Always
  # changed to use kubelet insecure tls and internal ip
  command:
  - /metrics-server
  - --kubelet-insecure-tls
  - --kubelet-preferred-address-types=InternalIP
  volumeMounts:
  - name: tmp-dir
    mountPath: /tmp
and after this, kubectl top worked about 5 minutes later.
Coming back around to this, I am still seeing these errors with metrics-server.
Here is my config:
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T19:44:19Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.6", GitCommit:"b1d75deca493a24a2f87eb1efde1a569e52fc8d9", GitTreeState:"clean", BuildDate:"2018-12-16T04:30:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
spec:
  containers:
  - command:
    - /metrics-server
    - --kubelet-insecure-tls
    - --kubelet-preferred-address-types=InternalIP
    image: gcr.io/google_containers/metrics-server-amd64:v0.3.1
If I spam kubectl top node, I get a response most of the time; however, it randomly fails with this error:
I0206 10:58:13.064778 40360 helpers.go:198] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "the server is currently unable to handle the request (get nodes.metrics.k8s.io)",
  "reason": "ServiceUnavailable",
  "details": {
    "group": "metrics.k8s.io",
    "kind": "nodes",
    "causes": [
      {
        "reason": "UnexpectedServerResponse",
        "message": "service unavailable"
      }
    ]
  },
  "code": 503
}]
F0206 10:58:13.064830 40360 helpers.go:116] Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
I'm seeing this in the apiserver logs:
kube-apiserver-ip-1-1-1-1.ec2.internal:kube-apiserver E0206 15:57:32.697886 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://1.1.1.1:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
kube-apiserver-ip-1-1-1-1.ec2.internal:kube-apiserver E0206 15:57:32.729065 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io "v1beta1.metrics.k8s.io": the object has been modified; please apply your changes to the latest version and try again
The issue still exists in v1.13.3.
As an update to this, all my issues with metrics-server went away after I set
hostNetwork:
  enabled: true
in the stable helm chart
https://github.com/helm/charts/tree/master/stable/metrics-server
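For anyone else applying the same fix, a sketch of a values override for that chart (the file name here is illustrative):

```yaml
# values-override.yaml (illustrative file name)
# Runs the metrics-server pod in the node's network namespace, so the
# apiserver can reach it even when it cannot route to the pod overlay.
hostNetwork:
  enabled: true
```

Then something along the lines of helm upgrade --install metrics-server stable/metrics-server -f values-override.yaml should roll it out.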
@ag237 Thanks for sharing this. Any idea why this got fixed when you enabled host networking?
Did you solve the problem? I have the same issue.
kops: Version 1.11.1 (git-0f2aa8d30)
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:53:57Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.9", GitCommit:"16236ce91790d4c75b79f6ce96841db1c843e7d2", GitTreeState:"clean", BuildDate:"2019-03-25T06:30:48Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
The master node's api-server log says:
OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
v1beta1.metrics.k8s.io failed with: Get https://$CLUSTER-IP:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
My master nodes' SG allows access on port 443 from everywhere.
Any ideas?
@abizake I think you are also unable to reach pods on the other nodes. If so, ensure UDP ports 8285 and 8472 ( For Flannel ) are open on all nodes. Ref: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
In the step mentioned above, I am actually using Calico. The relevant Calico ports are open and pods on other nodes are reachable.
Having the same issue with Kubernetes 1.15.2
on Ubuntu 18.04 nodes.
Using Calico as SDN.
kubectl logs metrics-server-ddd54b5c5-mxxb7
I0812 20:56:27.161337 1 serving.go:273] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0812 20:56:32.059998 1 manager.go:95] Scraping metrics from 0 sources
I0812 20:56:32.060020 1 manager.go:150] ScrapeMetrics: time: 1.003µs, nodes: 0, pods: 0
[restful] 2019/08/12 20:56:32 log.go:33: [restful/swagger] listing is available at https://:8443/swaggerapi
[restful] 2019/08/12 20:56:32 log.go:33: [restful/swagger] https://:8443/swaggerui/ is mapped to folder /swagger-ui/
I0812 20:56:32.467070 1 serve.go:96] Serving securely on [::]:8443
I0812 20:57:32.060888 1 manager.go:95] Scraping metrics from 4 sources
I0812 20:57:32.063876 1 manager.go:120] Querying source: kubelet_summary:master-node2.internal
I0812 20:57:32.068071 1 manager.go:120] Querying source: kubelet_summary:worker-node1.internal
I0812 20:57:32.068425 1 manager.go:120] Querying source: kubelet_summary:worker-node2.internal
I0812 20:57:32.088240 1 manager.go:120] Querying source: kubelet_summary:master-node1.internal
I0812 20:57:32.256135 1 manager.go:150] ScrapeMetrics: time: 195.03251ms, nodes: 4, pods: 23
kubectl top node
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
I can confirm https://github.com/kubernetes-incubator/metrics-server/issues/157#issuecomment-484047998 helps
Not using the helm chart, I added hostNetwork: true to the manifest under spec/template/spec, and now it is working.
Also I am using the flags
command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
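Putting the pieces of this comment together, a sketch of where hostNetwork: true sits in the deployment manifest (image tag assumed from the earlier comments; your version may differ):

```yaml
spec:
  template:
    spec:
      hostNetwork: true   # pod uses the node's network namespace
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.1
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
```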
I've been having several problems in the cluster because of this, including HPA. https://github.com/kubernetes-incubator/metrics-server/issues/157#issuecomment-484047998 seems to nail it, indeed, but I'm still wondering what the actual problem is. Setting hostNetwork=true shouldn't be necessary at all.
I added hostNetwork: true, but my problem is not fixed; the apiserver reports: "kube-controller-manager: E1011 13:37:24.015616 33182 resource_quota_controller.go:407] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request"
I don't think metrics-server was meant to run with host networking. I think it's a problem with a particular overlay network, but that's not my area of expertise.
Metrics Server uses https://github.com/kubernetes/kube-aggregator to register with the apiserver; maybe you could find answers there?
Still, it would be useful to document how metrics-server provides the Metrics API and what requirements it places on the network.
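For context on that registration: metrics-server creates an APIService object that tells the apiserver to proxy metrics.k8s.io requests to its Service, and the 503s in this thread mean the apiserver cannot reach that backend. A typical registration looks roughly like this (field values may differ between releases):

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  # The Service the apiserver proxies metrics.k8s.io traffic to.
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
```

kubectl get apiservice v1beta1.metrics.k8s.io shows whether the apiserver currently considers this backend Available.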
The comment in values.yaml (https://github.com/helm/charts/blob/master/stable/metrics-server/values.yaml) mentions that this might be required if you use the Weave network on EKS. We faced a similar problem in EKS using the AWS CNI, and this setting seems to fix it. I believe this is more a band-aid solution and the root cause is somewhere else.
hostNetwork:
  # Specifies if metrics-server should be started in hostNetwork mode.
  #
  # You would require this enabled if you use alternate overlay networking for pods and
  # API server unable to communicate with metrics-server. As an example, this is required
  # if you use Weave network on EKS
  enabled: false
Check that your ControlPlane can reach your DataPlane on 443 (I had to modify SG for both to allow this and it worked)
Thanks, this is gold!!!
What is SG?
Probably "security group," in the context of AWS EC2.
Check that your ControlPlane can reach your DataPlane on 443 (I had to modify SG for both to allow this and it worked)
How to check this? I have my cluster hosted on Azure AKS
Closing per Kubernetes issue triage policy
GitHub is not the right place for support requests. If you're looking for help, check Stack Overflow and the troubleshooting guide. You can also post your question on the Kubernetes Slack or the Discuss Kubernetes forum. If the matter is security related, please disclose it privately via https://kubernetes.io/security/.
As an update to this, all my issues with metrics-server went away after I set
hostNetwork:
  enabled: true
in the stable helm chart
https://github.com/helm/charts/tree/master/stable/metrics-server
It works, thanks!
Note that if you are using GKE (Google Kubernetes Engine) and your cluster has been without containers for a long time (multiple days), GKE decommissions the nodes from the cluster to save you costs. Without nodes, the control plane processes cannot start. So if that's your case, all is good: just run an image or create a deployment and everything should start working as usual :)
@philippefutureboy I'm having this problem in GKE, and yes, my cluster was idle, but I've run two DAGs over the last hour and it still does not work. Is there any other way to revive it?
No unfortunately the issue has started persisting through spinning new pods on my side as well 😕
Oh. I'm trying to delete a namespace and it can't be deleted because of
'Discovery failed for some groups, 1 failing: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request'
Ah, there's a GKE trouble-shooting guide here: https://cloud.google.com/kubernetes-engine/docs/troubleshooting#namespace_stuck_in_terminating_state
As an update to this, all my issues with metrics-server went away after I set
hostNetwork:
  enabled: true
in the stable helm chart
https://github.com/helm/charts/tree/master/stable/metrics-server
Thanks for sharing this. It worked for me
Same issue: turning off the firewall worked for me (yes, overkill, but I have no time for fine-tuning right now).
My solution was this:
❯ kubectl delete -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
I did not have the metrics server installed, nor did I need it. At some point somebody installed it and uninstalled it. But the uninstallation was not complete. We had these lingering resources:
clusterrole.rbac.authorization.k8s.io "system:aggregated-metrics-reader" deleted
clusterrole.rbac.authorization.k8s.io "system:metrics-server" deleted
clusterrolebinding.rbac.authorization.k8s.io "metrics-server:system:auth-delegator" deleted
clusterrolebinding.rbac.authorization.k8s.io "system:metrics-server" deleted
apiservice.apiregistration.k8s.io "v1beta1.metrics.k8s.io" deleted
API Server Logs :-
1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1012 08:23:25.282353 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable , Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I1012 08:23:25.282377 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E1012 08:23:25.396126 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1012 08:23:25.991550 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E1012 08:23:46.469237 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E1012 08:23:55.440941 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1012 08:23:55.789103 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E1012 08:24:25.477704 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1012 08:24:25.705399 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E1012 08:24:55.516394 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1012 08:24:55.719712 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E1012 08:25:13.395961 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I1012 08:25:25.282682 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1012 08:25:25.282944 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable , Header: map[X-Content-Type-Options:[nosniff] Content-Type:[text/plain; charset=utf-8]]
I1012 08:25:25.282969 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E1012 08:25:25.563266 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Controller Logs :-
E1012 08:26:57.910695 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1012 08:27:13.214427 1 resource_quota_controller.go:430] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W1012 08:27:17.126343 1 garbagecollector.go:647] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
Metric Server Logs :-
I1012 08:22:11.248135 1 serving.go:273] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
[restful] 2018/10/12 08:22:12 log.go:33: [restful/swagger] listing is available at https://:443/swaggerapi
[restful] 2018/10/12 08:22:12 log.go:33: [restful/swagger] https://:443/swaggerui/ is mapped to folder /swagger-ui/
I1012 08:22:12.537437 1 serve.go:96] Serving securely on [::]:443
Kubernetes Version :- 1.12.1
Metric Server Deployment YAML :-
Any help is appreciated.