kubernetes-sigs / metrics-server

Scalable and efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines.
https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/
Apache License 2.0

couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request #157

Closed abizake closed 4 years ago

abizake commented 6 years ago

API Server Logs :-

1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1012 08:23:25.282353 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable , Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I1012 08:23:25.282377 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E1012 08:23:25.396126 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1012 08:23:25.991550 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E1012 08:23:46.469237 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E1012 08:23:55.440941 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1012 08:23:55.789103 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E1012 08:24:25.477704 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1012 08:24:25.705399 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E1012 08:24:55.516394 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1012 08:24:55.719712 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E1012 08:25:13.395961 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I1012 08:25:25.282682 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1012 08:25:25.282944 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable , Header: map[X-Content-Type-Options:[nosniff] Content-Type:[text/plain; charset=utf-8]]
I1012 08:25:25.282969 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E1012 08:25:25.563266 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request

Controller Logs :-

E1012 08:26:57.910695 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1012 08:27:13.214427 1 resource_quota_controller.go:430] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W1012 08:27:17.126343 1 garbagecollector.go:647] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]

Metric Server Logs :-

I1012 08:22:11.248135 1 serving.go:273] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
[restful] 2018/10/12 08:22:12 log.go:33: [restful/swagger] listing is available at https://:443/swaggerapi
[restful] 2018/10/12 08:22:12 log.go:33: [restful/swagger] https://:443/swaggerui/ is mapped to folder /swagger-ui/
I1012 08:22:12.537437 1 serve.go:96] Serving securely on [::]:443

Kubernetes Version :- 1.12.1

Metric Server Deployment YAML :-

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.1
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        imagePullPolicy: Always
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp

Any help is appreciated.

cdenneen commented 5 years ago

Check that your ControlPlane can reach your DataPlane on 443 (I had to modify SG for both to allow this and it worked)
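
A quick way to verify this from the cluster side (the service name and namespace assume the default manifests; the ClusterIP placeholder must be replaced with your own):

    # Find the ClusterIP that backs the v1beta1.metrics.k8s.io APIService
    kubectl -n kube-system get service metrics-server

    # From a control-plane node, check that this IP answers on 443.
    # Any HTTP response (even 401/403) proves the path is open; a timeout,
    # as in the apiserver logs above, means the port is blocked.
    curl -k https://<metrics-server-cluster-ip>:443/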

ag237 commented 5 years ago

We are also seeing this issue on Kubernetes version 1.10.11, metrics-server v0.3.1.

The error doesn't occur all the time, but seemingly randomly.

HPA is also not working:

Warning FailedGetResourceMetric 12s (x200 over 100m) horizontal-pod-autoscaler unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)

Seeing a lot of these errors in the metrics-server logs:

I1206 21:52:20.330969 1 round_trippers.go:386] curl -k -v -XPOST -H "User-Agent: metrics-server/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Accept: application/json, */*" -H "Content-Type: application/json" -H "Authorization: Bearer 8493204" 'https://100.64.0.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews'
I1206 21:52:20.336659 1 round_trippers.go:405] POST https://100.64.0.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews 201 Created in 5 milliseconds
I1206 21:52:20.336730 1 round_trippers.go:411] Response Headers:
I1206 21:52:20.336753 1 round_trippers.go:414] Content-Type: application/json
I1206 21:52:20.336823 1 round_trippers.go:414] Content-Length: 260
I1206 21:52:20.336850 1 round_trippers.go:414] Date: Thu, 06 Dec 2018 21:52:20 GMT
I1206 21:52:20.336924 1 request.go:897] Response Body: {"kind":"SubjectAccessReview","apiVersion":"authorization.k8s.io/v1beta1","metadata":{"creationTimestamp":null},"spec":{"nonResourceAttributes":{"path":"/","verb":"get"},"user":"system:anonymous","group":["system:unauthenticated"]},"status":{"allowed":false}}
I1206 21:52:20.337051 1 authorization.go:73] Forbidden: "/", Reason: ""
I1206 21:52:20.337169 1 wrap.go:42] GET /: (6.608284ms) 403 [[Go-http-client/2.0] 10.150.238.46:34472]
I1206 21:52:20.342685 1 authorization.go:73] Forbidden: "/", Reason: ""
I1206 21:52:20.342881 1 wrap.go:42] GET /: (283.678µs) 403 [[Go-http-client/2.0] 100.103.86.128:53412]
I1206 21:52:20.348443 1 authorization.go:73] Forbidden: "/", Reason: ""
I1206 21:52:20.348594 1 wrap.go:42] GET /: (211.166µs) 403 [[Go-http-client/2.0] 10.150.238.46:34472]
I1206 21:52:20.353395 1 authorization.go:73] Forbidden: "/", Reason: ""
I1206 21:52:20.353521 1 wrap.go:42] GET /: (225.997µs) 403 [[Go-http-client/2.0] 10.150.238.46:34472]

And around the time the 'unable to handle the request' error is thrown, we see this in the API server logs:

{"timestamp":1544123089981,"log":"E1206 19:04:49.196449 1 available_controller.go:295] v1beta1.metrics.k8s.io failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io \"v1beta1.metrics.k8s.io\": the object has been modified; please apply your changes to the latest version and try again","stream":"stdout","time":"2018-12-06T19:04:49.196573682Z","docker":{"container_id":"193297c980e6dd2380e420d23c023d20c80422ef8fa1ca1d26d23c64c13cbc42"},"kubernetes":{"container_name":"kube-apiserver","namespace_name":"kube-system","pod_name":"kube-apiserver-ip-.ec2.internal","pod_id":"4d0c7c67-f971-11e8-87f5-0edeec0b08fa","labels":{"k8s-app":"kube-apiserver"},"host":"","master_url":"https://100.64.0.1:443/api","namespace_id":"1598f027-deb5-11e8-8c15-020b58ac630a"}}

ysolis commented 5 years ago

I had this problem. In my case I am using Kops 1.10 with a gossip-based cluster; I added two lines to my deploy/1.8+/metrics-server-deployment.yaml file:

     containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.1
        imagePullPolicy: Always
        # changed to use kubelet insecure TLS and internal IP
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp

and after this, kubectl top... started working after about 5 minutes
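
To confirm the flags actually landed on the running Deployment (the field path assumes the standard manifest layout), something like:

    kubectl -n kube-system get deployment metrics-server \
      -o jsonpath='{.spec.template.spec.containers[0].command}'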

ag237 commented 5 years ago

Coming back around to this, I am still seeing these errors with metrics-server.

Here is my config:

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T19:44:19Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.6", GitCommit:"b1d75deca493a24a2f87eb1efde1a569e52fc8d9", GitTreeState:"clean", BuildDate:"2018-12-16T04:30:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
spec:
  containers:
  - command:
    - /metrics-server
    - --kubelet-insecure-tls
    - --kubelet-preferred-address-types=InternalIP
    image: gcr.io/google_containers/metrics-server-amd64:v0.3.1

If I spam kubectl top node I will get a response most of the time; however, at random I will get this error:

I0206 10:58:13.064778   40360 helpers.go:198] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "the server is currently unable to handle the request (get nodes.metrics.k8s.io)",
  "reason": "ServiceUnavailable",
  "details": {
    "group": "metrics.k8s.io",
    "kind": "nodes",
    "causes": [
      {
        "reason": "UnexpectedServerResponse",
        "message": "service unavailable"
      }
    ]
  },
  "code": 503
}]
F0206 10:58:13.064830   40360 helpers.go:116] Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

I'm seeing this in the apiserver logs:

kube-apiserver-ip-1-1-1-1.ec2.internal:kube-apiserver E0206 15:57:32.697886       1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://1.1.1.1:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
kube-apiserver-ip-1-1-1-1.ec2.internal:kube-apiserver E0206 15:57:32.729065       1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io "v1beta1.metrics.k8s.io": the object has been modified; please apply your changes to the latest version and try again

abizake commented 5 years ago

The issue still exists in v1.13.3.

ag237 commented 5 years ago

As an update to this, all my issues with metrics-server went away after I set

hostNetwork:
  enabled: true

in the stable helm chart

https://github.com/helm/charts/tree/master/stable/metrics-server
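
For reference, a rough command-line equivalent (release name and namespace are just examples; hostNetwork.enabled is the only value that matters here):

    helm upgrade --install metrics-server stable/metrics-server \
      --namespace kube-system \
      --set hostNetwork.enabled=true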

abizake commented 5 years ago

@ag237 Thanks for sharing this. Any idea why this got fixed when you enabled host networking?

luckymagic7 commented 5 years ago

Did you solve the problem? I have the same issue.

kops: Version 1.11.1 (git-0f2aa8d30)
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:53:57Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.9", GitCommit:"16236ce91790d4c75b79f6ce96841db1c843e7d2", GitTreeState:"clean", BuildDate:"2019-03-25T06:30:48Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

The master node's api-server log says: OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable, and v1beta1.metrics.k8s.io failed with: Get https://$CLUSTER-IP:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers). My master nodes' SG allows port 443 access from everywhere.

Any ideas?

ghost commented 5 years ago

@abizake I think you are also unable to reach pods on the other nodes. If so, ensure UDP ports 8285 and 8472 ( For Flannel ) are open on all nodes. Ref: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/

abizake commented 5 years ago

@abizake I think you are also unable to reach pods on the other nodes. If so, ensure UDP ports 8285 and 8472 ( For Flannel ) are open on all nodes. Ref: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/

Regarding the above-mentioned step, I am actually using Calico. The respective Calico ports are open and pods on other nodes are reachable.

Constantin07 commented 5 years ago

Having the same issue with Kubernetes 1.15.2 on Ubuntu 18.04 nodes. Using Calico as SDN.

kubectl logs metrics-server-ddd54b5c5-mxxb7
I0812 20:56:27.161337       1 serving.go:273] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0812 20:56:32.059998       1 manager.go:95] Scraping metrics from 0 sources
I0812 20:56:32.060020       1 manager.go:150] ScrapeMetrics: time: 1.003µs, nodes: 0, pods: 0
[restful] 2019/08/12 20:56:32 log.go:33: [restful/swagger] listing is available at https://:8443/swaggerapi
[restful] 2019/08/12 20:56:32 log.go:33: [restful/swagger] https://:8443/swaggerui/ is mapped to folder /swagger-ui/
I0812 20:56:32.467070       1 serve.go:96] Serving securely on [::]:8443
I0812 20:57:32.060888       1 manager.go:95] Scraping metrics from 4 sources
I0812 20:57:32.063876       1 manager.go:120] Querying source: kubelet_summary:master-node2.internal
I0812 20:57:32.068071       1 manager.go:120] Querying source: kubelet_summary:worker-node1.internal
I0812 20:57:32.068425       1 manager.go:120] Querying source: kubelet_summary:worker-node2.internal
I0812 20:57:32.088240       1 manager.go:120] Querying source: kubelet_summary:master-node1.internal
I0812 20:57:32.256135       1 manager.go:150] ScrapeMetrics: time: 195.03251ms, nodes: 4, pods: 23
kubectl top node
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
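
When it is in this state, the Available condition on the registered APIService usually explains why the aggregator answers 503, so a useful next step is:

    kubectl get apiservice v1beta1.metrics.k8s.io
    kubectl describe apiservice v1beta1.metrics.k8s.io   # look at the Available condition and its message
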
AlexRRR commented 5 years ago

I can confirm https://github.com/kubernetes-incubator/metrics-server/issues/157#issuecomment-484047998 helps

I'm not using the Helm chart; I added hostNetwork: true to the manifest under spec/template/spec and now it is working.

Also I am using the flags

        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP    
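
Roughly, the touched part of the Deployment then looks like this (only a sketch of the fields changed here, not the full manifest):

    spec:
      template:
        spec:
          hostNetwork: true   # workaround for when the API server cannot reach pods over the overlay network
          containers:
          - name: metrics-server
            image: k8s.gcr.io/metrics-server-amd64:v0.3.1
            command:
            - /metrics-server
            - --kubelet-insecure-tls
            - --kubelet-preferred-address-types=InternalIP
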
edsonmarquezani commented 5 years ago

I've been having several problems in the cluster because of this, including HPA failures. https://github.com/kubernetes-incubator/metrics-server/issues/157#issuecomment-484047998 does seem to nail it, but I'm still wondering what the actual problem is. Setting hostNetwork=true shouldn't be necessary at all.

Lincoln-dac commented 5 years ago

I added hostNetwork: true but my problem is not fixed; the apiserver still reports: "kube-controller-manager: E1011 13:37:24.015616 33182 resource_quota_controller.go:407] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request"

serathius commented 5 years ago

I don't think metrics-server was meant to run in the host network. I think it's a problem with the particular overlay network, but that's not my area of expertise.

Metrics Server uses https://github.com/kubernetes/kube-aggregator to register into the apiserver; maybe you could find answers there?

Still, it would be useful to document how metrics-server provides the Metrics API and what requirements it places on the network.
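
For context, that registration happens through an APIService object shipped with the metrics-server manifests; the kube-apiserver proxies every metrics.k8s.io request to the service named in it, which is why a blocked path to that service surfaces as "service unavailable". It looks roughly like this (exact apiVersion and fields vary slightly between releases):

    apiVersion: apiregistration.k8s.io/v1beta1
    kind: APIService
    metadata:
      name: v1beta1.metrics.k8s.io
    spec:
      service:
        name: metrics-server
        namespace: kube-system
      group: metrics.k8s.io
      version: v1beta1
      insecureSkipTLSVerify: true
      groupPriorityMinimum: 100
      versionPriority: 100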

AshishThakur commented 5 years ago

The comment in values.yaml (https://github.com/helm/charts/blob/master/stable/metrics-server/values.yaml) mentions that this might be required if you use the Weave network on EKS. We faced a similar problem on EKS using the AWS CNI, and this change seems to fix the problem. I believe this is more of a band-aid solution and the root cause is somewhere else.

hostNetwork:
  # Specifies if metrics-server should be started in hostNetwork mode.
  #
  # You would require this enabled if you use alternate overlay networking for pods and
  # API server unable to communicate with metrics-server. As an example, this is required
  # if you use Weave network on EKS
  enabled: false

ctran commented 4 years ago

Check that your ControlPlane can reach your DataPlane on 443 (I had to modify SG for both to allow this and it worked)

Thanks, this is gold!!!

mojiewhy commented 4 years ago

Check that your ControlPlane can reach your DataPlane on 443 (I had to modify SG for both to allow this and it worked)

Thanks, this is gold!!!

What is SG?

seh commented 4 years ago

Probably "security group," in the context of AWS EC2.
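
On EC2/EKS that usually means adding an ingress rule so the control-plane security group can reach the node security group on 443; roughly (the group IDs here are placeholders):

    aws ec2 authorize-security-group-ingress \
      --group-id sg-NODE_GROUP_ID \
      --protocol tcp \
      --port 443 \
      --source-group sg-CONTROL_PLANE_ID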

Vishal2696 commented 4 years ago

Check that your ControlPlane can reach your DataPlane on 443 (I had to modify SG for both to allow this and it worked)

How do I check this? I have my cluster hosted on Azure AKS.

serathius commented 4 years ago

Closing per Kubernetes issue triage policy

GitHub is not the right place for support requests. If you're looking for help, check Stack Overflow and the troubleshooting guide. You can also post your question on the Kubernetes Slack or the Discuss Kubernetes forum. If the matter is security related, please disclose it privately via https://kubernetes.io/security/.

lampnick commented 4 years ago

As an update to this, all my issues with metrics-server went away after I set

hostNetwork:
  enabled: true

in the stable helm chart

https://github.com/helm/charts/tree/master/stable/metrics-server

It works, thanks!

philippefutureboy commented 1 year ago

Note that if you are using GKE (Google Kubernetes Engine) and your cluster has been without containers for a long time (multiple days), GKE decommissions the nodes from the cluster (to save you costs). Without nodes, those control plane processes cannot start. So if that's your case, all is good! Just run an image or create a deployment and everything should start working as usual :)

solomonshorser commented 1 year ago

@philippefutureboy I'm having this problem in GKE, and yes, my cluster was idle, but I've run two DAGs over the last hour and still it does not work. Is there any other way to revive it?

philippefutureboy commented 1 year ago

No, unfortunately the issue has started persisting even after spinning up new pods on my side as well 😕

solomonshorser commented 1 year ago

No, unfortunately the issue has started persisting even after spinning up new pods on my side as well 😕

Oh. I'm trying to delete a namespace and it can't be deleted because of

'Discovery failed for some groups, 1 failing: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request'

solomonshorser commented 1 year ago

Ah, there's a GKE troubleshooting guide here: https://cloud.google.com/kubernetes-engine/docs/troubleshooting#namespace_stuck_in_terminating_state
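
For namespaces stuck in Terminating because of a broken aggregated API, checking which APIService is unavailable usually points at the culprit, e.g.:

    kubectl get apiservice | grep False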

Kipkemoii commented 1 year ago

As an update to this, all my issues with metrics-server went away after I set

hostNetwork:
  enabled: true

in the stable helm chart

https://github.com/helm/charts/tree/master/stable/metrics-server

Thanks for sharing this. It worked for me.

paolo-depa commented 1 year ago

Same issue: turning off the firewall worked for me (...yeah, quite the overkill, but I have no time for fine-tuning right now...)

mhemken-vts commented 1 year ago

My solution was this:

❯ kubectl delete -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

I did not have the metrics server installed, nor did I need it. At some point somebody installed it and then uninstalled it, but the uninstallation was not complete. We had these lingering resources:

clusterrole.rbac.authorization.k8s.io "system:aggregated-metrics-reader" deleted
clusterrole.rbac.authorization.k8s.io "system:metrics-server" deleted
clusterrolebinding.rbac.authorization.k8s.io "metrics-server:system:auth-delegator" deleted
clusterrolebinding.rbac.authorization.k8s.io "system:metrics-server" deleted
apiservice.apiregistration.k8s.io "v1beta1.metrics.k8s.io" deleted