Closed JorTurFer closed 1 year ago
@olivierlemasle @dgrisonnet FYI
/assign /triage accepted
Hey @dgrisonnet Could you share your thoughts about this?
Hey @JorTurFer, sorry for the late reply. I need to spend some time researching this particular topic as I am not familiar with all the details. I'll try to look into it in the upcoming weeks.
No worries, I just asked to keep alive the topic 😄 No rush at all
@JorTurFer did you conclude what is the best workaround or fix for this problem?
With kubectl 1.26, I started getting these warnings for every kubectl command:
E0418 08:02:11.707022 2989 memcache.go:287] couldn't get resource list for external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0418 08:02:11.786098 2989 memcache.go:121] couldn't get resource list for external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0418 08:02:11.871669 2989 memcache.go:121] couldn't get resource list for external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0418 08:02:11.965455 2989 memcache.go:121] couldn't get resource list for external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Downgrading to kubectl 1.25.2 eliminated these warnings for me.
I am running k8s.gcr.io/metrics-server/metrics-server:v0.6.2 in an EKS cluster v1.24.10-eks-48e63af
Metrics-server doesn't expose external metrics so that is a pretty odd error to see. What request did you make with kubectl?
@dgrisonnet I did not do anything fancy, I just did:
kubectl get pods --all-namespaces
Pretty much any k8s command gives the same warning to stdout:
E0418 08:02:11.707022 2989 memcache.go:287] couldn't get resource list for external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0418 08:02:11.786098 2989 memcache.go:121] couldn't get resource list for external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0418 08:02:11.871669 2989 memcache.go:121] couldn't get resource list for external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0418 08:02:11.965455 2989 memcache.go:121] couldn't get resource list for external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
I'm encountering the same issue reported here: https://github.com/kubernetes-sigs/metrics-server/issues/157
I'm not sure why that issue was closed, or what the resolution to that issue should be.
Downgrading kubectl to 1.25.2 eliminated the warnings for me. I don't fully understand all the pieces involved, so I can't pinpoint the root cause of this problem.
It's not metrics-server that's complaining; KEDA does this when there's no ScaledObject or ScaledJob defined.
This comment mentioned a band-aid solution of enabling hostNetwork: true in the Helm chart for metrics-server when running on AWS EKS: https://github.com/kubernetes-sigs/metrics-server/issues/157#issuecomment-544875105
I didn't try that because I didn't know if that would break anything else.
It would be nice to better understand the root of the problem, so I know which direction to take to fix this.
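For reference, a hedged sketch of what that band-aid could look like without the Helm chart, patching the Deployment directly (the Deployment name and namespace assume a default kube-system install; the chart may expose an equivalent value):
# Sketch only: force the metrics-server Deployment onto the host network, as
# suggested in the linked comment. Verify this fits your CNI setup before applying.
kubectl -n kube-system patch deployment metrics-server --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/hostNetwork", "value": true}]'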
What I did was make a dummy Deployment with a cron-trigger ScaledObject and set the desiredReplicas to 0. It's not a fix, but at least kubectl will stop logging those warnings.
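A minimal sketch of that workaround, assuming a default KEDA install; the Deployment name, namespace and cron schedule below are made up for illustration:
# Dummy ScaledObject with a cron trigger and desiredReplicas "0", as described
# above, so KEDA has at least one metric to expose. Point scaleTargetRef at any
# throwaway Deployment you own.
kubectl apply -f - <<'EOF'
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dummy-cron
  namespace: default
spec:
  scaleTargetRef:
    name: dummy-deployment     # an existing dummy Deployment (hypothetical name)
  minReplicaCount: 0
  maxReplicaCount: 1
  triggers:
    - type: cron
      metadata:
        timezone: Etc/UTC
        start: 0 0 1 1 *
        end: 5 0 1 1 *
        desiredReplicas: "0"
EOF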
Also, probably check your keda-operator logs. Maybe something is broken in it.
@sansmoraxz Wow that's a weird workaround. Do you have a pointer in GitHub to an example of what you did? Would be nice to have a proper fix so that weird workarounds like this are not required.....
I'm encountering the same issue reported here: https://github.com/kubernetes-sigs/metrics-server/issues/157
This doesn't report an issue with the same API. You might still have an APIService resource around trying to serve external.metrics.k8s.io/v1beta1, but that is not needed by metrics-server, so you should be able to remove it.
Either way, since this seems to be an error coming from KEDA, I'd recommend asking for help in one of their channels, because it is unrelated to the improvement mentioned in this issue.
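For anyone who really does have a stale APIService left behind (and nothing like KEDA that is supposed to serve it), a hedged sketch of how to check and remove it:
# Inspect the external metrics APIService and the service it points at.
kubectl get apiservice v1beta1.external.metrics.k8s.io -o yaml
# Only if no adapter is meant to back it: deleting the stale APIService stops
# the discovery errors. Do NOT delete it if KEDA (or another adapter) should own it.
kubectl delete apiservice v1beta1.external.metrics.k8s.io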
@sansmoraxz as you suggested:
kubectl get apiservice
E0418 13:47:31.183042 28041 memcache.go:287] couldn't get resource list for external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0418 13:47:31.261110 28041 memcache.go:121] couldn't get resource list for external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0418 13:47:31.344404 28041 memcache.go:121] couldn't get resource list for external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0418 13:47:31.425712 28041 memcache.go:121] couldn't get resource list for external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
NAME SERVICE AVAILABLE AGE
v1. Local True 102d
v1.acme.cert-manager.io Local True 23d
v1.admissionregistration.k8s.io Local True 102d
v1.apiextensions.k8s.io Local True 102d
v1.apps Local True 102d
v1.authentication.k8s.io Local True 102d
v1.authorization.k8s.io Local True 102d
v1.autoscaling Local True 102d
v1.batch Local True 102d
v1.cert-manager.io Local True 23d
v1.certificates.k8s.io Local True 102d
v1.coordination.k8s.io Local True 102d
v1.discovery.k8s.io Local True 102d
v1.events.k8s.io Local True 102d
v1.networking.k8s.io Local True 102d
v1.node.k8s.io Local True 102d
v1.pg.percona.com Local True 23d
v1.policy Local True 102d
v1.rbac.authorization.k8s.io Local True 102d
v1.scheduling.k8s.io Local True 102d
v1.storage.k8s.io Local True 102d
v1alpha1.argoproj.io Local True 23d
v1alpha1.crd.k8s.amazonaws.com Local True 23d
v1alpha1.elbv2.k8s.aws Local True 23d
v1alpha1.external-secrets.io Local True 23d
v1alpha1.generators.external-secrets.io Local True 23d
v1alpha1.keda.sh Local True 4d1h
v1alpha1.opentelemetry.io Local True 23d
v1beta1.batch Local True 102d
v1beta1.discovery.k8s.io Local True 102d
v1beta1.elbv2.k8s.aws Local True 23d
v1beta1.events.k8s.io Local True 102d
v1beta1.external-secrets.io Local True 23d
v1beta1.external.metrics.k8s.io keda/keda-operator-metrics-apiserver False (FailedDiscoveryCheck) 4d1h
v1beta1.flowcontrol.apiserver.k8s.io Local True 102d
v1beta1.metrics.k8s.io argocd/metrics-server True 18h
v1beta1.node.k8s.io Local True 102d
v1beta1.policy Local True 102d
v1beta1.postgres-operator.crunchydata.com Local True 23d
v1beta1.storage.k8s.io Local True 102d
v1beta1.vpcresources.k8s.aws Local True 33d
v1beta2.flowcontrol.apiserver.k8s.io Local True 102d
v2.autoscaling Local True 102d
v2beta1.autoscaling Local True 102d
v2beta1.pg.percona.com Local True 23d
v2beta2.autoscaling Local True 102d
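A few hedged follow-up commands to dig into that FailedDiscoveryCheck (resource names assume a default KEDA install in the keda namespace):
# Why is the APIService failing discovery?
kubectl describe apiservice v1beta1.external.metrics.k8s.io
# Is the backing service there and does it have endpoints?
kubectl -n keda get svc,endpoints keda-operator-metrics-apiserver
# Anything suspicious in the metrics apiserver logs?
kubectl -n keda logs deploy/keda-operator-metrics-apiserver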
So, as @dgrisonnet mentioned, this looks like a problem with KEDA. I will need to investigate further; I'm not sure how this was installed, as I am also installing metrics-server via the Helm chart in the argocd namespace. Sorry for the noise in this thread.
@dgrisonnet, this is related to not having any ScaledObject (so no metric is exposed) and to the problem we faced with the changes in kube-client around the discovery client. The problem was already fixed upstream and will be propagated in the next versions, but it isn't something that we can fix in KEDA. In fact, this issue refers to that other issue and aims to solve it by always exposing a fixed set of metrics, independent of the number of metrics.
Btw, did you have time to think about this? Are you at KubeCon? Maybe we could talk about this face to face :)
@dgrisonnet, this is related to not having any ScaledObject (so no metric is exposed) and to the problem we faced with the changes in kube-client around the discovery client. The problem was already fixed upstream and will be propagated in the next versions, but it isn't something that we can fix in KEDA.
Yeah, that was also my guess, but since this issue in custom-metrics-apiserver didn't seem to be the right place to handle this support case, I was trying to reroute them to the correct channels.
Btw, did you have time to think about this?
Not yet sadly :/ I don't know what potential impact that change could have on the apiserver/autoscaler. But my main concern right now is that I was thinking about leveraging this exposition behavior to use discovery in order to implement https://github.com/kubernetes-sigs/custom-metrics-apiserver/issues/70 without having to introduce a complex configuration. My idea was that users could just create a priority list of metrics servers, and by discovering the metrics ahead of time, we could route the API calls properly without needing any complex configuration.
Are you at KubeCon? Maybe we could talk about this face to face :)
I am, let's try to schedule something offline, I would be happy to talk about that in-person :)
We have been talking about this offline and agreed on the next steps: we will explore the fixed list for custom/external metrics directly here (not requiring it in the interface) and check how it works with external metrics in KEDA (I'll do this); then @dgrisonnet will do the same with custom metrics in prometheus-adapter. If everything works as expected, the custom-metrics-apiserver will no longer require the list of custom/external metrics and will always expose a single resource for each API.
It seems to have been fixed in client-go
It seems to have been fixed in client-go
Yeah, the problem was solved in the Kubernetes repo, but we have discussed not treating metrics as resources. The ongoing PoC isn't only about fixing that problem; it also addresses other problems related to huge metric lists.
Hi all,
As you know, we recently faced the issue produced by the changes in kube-client. Basically, the issue was related to returning an empty list, because an empty list was considered an error.
This has pushed me to think about the current situation and how we expose the custom-metrics information. I'd like to reopen the discussion about what we return during listing operations, and whether we really need to return the current metrics. Currently, we expose custom/external metrics by listing all of them or by name. I know that this is the general approach in Kubernetes for aggregated APIs, but I'm not sure it makes sense here (most probably due to my knowledge gap). Other aggregated APIs like metrics.k8s.io support the same actions as us, but in that case it makes sense, because the metrics are available in every namespace/node. For example, if I query the pod metrics of a namespace, I get a list with all the available values in that namespace, and it makes sense because there will eventually be pod metrics in every namespace; pod as a metric is generic, an abstraction that we can query in every namespace.
My doubts appear with custom/external metrics. For example in KEDA, we list all the metrics and expose them during listing requests. The problem is that each query is totally tied to a specific metric/namespace; I can't query an arbitrary metric in an arbitrary namespace. That's why I'm not sure about the current approach. We are returning a list of metrics that can't be used for discovering anything, as we return only the metric name (generated using the specific info) but not the namespace. Basically, we are returning a useless list, because nothing can be done with it apart from knowing the whole list of metric names. This listing action can also produce a timeout if the metric list is too long, printing an error without any useful information for end users. So we have an endpoint which doesn't offer much information, which is queried by the cluster/users quite often, and which can easily produce timeouts in some situations. Another important point is that the HPA controller doesn't use this listing endpoint at all; it directly uses the get endpoint, providing the namespace and metric name from the HPA.
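To illustrate the asymmetry described above (hedged example; the exact output obviously depends on the cluster):
# Resource metrics: listing per-namespace pod metrics returns data that is directly usable.
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"
# External metrics: listing the API group only yields metric names, with no
# namespace/selector context, so the list can't really be used for discovery.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"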
Based on the reasons above, I think that following the same approach as metrics.k8s.io could be worthwhile. My proposal is: during listing requests (kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" or kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"), return a single metric like custom-metric and external-metric (depending on the API). Following this approach, we would be safe from changes in discovery processes, and would also avoid timeouts when listing all the available metrics. I'm not totally sure about the changes required to do this, nor whether they have to be done in this repo or directly in the implementations, but I wanted to share my thoughts about this.
BTW, I'm willing to contribute the changes if we agree to change something.
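A hedged sketch of what discovery could look like under this proposal (the fixed resource names come from the proposal itself, not from any existing implementation):
# With a fixed, single resource per API, discovery stays tiny and always succeeds,
# regardless of how many metrics/ScaledObjects exist.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq '.resources[].name'
# hypothetical output under the proposal:
#   "external-metric"
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '.resources[].name'
# hypothetical output under the proposal:
#   "custom-metric"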