Proposal: Support multiple custom/external metrics servers in the cluster

zroubalik commented 3 years ago

Currently only one custom or external metrics server in the cluster is supported (because there can be only one APIService resouce specified), as already discussed in #3 in #68. This way users are not able use multiple projects that are consuming this framework, eg. KEDA, k8s-prometheus-adapter or Datadog etc.

I was thinking about options how to solve this and I'd like to propose, that we will introduce aggregator and child components.

aggregator - that's basically the adapter that we have today (ie. it is responsible for communication with k8s api server, APIService registration etc). + additional functionality: it registers child components as they are added in the cluster, transferring the requests for particular metrics to the child components and will maintain a cache of metrics that are provided by individual childs (which could always fall back to calling ListMetrics() on each child). Could either use:
- a new controller + CRD to register new child components and forward the REST call
- use grpc for communication between aggregator & child, this way the child components could register themselves in the aggregator (by initializing the connection) and we won't need a controller and CRD for this part
child - very lightweight component, that's transferring the metrics to the aggregator, user will just have to implement the interfaces

With this approach, in the cluster there will be one aggregator component which will transfer requests to the individual child. Individual projects, that are consuming this framework, will have to just implement the child. If grpc is used, the whole communication could be pretty simple and we can even include this change into the existing codebase and allow some mixed scenario, where aggregator could provide some metrics as well, but I'd like to avoid this scenario.

What do you think about this proposal? Do you think that this is something that could be acceptable by this project? I can elaborate more (and formalize the proposal), if I see an interest in this.

And yeah, I can contribute this 😄

dgrisonnet commented 3 years ago

Hey @zroubalik, thank you for coming up with this proposal, I think this would be an amazing addition to this project.

I would personally lean towards the CRD approach, as this would not require mutating existing adapters. Moreover, in the long run, this would allow more configuration possibilities than with grpc. If at some point, we need to select metrics from a specific producer, I think, it would be easier to do so with CRDs.

Does your definition of a child component require an actual modification of existing adapters? To me, child components are the current adapters, and the aggregator is a whole new component dispatching API call to its children.

Out of curiosity, with this approach, is your goal to have one component being both an aggregator and a child or have a standalone aggregator?

Do you think that this is something that could be acceptable by this project?

I think that it is worth adding a boilerplate to create metrics aggregators to this project as having multiple metric providers falls into the overall experience of custom metrics in Kubernetes.

s-urbaniak commented 3 years ago

This proposal sounds good to me too! :tada: cc'ing @serathius as I am curious about his thoughts too. Generally, from my side I'd love to see i.e. namespace based support for aggregated APIs in Kubernetes itself, which would solve that problem maybe in another way, but supporting it natively here definitely sounds also like a viable approach.

zroubalik commented 3 years ago

@s-urbaniak great, I'll follow with some more details later.

wrt the namespace based support in k8s itself, if I understand this correctly, this wouldn't help in this case if user would like to consume metrics from multiple providers from the same namespace, or am I wrong?

s-urbaniak commented 3 years ago

@zroubalik correct and this is actually the edge case that I am actually worried about as there doesn't seem to be a clear boundary of responsibilities which provider is responsible for what namespace.

mwerickso commented 3 years ago

We are in need of this exact use case. We need our subscribers to use a Stackdriver metric directly from GCP to scale off of. However, in the same namespace we have flask services that need to use the DD trace metrics to scale off request count.

In the interim, we have our subscribers using the GCP metric from Datadog, but this adds an 8-10 minute delay in the external metrics server getting the metric to scale off (that is how much time it takes for DD to get metrics from GCP).

Is it possible to get this on the roadmap? Is there any information that I can contribute to help architect how this would be solved?

zroubalik commented 3 years ago

@mwerickso sorry for the delay. I am planning to start working on a POC, any help is definitely more than welcome. I will add some details later and will let you know then.

ellistarn commented 3 years ago

This proposal is awesome. In Karpenter, we decided against the k8s metrics api due to this exact single-slot problem: https://github.com/awslabs/karpenter/blob/main/docs/DESIGN.md#prometheus-vs-kubernetes-metrics-api.

I'm a little nervous about the idea of separating by namespace. Namespace is increasingly seen as a security boundary. Similarly, prometheus has moved towards multiple prometheuses deployed into different namespaces. By dividing metrics providers along namespaces, we lose the security properties of namespaces (i.e. metric RBAC) as well as enforce a cluster global semantic for that metrics provider.

Instead, I was thinking about a model where a metrics provider could advertise the metrics paths it supports, and effectively have a universal metrics api router that detects these paths and routes to the right metrics provider. This could be modeled with CRDs, as you're suggesting.

Strawman workflow:

Install universal metrics server as the external metrics provider
Install a specific metrics provider server and CRD. CRD would encapsulate paths and metric translation definition
Universal metrics server would watch CRDs and detect ones that include a well known struct (i.e that encapsulates metrics path). This struct must be a subfield of the metrics provider's CRD.
Users create CRD specific to their metrics provider (e.g. prom/cloudwatch/stackdriver metrics), UMS detects the CRDs and routes to the configured metrics provider server.

Pluies commented 3 years ago

Having ran into the same issue, I'm looking forward to this proposal gaining traction! 🙌

In the meantime, one way of getting around this issue is to reverse-proxy to several metrics provider; I've written a PoC/example at: https://github.com/Pluies/external-metrics-multiplexer

serathius commented 3 years ago

Overall issue touches problem of extending kube aggregation to support multiple apiservers either by namespace or by other means. I think this should be first brought to apimachinery who has more broad view on extending K8s APIs.

arjunrn commented 3 years ago

Based on @DirectXMan12 's comment 2 years ago I looked into what would be required to implement a service which discovers metrics from multiple metrics apiservers and then routes metric requests to the correct server. It would look something like what @ellistarn described. I've written a proposal on how this would work and implemented a prototype which discovers both custom and external metrics. The advantage of this solution is that it doesn't require any changes to Kubernetes APIs or existing custom metrics servers. Also there's no extra configuration for metrics. There's still the open question of how to deal with metrics which are provided by multiple metrics apiservers but this can be handled in the current design or in later iterations. Before I actually put more work into turning the prototype to something that's production ready I would appreciate any feedback particularly from sig-instrumentation.

serathius commented 3 years ago

Awesome, Thanks for doing this! Could you create a KEP from this document? https://github.com/kubernetes/enhancements/tree/master/keps#kubernetes-enhancement-proposals-keps Making official enhancement proposal will allow to increase visibility of this proposal and get some input from SIG apimachinery ensuring that work can move forward.

I would be happy to help you with KEP process.

arjunrn commented 3 years ago

@serathius The plan is to turn it into a KEP if sig-instrumentation is interested. But I can go ahead and do now and then have a discussion there. @s-urbaniak does that work for you?

s-urbaniak commented 3 years ago

@arjunrn definitely, this works for me :+1:

arjunrn commented 3 years ago

@s-urbaniak @serathius Here's the KEP PR. Please let me know if there's anything I could add to explain things better.

zroubalik commented 3 years ago

@arjunrn that's great, I wanted to implement this, but then I had to deal with something more urgent. Now I am back on the track.

I am more than happy to help with the actual implementation/KEP or whathever stuff you need.

randallt commented 3 years ago

Some additional input on this requirement. I found this issue because we are a DevOps team that supports several K8S clusters (specifically OpenShift) and we are transitioning from one metrics provider to another one over the next year. We ran into this problem of not being able to install our new provider's HPA adapter since the existing provider's one is already installed. How will we migrate HPA's smoothly without having both available concurrently? No idea at this point. I'm open to suggestions.

A key feature requirement to consider when implementing this: Squads (teams) have services with HPA's that currently use the existing metrics provider. These HPA's target the current HPA Adapter provided by the vendor and allows for their full rich query language. The idea of predefined metrics never really worked for us, since squads are independent, create their own metrics, and need flexibility in scaling decisions. As far as the external metrics API and HPA's are concerned, there should be the ability to target a particular external metrics provider and give a dynamic metric query--not rely on a set of predetermined metrics/paths. Our current provider solved the rich query requirement with annotations--but a side-effect of this is that each HPA has to have a metric "name" that is unique within the cluster. This is a silly requirement that grows out of this idea that the external metrics provider should be able to advertise all the metrics supported.

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

arjunrn commented 3 years ago

/remove-lifecycle stale

dgrisonnet commented 3 years ago

/lifecycle frozen

barkbay commented 3 years ago

I've been working on my own proof concept to test the idea. In case anyone is interested the project is available here: https://github.com/barkbay/metrics-router The core algorithm is inspired from https://github.com/arjunrn/custom-metrics-router with the following changes:

Resource is named MetricsSource (short name is ms), it is possible to get some basic information using kublectl get:

% kublectl get metricssource
NAME            SERVICE                                                           PORT   SYNCED   METRICS
elasticsearch   elasticsearch-metrics-apiserver/elasticsearch-metrics-apiserver   443    true     61
prometheus      custom-metrics/prometheus-metrics-apiserver                       443    true     476

The controller runtime is used for the controller stuff
the reconciliation loop is used to periodically refresh the metrics available from each source (every 5 minutes for now, I'm wondering if it should be a configuration setting in the CRD itself ?)
controller and metrics registry logic are covered by unit tests

There is still a lot to do though:

Error handling should be discussed, metrics might be served from sources with low priorities if not all the sources can't be listed/refreshed.
only insecureSkipTLSVerify is supported
it would need some Prometheus metrics to be exposed (e.g. percentile latency for each service, number of degraded connections....)
it should eventually support Service port described with a name, not only provided as an int
...

Feedbacks are welcome, especially if you have a real use case, I think it would help to improve the KEP. I'm also happy to help on any "official" implementation.

dgrisonnet commented 1 year ago

There was multiple PoCs written overtime. but this issue hasn't moved much since the KEP was closed. As a path forward I would be happy to sponsor the metrics router to become one of SIG Instrumetation subprojects, but before that we would need to figure out what's the state of things:

https://github.com/kubernetes/enhancements/issues/2580 was written a while back, should it be revived in this repository in order to improve the original design?
are any of the PoCs close to being production ready and as such could be moved to the kubernetes-sigs org?

The next things is that we would need enough people to volunteer to drive and maintain the project. At minimum, we would need 2 active maintainers, but the more people we can get, the more likely it is to be accepted as a subproject.

@arjunrn @Pluies @barkbay since y'all worked on PoCs in the past, could you enlighten me about the state of things? Also, would any of you be interested in looking into that effort going forward?

Pluies commented 1 year ago

Hi @dgrisonnet ! Since the PoC above, I've set up Keda and never looked back. Granted, it's an external project and it would be nicer to use a pure "out-of-the-box" Kubernetes-native solution, but between supporting multiple scaling sources and scale-to-zero (which in upstream kube has been in alpha since forever), Keda has solved 100% of my autoscaling plumbing needs 🙏

tomkerkhove commented 1 year ago

That's great to hear! However, KEDA itself would also like to see this happen but we didn't have the engineering yet to contribute this.

Scenarios are:

Migration from OSS to managed offerings of KEDA
Run multiple KEDA installations next to each other for tenant isolation (or per namespace isolation | see https://github.com/kedacore/keda/issues/2654)

dgrisonnet commented 1 year ago

Makes sense @Pluies, thank you for the feedback! :) I totally agree that Keda is a good alternative in most cases considering the sheer amount of scalers that it supports.

afirth commented 1 year ago

@dgrisonnet One issue with a router is that it needs to support all the different ways of authN - including X.509 which cannot be transparently passed through a layer 7 TLS proxy as @Pluies created. The approach there is to give the proxy it's own RBAC, but that's not a good solution for the general case I think (unless the proxy verifies inbound RBAC)

This is a corner that I guess most cluster admins will only face a few times, but it's a really unpleasant corner to be in. I imagine it will get a bit more busy as 1.21 is EOL on EKS in February and https://github.com/amazon-archives/k8s-cloudwatch-adapter/ doesn't support rotating service account tokens AFAIK

JorTurFer commented 1 year ago

amazon-archives/k8s-cloudwatch-adapter doesn't support rotating service account tokens AFAIK

Be care @afirth because k8s-cloudwatch-adapter has been deprecated for 5 months, AWS recommends KEDA and the adapter is not longer maintained

JohnRusk commented 1 year ago

This may be a "dumb question", but I'll ask it anyway. Regarding the problem statement

Currently only one custom or external metrics server in the cluster is supported (because there can be only one APIService resouce specified)

I understand that this means you can't have multiple metrics servers simultaneously mapped to the path apis/custom.metrics.k8s.io and that likewise you can't have multiple metrics servers simultaneously mapped to apis/external.metrics.k8s.io. My question is, what exactly prevents mapping a metrics server to something like apis/foo.metrics.k8s.io or even apis/foo.mydomain.com? Is that actually prohibited by the machinery of K8s? Or is the problem simply that such paths would no longer match what's expected, and so tools that expect to read metrics from custom/external.metrics.k8s.io would not see them?

JohnRusk commented 1 year ago

... what exactly prevents mapping a metrics server to something like apis/foo.metrics.k8s.io or even apis/foo.mydomain.com? Is that actually prohibited by the machinery of K8s? Or is the problem simply that such paths would no longer match what's expected, and so tools that expect to read metrics from custom/external.metrics.k8s.io would not see them?

@zroubalik , @dgrisonnet , hoping you might have a moment to share the answer. Thanks :-)

dcfranca commented 1 year ago

At the moment, we are running kube-metrics-adapter, but want to migrate to Keda... however, we would like to keep both for some time during the migration of the metrics

Is there any workaround for that?

dgrisonnet commented 1 year ago

Sadly not, there is no easy migration process today because you can't have two different source of external metrics. The safest way today is to manually scale all the HPA targets to the max while doing the migration.

dgrisonnet commented 1 year ago

@JohnRusk the HPA controller assumes the existence of resources coming from custom/external.metrics.k8s.io APIs in order to make autoscaling decisions.

For example if you create an HPA with ExternalMetricSource set, the controller will try to get a MetricValue from the external.metrics.k8s.io API.

igmor commented 1 year ago

@JohnRusk the HPA controller assumes the existence of resources coming from custom/external.metrics.k8s.io APIs in order to make autoscaling decisions.

For example if you create an HPA with ExternalMetricSource set, the controller will try to get a MetricValue from the external.metrics.k8s.io API.

How hard would it be to modify ExternalMetricSource to add a parameter pointing to an apiservice name? Then fetching the metric from a specified metrics server in HPA. Obviously all that could be extended to all metric server groups.

vsoch commented 10 months ago

Hi folks! We have a similar need to deploy more than one metrics server - actually. I’m extending that to allow an object to expose its own. What is the status of this / next steps? I’ve implemented operators and various Kubernetes abstractions before so I could definitely help.

I also think it would hugely simplify things to just allow a different URL to expose a particular metric (last comment above) but curious about others’ thoughts.

JorTurFer commented 10 months ago

Hi folks! We have a similar need to deploy more than one metrics server - actually. I’m extending that to allow an object to expose its own. What is the status of this / next steps? I’ve implemented operators and various Kubernetes abstractions before so I could definitely help.

I also think it would hugely simplify things to just allow a different URL to expose a particular metric (last comment above) but curious about others’ thoughts.

Currently, we have drafted a KEP: https://github.com/kubernetes/enhancements/pull/4262 I hope that it goes forward and in next k8s versions this is supported natively

vsoch commented 10 months ago

Beautiful, thank you! I will also follow.

neoakris commented 10 months ago

Thanks for working on this, side note I wrote an article on this issue.
https://cloudnativenow.com/topics/kubernetes-custom-metric-autoscaling-almost-great/
Might be handy to share to help people get up to speed faster if you ever have to explain it to others.

I really hope this gets improved upon, as it's been a long-standing UX issue that's been neglected for a long time.

Dudesons commented 1 month ago

I see the https://github.com/kubernetes/enhancements/pull/4262 is closed, does it mean the proposal is dropped or there is another kep to follow on this subject ? @JorTurFer

kubernetes-sigs / custom-metrics-apiserver

Proposal: Support multiple custom/external metrics servers in the cluster #70