kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
https://keda.sh
Apache License 2.0

Multiple KEDA deployments in a single cluster #2654

Open tomkerkhove opened 2 years ago

tomkerkhove commented 2 years ago

Over the past weeks/months I've had more and more questions about whether you can deploy KEDA multiple times in a single cluster.

The scenarios for this are:

  1. Hard Multi-tenancy - People want to isolate KEDA across teams/customers in a single cluster
    • Customer A should not be able to see/interfere with what customer B does
  2. Soft Multi-tenancy - People want to isolate KEDA across teams/customers in a single cluster for management purposes (example)
    • Given it's shared in the cluster, it lacks clear ownership
  3. Security - People want to reduce the amount of access that a given KEDA deployment has
    • For example, when using pod identity, a centralized KEDA needs access to every resource that the apps depend on in the cluster, which is a security risk (#2656)

This issue is used to gather all scenarios so we can see how best to tackle this ask, if we decide to do it. If you have more scenarios, do let us know.

Background - Why is this not supported today?

As per https://github.com/kedacore/keda/issues/470, Kubernetes upstream only allows end-users to run a single metrics server per cluster. There is an open proposal by @zroubalik to improve this, but until then we cannot deploy the metrics server multiple times.
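For context, the limitation comes from the API aggregation layer: the whole external metrics API group/version is registered through a single APIService object that points at exactly one backing Service. A rough sketch of that registration for a default KEDA install (the namespace and Service name below assume a standard install) looks like this:

```yaml
# Sketch of the single registration point for external metrics.
# Only one Service can sit behind v1beta1.external.metrics.k8s.io,
# so a second KEDA install (or any other external metrics adapter)
# would have to overwrite this object.
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  group: external.metrics.k8s.io
  version: v1beta1
  groupPriorityMinimum: 100
  versionPriority: 100
  insecureSkipTLSVerify: true      # TLS verification details elided for brevity
  service:
    name: keda-metrics-apiserver   # the one and only backing Service
    namespace: keda
```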

The KEDA operator allows you to watch all namespaces, or just a single one. In theory, we could change KEDA so that we can re-use the metrics server, and change our Helm charts to be more flexible by providing 3 of them (metrics-server, operator & full), but we still have concerns with this scenario.

Because of this, we are waiting to support this model until https://github.com/kubernetes-sigs/custom-metrics-apiserver/issues/70 is tackled upstream.

luisg707 commented 1 year ago

+1 on this ask.

dgrisonnet commented 1 year ago

I stumbled upon this issue after looking at https://github.com/kubernetes-sigs/custom-metrics-apiserver/issues/70 again. As far as I can tell, it shouldn't be blocking this particular effort, since you can actually deploy multiple instances of a metrics server today as long as they are behind the same apiservice/service. For instance, that's what we are doing with the upstream metrics-server and prometheus-adapter.

The only thing to note is that, to maximize the efficiency of a multi-replica configuration, it is recommended to add the --enable-aggregator-routing=true CLI flag to the kube-apiserver so that requests sent to the metrics server are load-balanced between the instances.
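As an illustration, on a kubeadm-managed control plane that flag could be set roughly as below; managed clusters (EKS/GKE/AKS) generally don't expose kube-apiserver flags, so treat this as a sketch rather than a universal recipe:

```yaml
# kubeadm ClusterConfiguration excerpt: enable routing of aggregated API
# requests to individual endpoints of the backing Service instead of the
# Service's cluster IP, so multiple metrics-server replicas share the load.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    enable-aggregator-routing: "true"
```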

zroubalik commented 1 year ago

@dgrisonnet that is correct, but that's doable for multiple replicas of the same metrics server. What we would like to have is a multi-tenant installation -> i.e. support multiple different metrics servers (KEDA installations) and also support different adapters (Datadog, ...).

For that we would need to have something like the router discussed in https://github.com/kubernetes-sigs/custom-metrics-apiserver/issues/70

dgrisonnet commented 1 year ago

Ack, just wanted to make sure.

We should definitely look into having the router implemented for that purpose. Out of curiosity, do you know if the various adapters are planning to be converted to KEDA scalers in the future, or whether some of them want to remain standalone projects?

By the way, I started looking at the router again and I am looking for contributors to drive and maintain it. Do you perhaps have some contributors that would be interested and have time to look into that?

zroubalik commented 1 year ago

> Ack, just wanted to make sure.

👍

> We should definitely look into having the router implemented for that purpose. Out of curiosity, do you know if the various adapters are planning to be converted to KEDA scalers in the future, or whether some of them want to remain standalone projects?

The only one I am aware of is DataDog https://docs.datadoghq.com/containers/cluster_agent/external_metrics/?tab=helm, but there might be other adapters, I just don't know :)

> By the way, I started looking at the router again and I am looking for contributors to drive and maintain it. Do you perhaps have some contributors that would be interested and have time to look into that?

That's great! @kedacore/keda-contributors ^ anybody interested? I'd be interested, but the problem for me is time :( So I'd join this effort, but cannot assure full commitment.

tomkerkhove commented 1 year ago

I'd be happy to help on the non-code side if there is need for it.

With regards to existing adapters, Prometheus is another one that already has a scaler. But I think this is not related, in the sense that we should be open to supporting other metrics servers next to KEDA as well.

dgrisonnet commented 1 year ago

> With regards to existing adapters, Prometheus is another one that already has a scaler. But I think this is not related, in the sense that we should be open to supporting other metrics servers next to KEDA as well.

Yeah, for Prometheus we are looking at deprecating the project in favor of the KEDA Prometheus scaler, which is why I was curious whether other adapters are also choosing this path. But I completely agree that it shouldn't be a reason to limit the possibilities for the users.

zroubalik commented 1 year ago

It is also worth mentioning that, with the latest architecture changes we have done (slimming down the Metrics Server logic), we might be able to provide a solution for multitenancy on our own. The Metrics Server will be just a light layer that routes the requests to individual KEDA installations (based on the namespace).

But I still think that we should try to tackle this upstream as well, to support other adapters next to KEDA.

JorTurFer commented 1 year ago

I'm interested, but I have limited bandwidth and several things I wanna do in KEDA over the next weeks. After them, I'm willing to contribute/maintain this feature. Could this match your plans, @dgrisonnet?

dgrisonnet commented 1 year ago

There is no urgency at all, I am just trying to find a group of people that would be interested in that effort.

Since we already have a couple of SMEs who volunteered but with limited bandwidth, I was thinking that we could try to find a new contributor eager to work on a long-term effort and mentor them from the sidelines. The time investment from the mentors would be reasonable and it could be a great contribution for a newcomer. I haven't found a candidate yet, but I was thinking about reaching out in a SIG Instrumentation meeting or asking SIG Contributor Experience for guidance on how to find a candidate. Maybe you have someone in mind from KEDA?

If we are not able to find anyone, we can always start the effort, design and such, and iterate over the project over time.

> After them, I'm willing to contribute/maintain this feature.

That's awesome! I wouldn't expect the maintenance burden to be very big, so it should be reasonable for you.

vcardenas commented 1 year ago

Hello, to mention my use case in case it may help: cluster operators like KEDA are managed by a platform team (or cluster admin team), but they are only concerned with installation and upgrades; all of the ScaledObjects and, more importantly, TriggerAuthentications would be managed by dev teams (tenants) that can work within one or multiple namespaces only, but definitely not the keda namespace. That is impossible today when scalers for AWS cloud resources are needed, since the keda operator needs to either assume a role with permissions crossing tenant boundaries, or be able to assume different roles, also crossing boundaries in the process. The External Secrets Operator solves this beautifully with the managed SecretStore-per-namespace scenario, although other multitenancy scenarios are supported. A SecretStore seems to be reminiscent of what a TriggerAuthentication is expected to be. That may be worth taking a look at for ideas.
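For reference, a namespace-scoped TriggerAuthentication of the kind a tenant team would own might look like the sketch below (namespace, object, and Secret names are made up; today it is still the single cluster-wide operator that reads it):

```yaml
# Hypothetical tenant-owned authentication object living in the tenant's
# own namespace; credentials are pulled from a Secret in that same namespace.
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: team-a-aws-credentials
  namespace: team-a
spec:
  secretTargetRef:
  - parameter: awsAccessKeyID        # parameter names expected by the AWS scalers
    name: team-a-aws-secret
    key: AWS_ACCESS_KEY_ID
  - parameter: awsSecretAccessKey
    name: team-a-aws-secret
    key: AWS_SECRET_ACCESS_KEY
```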

JorTurFer commented 1 year ago

Hi @vcardenas. Sadly, those are different use cases. The current limitation is that the metrics APIs only allow a single service behind the API endpoint, which restricts the architecture to a single metrics server with cluster scope; this is a Kubernetes limitation. The projects you mentioned don't have this problem because they don't expose any kind of metric to the cluster API server.

In KEDA we are working on this multitenant support, or at least something similar, but it's still in progress. BTW, you could use multiple KEDA operators, one per namespace, if your use case is for ScaledJobs only; that's already allowed at this moment.
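As a rough sketch of what such a per-namespace operator could look like, the scoping lives in the WATCH_NAMESPACE environment variable. This is not a complete install (CRDs, RBAC, and the metrics server are omitted), the namespace and image tag are examples, and Helm-based installs expose the same scoping through a chart value:

```yaml
# Minimal namespace-scoped operator Deployment sketch; only the WATCH_NAMESPACE
# part is the point here, everything else follows a standard install.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keda-operator
  namespace: team-a
spec:
  replicas: 1
  selector:
    matchLabels:
      app: keda-operator
  template:
    metadata:
      labels:
        app: keda-operator
    spec:
      serviceAccountName: keda-operator
      containers:
      - name: keda-operator
        image: ghcr.io/kedacore/keda:2.10.1   # example tag
        env:
        - name: WATCH_NAMESPACE
          value: team-a                       # empty string means "watch all namespaces"
```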

vcardenas commented 1 year ago

I get that; my comment was oriented towards the scenario of having a single keda operator/metrics-server and multiple tenants making use of it.

sidharthkumarpradhan commented 1 year ago

Hello team, thanks to @JorTurFer for assisting me with some of my queries. Thank you so much. What I am struggling with right now is how to make KEDA watch multiple namespaces, but not all. I tried to deploy multiple keda-operators by changing the WATCH_NAMESPACE values according to my needs, but this is not working: only one namespace is actively working, the others are not scaling up their resources. Any guidance on configuration, or anything else I could do to achieve my goal, would be appreciated. I will be grateful for any help here. @vcardenas @zroubalik @JorTurFer @tomkerkhove @dgrisonnet

sidharthkumarpradhan commented 1 year ago

Another thing I wanted to mention: I have also tried to deploy all the components, including the Roles, RoleBindings, and ServiceAccount, in one namespace to restrict access to that namespace only, but I am having no success here. Here is the link to the deployment file. What am I doing wrong, or is it not possible anyway? Thanks, team.

https://gist.github.com/sidharthkumarpradhan/e457d5cd586af3fd1d03fc10645a9b4b

JorTurFer commented 1 year ago

As we are already talking about this in the discussion, let's continue there so we don't generate noise on this issue. This issue is to track the feature, and currently it's not supported.