kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0
3.49k stars 1.29k forks source link

Extend the ability for CAPI to filter what CRs it reconciles #7775

Open richardcase opened 1 year ago

richardcase commented 1 year ago

User Story

As an operator I would like to be able to run multiple instances of CAPI (and its providers) in the management cluster for operational reasons like multi-tenancy.

As an operator I would like to be able to have more control over which CRs instances of CAPI controllers watch & reconcile for operational reasons like multi-tenancy.

Detailed Description

As a result of #4119 (issue #4004) it's possible to tell instances of CAPI to only reconcile CRs that have a specific value for the cluster.x-k8s.io/watch-label label. This means when filtering that every instance of CAPI must correspond to a specific value for this filter.

It is not currently possible to say instance 1 of CAPI reconciles CRs with a label value val1 and instance 2 of CAPI reconciles CRs where the label value is not val1.

It would be good to have the ability to provide a wider range of "filters". For my scenario, i'm interested in having support for == and != but it could easily be || or &&.

Anything else you would like to add:

The current filtering based on the label (and the WatchFilterValue) is limited by the allowed characters for the label value (i.e. ! = && are not supported) and that its a single label.

One option to support != is to look for a prefix in the label value like: cluster.x-k8s.io/watch-label: not system. But this seems less than ideal for a number of reasons.

Another option is that we support label selectors (or some other mechanism) in the CAPI controllers.

/kind feature

sbueringer commented 1 year ago

Sounds okay to me.

@richardcase just a question about a related topic out of curiosity. do you know how webhooks are used in those scenarios? (i.e. are all provider instances called or only one of them)

This question shouldn't block this issue as the issue is about extending an existing functionality

richardcase commented 1 year ago

Good question @sbueringer, I hadn't overly considered that.....but it's something of interest now you've mentioned it 🤣 Will have a think and get back to you.

fabriziopandini commented 1 year ago

/triage accepted If the problem is multi-tenancy, in the sense of infrastructure tenant, our recommended way is to achieve this by using a single instance of the controller while leveraging different credentials.

Based on my experience, running many instances of the same controllers will lead to more problems than benefits, because in Kubernetes today we have a unique type space (CRD and webhooks) and this conflicts with managing the lifecycle of many instances. One day, when https://github.com/kcp-dev/kcp will be upstream this will be much simpler

However, we already committed to make this possible despite well know issues, and this seems an iteration of this. If we have to improve the semantics for filtering, let me throw a crazy idea on the table. What about deprecating the current watch-label and introducing something like watch-label-selector (a serialized label selector), so we give full power to users while leveraging core Kubernetes capabilities/concepts?

enxebre commented 1 year ago

What are the limitations that prevent this use case from being supported by leveraging --namespace to watch the each capi controller? Also can we elaborate further the use case that motivates this need?

richardcase commented 1 year ago

The --namespace approach has a similar limitation to the cluster.x-k8s.io/watch-label label in that it's a 1:1 mapping from the filter value to the CAPI instance. I'm after something a bit more flexible, as an example:

hzxuzhonghu commented 1 year ago

What we need is not only filter on labels. In kurator, we also need to filter by the spec, likely the kubeconfigControlPlane's reference.

hzxuzhonghu commented 1 year ago

So i propose passing a filter function to the Reconfiler, and every reconcile can has its own customization filter.

Today it is very friendly, because it requires all resources to have the same label.

sbueringer commented 1 year ago

Regarding the discussion from here: https://github.com/kubernetes-sigs/cluster-api/pull/8016/files#r1090501708

Filters should be applied before actually entering in the reconcile loop (e.g. in while setting up controllers)

But then how can we filter out resources that are triggered by another kind of resource?

I think @hzxuzhonghu is right.

All predicates that we can pass into the controller builder or the controller when using For / Owns / Watches / Watch etc. only filter the incoming events. The "pipeline" for watches looks something like this:

tl;dr all the predicates we add are only applied to the events. They are not applied to the final objects we reconcile.

I'm not aware of a mechanism in CR today which allows us to filter based on the reconcile.Requests and especially not on the full objects (@vincepri please correct me if I'm wrong).

Some options:

  1. Add the Filter parameter to all our reconcilers and use them during Reconcile after we got the reconciled object (your PR: https://github.com/kubernetes-sigs/cluster-api/pull/8016)
  2. Implement a "wrapper" Reconciler around all our reconcilers which basically does the same (get the object and filter) (could work with generics)
  3. Implement something in controller-runtime (not sure how exactly)

cc @vincepri WDYT?

hzxuzhonghu commented 1 year ago

@sbueringer @vincepri any conclusion

sbueringer commented 1 year ago

I posted my take, can't answer for others.

hzxuzhonghu commented 1 year ago

cc @fabriziopandini If you are ok with https://github.com/kubernetes-sigs/cluster-api/issues/7775#issuecomment-1410393884, can we let https://github.com/kubernetes-sigs/cluster-api/pull/8016 go

richardcase commented 1 year ago

@sbueringer - great write-up and suggested options 🥇

When i created this issue, i had in mind a way of filtering events on any attribute of client.Object, so limited to things like namespace, labels, annotations etc and not on anything in spec/status. And then support operators like == and!=. The idea being that i can say "only reconcile Clusters where the watch label value != val1"

Something like this could ultimately translate down to predicates (like the WatchFilterValue does) unless i'm mistaken.

alexander-demicev commented 1 year ago

@fabriziopandini @richardcase watch-label-selector seems like a good approach for our use case, how do we structure it? We can possibly reuse https://github.com/kubernetes/apimachinery/blob/master/pkg/apis/meta/v1/types.go#L1203 and all the logic that comes with parsing it, but it might be too complicated for an annotation.

vincepri commented 1 year ago

Going back to the original use case:

I would like to be able to run multiple instances of CAPI (and its providers) in the management cluster for operational reasons like multi-tenancy.

Controller runtime has some mechanics today to filter events, although if the goal is to have hard tenancy requirements, data should probably be logically separated in different namespaces first, and then caches should pre-filter on those namespaces.

There is some missing flexibility in all of this, for example adding or removing namespaces being watched at runtime isn't something controller runtime supports; but there are other ways to get around it maybe with the dynamic reloading on RBAC (https://github.com/kubernetes-sigs/controller-runtime/issues/2176) if that gets implemented.

In general, I'd start by looking at the problem from controller-runtime lenses and list the Cluster API use case as a primary goal.

Danil-Grigorev commented 1 year ago

Could #8982 be a way to solve it? It allows to add event filtering for all discussed use-cases, with a different level of complexity.

vincepri commented 1 year ago

Maybe? We should start from a small proposal in controller runtime first before implementing something very custom in Cluster API; the client-side predicates don't solve any of the hard tenancy requirements though. For example: cache would still hold objects that a specific manager shouldn't reconcile, they're just filtered before being passed along.

alexander-demicev commented 1 year ago

Is it possible to somehow filter objects based on a logical expression? This way the solution to the problem can be more flexible. Caches won’t hold the object if the expression evaluates to true.

vincepri commented 1 year ago

The cache can be filtered by namespace, labels, or fields. Although these are very simple equality parameters; supporting dynamic behaviors more than that would require changes in client-go and potentially api-server as well. Ultimately we'd want the internal cache to follow RBAC rules; results returned from list/watch calls should be pre-filtered based on those parameters.

Danil-Grigorev commented 1 year ago

I think that to provide full flexibility to solving this the expression selector should be added all the way down to apimachinery ListOptions, then implemented it in client-go and proxy this in controller-runtime leveraged with ByObject option. It is a lot of work. https://github.com/kubernetes/kubernetes/issues/53459 is talking about the need to achieve some more generic filtering mechanism, and they suggest to do this on local controller caches anyway, so it might be the way, though a complicated one.

RBAC approach sounds interesting but I'm not sure it provides such granularity the issue describes. You can't do logical expressions as well as support != on some label key, and not always you can modify something you want to ignore to apply to existing rule set.

vincepri commented 1 year ago

Are we trying to solve a hard tenancy problem, or a sharding one?

richardcase commented 1 year ago

Are we trying to solve a hard tenancy problem, or a sharding one?

For our use case, it's more a case of sharding. We'd like to have multiple instances of the controllers, each reconciling a subset of cluster definitions, where each instance isn't limited to a single watch filter or namespace.

vincepri commented 1 year ago

That's a bit easier to think through, although I'd start by opening an issue in Controller Runtime first. We'd need a small proposal: we can make an experimental features that allows controllers to shard data automatically based on a subset of parameters. We'd still be limited to what the api-server can offer if we care about caching only specific data, although that use case becomes an optimization more than anything for this goal.

fabriziopandini commented 4 months ago

/priority backlog