kubernetes-sigs / controller-runtime

Repo for the controller-runtime subproject of kubebuilder (sig-apimachinery)
Apache License 2.0

Enable filtered list watches as watches #244

Closed nkvoll closed 3 years ago

nkvoll commented 5 years ago

When setting up watches during initialization, it's currently not possible to filter by any selectors (which is possible when using list watches directly).

For example, it is not possible to watch only pods with specific labels (e.g. having the label pod-type: my-controller-type). The current behavior results in very broad caching, which might not be desirable for large Kubernetes deployments.

In some scenarios an operator could contain multiple controllers that all share caches, so keying caches on GVKs alone might be problematic if they want to watch the same resource type but with different filters.

When doing List/Get, how would one decide which of the caches to use? It seems that this would need to be an explicit choice by the operator developers?
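
For comparison, a minimal client-go sketch of the kind of server-side filtering that list watches already allow (assuming a plain clientset; the label value is illustrative):

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
)

func filteredPodListWatch(clientset *kubernetes.Clientset) *cache.ListWatch {
    // Only list/watch pods carrying the pod-type=my-controller-type label; the
    // filtering happens server-side, so unrelated pods are never transferred or cached.
    return cache.NewFilteredListWatchFromClient(
        clientset.CoreV1().RESTClient(),
        "pods",
        metav1.NamespaceAll,
        func(options *metav1.ListOptions) {
            options.LabelSelector = "pod-type=my-controller-type"
        },
    )
}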

jeefy commented 5 years ago

+1

I've run into a similar issue with a controller I'm writing. I wound up starting a goroutine when the controller gets added, to watch and act on certain objects that are created and labeled by another controller outside of mine.

DirectXMan12 commented 5 years ago

we'd have to specify this at the manager level, but it should be possible to pass restricting options.

DirectXMan12 commented 5 years ago

(since caches are shared)

DirectXMan12 commented 5 years ago

we could also let users manually specify auxiliary caches

DirectXMan12 commented 5 years ago

if anyone has ideas for a design doc, I'd love to see them.

DirectXMan12 commented 5 years ago

/kind feature
/priority important-soon

this has been coming up a lot lately, and seems like a good thing to tackle in the list of related issues.

DirectXMan12 commented 5 years ago

i.e. one sketch:

Have a list-options watch predicate, detect its presence in some generic way, and pass it down to GetInformer. If we already have a global informer, just use that. If we don't have a global informer, create a new filtered informer. We delay creation of informers so we can figure out whether anyone needs a global informer and, if so, just use that, which allows splitting and combining seamlessly. The question is how to deal with the cached client:

That last one might solve multiple problems in one fell swoop, but I'd need to see a design sketch to make sure that it's not too arcane.
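
A rough, purely hypothetical sketch of the informer-side decision described above (none of these names exist in controller-runtime; the selector key and the newFilteredInformer constructor are illustrative only):

import (
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/client-go/tools/cache"
)

// Hypothetical informer registry: filtered informers are keyed by GVK plus a
// selector string, and a global (unfiltered) informer for a GVK always wins
// when it exists.
type informerRegistry struct {
    global   map[schema.GroupVersionKind]cache.SharedIndexInformer
    filtered map[schema.GroupVersionKind]map[string]cache.SharedIndexInformer
}

func (r *informerRegistry) getInformerFor(gvk schema.GroupVersionKind, selector string) cache.SharedIndexInformer {
    if inf, ok := r.global[gvk]; ok {
        return inf // an unfiltered informer already covers any filtered view
    }
    if inf, ok := r.filtered[gvk][selector]; ok {
        return inf // reuse an existing informer with the same filter
    }
    inf := newFilteredInformer(gvk, selector)
    if r.filtered[gvk] == nil {
        r.filtered[gvk] = map[string]cache.SharedIndexInformer{}
    }
    r.filtered[gvk][selector] = inf
    return inf
}

// newFilteredInformer is a stand-in: a real implementation would build a
// SharedIndexInformer from a selector-scoped ListWatch for the given GVK.
func newFilteredInformer(gvk schema.GroupVersionKind, selector string) cache.SharedIndexInformer {
    return nil // placeholder for illustration only
}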

poidag-zz commented 5 years ago

I will follow this to understand how this is properly implemented. If there are any questions from a user or testing perspective, please ask and I'll do my best to assist. In the interim, I've implemented this for my own use case:

// Assumed imports for this snippet (v1 here is taken to be apps/v1):
import (
    v1 "k8s.io/api/apps/v1"
    "sigs.k8s.io/controller-runtime/pkg/event"
    "sigs.k8s.io/controller-runtime/pkg/handler"
    "sigs.k8s.io/controller-runtime/pkg/source"
)

// Inside the controller setup (c is assumed to be a controller.Controller):
kvp := &keyValuePair{
    key:   "app",
    value: "test",
}

filter := newFilter(keyValueFilter(kvp))

err = c.Watch(&source.Kind{Type: &v1.Deployment{}}, &handler.EnqueueRequestForObject{}, filter)
if err != nil {
    return err
}

type keyValuePair struct {
    key   string
    value string
}

type filterFuncs struct {
    // Create returns true if the Create event should be processed
    CreateFunc func(event.CreateEvent) bool

    // Delete returns true if the Delete event should be processed
    DeleteFunc func(event.DeleteEvent) bool

    // Update returns true if the Update event should be processed
    UpdateFunc func(event.UpdateEvent) bool

    // Generic returns true if the Generic event should be processed
    GenericFunc func(event.GenericEvent) bool
    Filter      keyValuePair
}

func keyValueFilter(kvp *keyValuePair) func(*filterFuncs) {
    return func(f *filterFuncs) {
        f.Filter.key = kvp.key
        f.Filter.value = kvp.value
    }
}

func newFilter(opts ...func(*filterFuncs)) *filterFuncs {
    f := &filterFuncs{}
    for _, opt := range opts {
        opt(f)
    }
    return f
}

// Create keeps only objects whose labels match the configured key/value pair.
// Matching Delete/Update/Generic methods (not shown here) are also needed so
// that filterFuncs satisfies predicate.Predicate.
func (p filterFuncs) Create(e event.CreateEvent) bool {
    labels := e.Meta.GetLabels()
    if val, ok := labels[p.Filter.key]; ok {
        if val == p.Filter.value {
            return true
        }
    }
    return false
}

DirectXMan12 commented 5 years ago

That's just like a predicate, though -- it doesn't actually get you most of the benefits of the filtering that people are asking for here, since it doesn't ask for server-side filtering.

poidag-zz commented 5 years ago

I know... As I said above the code block:

I will follow this to understand how this is properly implemented. If there are any questions from a user or testing perspective, please ask and I'll do my best to assist. In the interim, I've implemented this for my own use case.

DirectXMan12 commented 5 years ago

Ack, just wanted to make sure we were on the same page :-)

fejta-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 5 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

DirectXMan12 commented 5 years ago

/remove-lifecycle stale

fejta-bot commented 5 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 5 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/controller-runtime/issues/244#issuecomment-508303815).

alvaroaleman commented 5 years ago

/reopen
/remove-lifecycle rotten

k8s-ci-robot commented 5 years ago

@alvaroaleman: Reopened this issue.

In response to [this](https://github.com/kubernetes-sigs/controller-runtime/issues/244#issuecomment-508369571).

shawn-hurley commented 5 years ago

bumping this feature request:

allow users to turn off auto-caching, make them explicitly get filtered caching clients if they want (doable, more work for users). This potentially lets us allow users to specify that they don't want the client to ever cache a particular object (could encourage wrong patterns).

I think this is a viable option, but I wonder if we could have a more straightforward pattern: a user specifies a watch with a label selector, and if they do, we use the label-filtered cache for that specific GVK. If another watch is later created without that label filter, we error out and don't establish the watch. The most significant caveat I see is that the client will only use the label-filtered cache; we would need to make that clear in the documentation, or find some other mechanism to alert the user.
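
A rough, purely hypothetical sketch of the conflict check described above (selectorRegistry and its register method are illustrative only and not part of controller-runtime):

import (
    "fmt"

    "k8s.io/apimachinery/pkg/runtime/schema"
)

// selectorRegistry pins each GVK to the selector used by its first watch.
type selectorRegistry struct {
    byGVK map[schema.GroupVersionKind]string
}

// register fails when a later watch asks for the same GVK with a different
// filter, instead of silently serving it from a cache with the wrong scope.
func (r *selectorRegistry) register(gvk schema.GroupVersionKind, selector string) error {
    if existing, ok := r.byGVK[gvk]; ok && existing != selector {
        return fmt.Errorf("watch for %s already registered with selector %q, got %q", gvk, existing, selector)
    }
    r.byGVK[gvk] = selector
    return nil
}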

DirectXMan12 commented 5 years ago

the main issue with that approach is that it means controllers cohabiting in the same manager can suddenly break things, which is less than optimal. Still, I'm not sure that there's a good solution here, so we may have to go with the least sub-optimal solution

shawn-hurley commented 5 years ago

What if we made the restriction explicit on the manager or during cache creation? Would this be similar to the namespace restriction that we have now?

DirectXMan12 commented 5 years ago

yeah, that could work, potentially, if it solves your use case

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

lulf commented 4 years ago

I have a similar need for being able to specify namespace for watches (similar to https://github.com/kubernetes-sigs/controller-runtime/issues/124#issuecomment-417438810). I've been playing around with that in #692, but it could perhaps be made more generic than what is there at present. My main motivation for watching per namespace is to ensure I can grant controllers only the minimal permissions they strictly need (i.e. filtering by namespace client-side is not sufficient)

alvaroaleman commented 4 years ago

@DirectXMan12 re https://github.com/kubernetes-sigs/controller-runtime/issues/244#issuecomment-451336202: Would it be another option to make the cache Reader smartly detect options that could be used to filter informers (namespace, labelSelector, fieldSelector), check if we have a cache for those already, if yes use it, if not fall back to unfiltered cache (which means creating it if it doesn't exist yet)?
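
A minimal, hypothetical sketch of that reading-side selection (delegatingCacheReader and its filteredCaches key are illustrative only; the client/cache signatures assume a reasonably recent controller-runtime):

import (
    "context"

    "sigs.k8s.io/controller-runtime/pkg/cache"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// delegatingCacheReader picks a filtered cache when the List options match one,
// and otherwise falls back to the unfiltered cache.
type delegatingCacheReader struct {
    unfiltered     cache.Cache
    filteredCaches map[string]cache.Cache // keyed by "namespace/labelSelector/fieldSelector"
}

func (r *delegatingCacheReader) List(ctx context.Context, list client.ObjectList, opts ...client.ListOption) error {
    lo := &client.ListOptions{}
    lo.ApplyOptions(opts)

    key := lo.Namespace + "/"
    if lo.LabelSelector != nil {
        key += lo.LabelSelector.String()
    }
    key += "/"
    if lo.FieldSelector != nil {
        key += lo.FieldSelector.String()
    }

    if c, ok := r.filteredCaches[key]; ok {
        return c.List(ctx, list, opts...) // a cache scoped to exactly these options exists
    }
    return r.unfiltered.List(ctx, list, opts...) // fall back to the shared unfiltered cache
}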

lulf commented 4 years ago

How about an API something like this:

client.Watch(source.NewKind(&v1.Pod{}, client.InNamespace("namespace1")), ...)
client.Watch(source.NewKind(&v1.Pod{}, client.InNamespace("namespace2"), client.MatchingLabels(labelMap), ...)

The options are passed down to the informer map which will create additional caches if they are not covered with any existing informer.

fejta-bot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

DirectXMan12 commented 4 years ago

/lifecycle frozen

lulf commented 4 years ago

I'll take a stab at a design sketch that I'd love to get more feedback on. I have split the requirements in 2 (I use the term 'filter' to mean watch/list filters like namespace, labels, field selector etc.):

  1. Ability to create watches with different filters for different GVKs
  2. Ability to create watches with different filters for the same GVK

For the first requirement, there is not really any API change needed IMO. However, for the second requirement, there are a few corner cases that must be taken into account.

Let's assume watches A and B with different filters are created (in that order) for the same GVK. Then, the following can happen:

Resolving the conflicts by returning an error may be sufficient for most cases. If not, we need a good way to signal users of error conditions.

(IMO, the best user experience would be if it was possible for multiple informers to share the same cache. That way, as long as duplicate cache events were handled properly (i.e. using metadata.resourceVersion?), one could use the same cache client.)

Proposed design - static GVK delegation

Create a mapping of GVK -> Filters when creating the cache/manager. A new type DelegateCache could take the following mapping as a parameter:

NewDelegateCache(defaultFilters []Filter, gvkToFilter map[schema.GroupVersionKind][]Filter) Cache

The Filter type would basically be something like ListOpts and give the ability to restrict things like namespace, labels etc. for different types.

With this design, the cache filters are the 'hard' limits on what can be filtered on, and it's not possible to use conflicting filters for the same GVK.

This would also not require an API change. When creating watches, the pre-configured filters will be used depending on type.

More specific filtering can be done using predicates as today.

It does not solve the second requirement of using different filters for the same GVK.
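
To make that concrete, a small hypothetical usage sketch of the proposal (NewDelegateCache, Filter, InNamespace and MatchingLabels are the proposed, not yet existing, names):

import "k8s.io/apimachinery/pkg/runtime/schema"

podGVK := schema.GroupVersionKind{Group: "", Version: "v1", Kind: "Pod"}

// Pods are only cached/watched in namespace1 with the app=test label; every
// other GVK falls back to the (empty) default filters.
c := NewDelegateCache(
    []Filter{},
    map[schema.GroupVersionKind][]Filter{
        podGVK: {InNamespace("namespace1"), MatchingLabels(map[string]string{"app": "test"})},
    },
)
_ = c // the delegate cache would then be handed to the manager at construction time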

Proposed design - on-demand delegation

I think a modification to the API is required in order to allow different filters for the same GVK. One could extend the current controller API with the ability to supply filters, something like:

controller.Watch(source.NewKind(&v1.Pod{}, filter.InNamespace("namespace1"), filter.MatchingLabels(labelMap)), &handler.EnqueueRequestForObject{})

Moreover, the cache/manager may be configured with default filters (i.e. replace 'namespace' with a list of filters).

The cache implementation can examine the filters provided in GetInformer and make a decision on what to do according to the different corner cases. Alternatives for conflict resolution if existing informers cannot be used:

Example usage

Watching the same GVK with complementing watches

m, err := manager.New(cfg, manager.Options{
    DefaultFilters: []Filter{},
})
...
c.Watch(source.NewKind(&v1.Pod{}, source.InNamespace("namespace1")), ...)
c.Watch(source.NewKind(&v1.Pod{}, source.InNamespace("namespace1"), source.MatchingLabels(labelMap)), ...)

The first Watch will create a new cache with the filters. The second watch could use the same informer and create predicates to do client-side filtering.

Watching the same GVK with conflicting watches due to order of creation

...
c.Watch(source.NewKind(&v1.Pod{}, source.InNamespace("namespace1"), source.MatchingLabels(labelMap)), ...)
c.Watch(source.NewKind(&v1.Pod{}, source.InNamespace("namespace1")), ...)

The first Watch will create a new cache with the filters. Depending on the conflict resolution strategy, the second Watch either returns an error or returns a handle to the new cache to the client.

Watching different GVKs with different options:

c.Watch(source.NewKind(&v1.Pod{}), ...)
c.Watch(source.NewKind(&v1.Secret{}, source.InNamespace("namespace1")), ...)

The first watch will just use the defaults; the second will create a new cache and return a handle. However, since it's a different type, one could have the cache create a delegated mapping.

@shawn-hurley @alvaroaleman @DirectXMan12 thoughts?

shawn-hurley commented 4 years ago

I like the idea of on-demand delegation, but I am worried that the implementation is going to get really messy handling all the use cases.

Watching the same GVK with conflicting watches due to order of creation

So if the first call to watch is more restrictive than the second, we will error out?

What does it look like if we try and handle it?

lulf commented 4 years ago

So if the first call to watch is more restrictive than the second, we will error out?

What does it look like if we try and handle it?

The alternatives I can think of are:

alvaroaleman commented 4 years ago

@lulf The tricky part here is not so much how exactly we change the cache; it's mostly about finding a good solution for the cache-reading client to determine which cache to use, see https://github.com/kubernetes-sigs/controller-runtime/issues/244#issuecomment-451336202

Apart from the options Solly outlined, I also suggested this:

Would it be another option to make the cache Reader smartly detect options that could be used to filter informers (namespace, labelSelector, fieldSelector), check if we have a cache for those already, if yes use it, if not fall back to unfiltered cache (which means creating it if it doesn't exist yet)?

I personally would like to see a gdoc that outlines the ideas we have on how the cache reading client knows which cache to use and ignores how we change the cache itself or the watch calls, as that is the easy part. We can then discuss the options in detail on the gdoc.

vincepri commented 4 years ago

/kind design

@lulf Would you be able to open a PR against controller-runtime with a design document?

jkbschmid commented 4 years ago

My team is also interested in this feature. We have a controller that watches a small set of secrets. However, the cluster overall has many (and larger) secrets, which increases our pod requests/limits drastically because they will all be cached. In our use case, using a separate cache for each filtered watch would absolutely suffice, because the number of secrets we are interested in is small. Therefore, we would only have very little additional memory overhead from duplication. @lulf Is there any ongoing work at the moment?

lulf commented 4 years ago

@jkbschmid I've not found time or need to work on this since my last proposal. We have a workaround 'delegating cache' that solves a subset of the requirements (i.e. we can at startup time set which GVKs we want to watch globally)[1].

FYI if you haven't seen this already: if you know which namespaces you need to watch up front, you can have a look at the multiNamespaceCache in pkg/cache/multi_namespace_cache.go

[1] https://github.com/EnMasseProject/enmasse/tree/master/pkg/cache
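
For the namespace-scoped case mentioned above, a minimal sketch of wiring the multi-namespace cache into a manager, assuming a controller-runtime version that exposes cache.MultiNamespacedCacheBuilder (the namespace names are illustrative):

import (
    "k8s.io/client-go/rest"
    "sigs.k8s.io/controller-runtime/pkg/cache"
    "sigs.k8s.io/controller-runtime/pkg/manager"
)

func newManager(cfg *rest.Config) (manager.Manager, error) {
    // Only the listed namespaces are cached and watched; objects elsewhere are
    // invisible to the manager's cached client.
    return manager.New(cfg, manager.Options{
        NewCache: cache.MultiNamespacedCacheBuilder([]string{"namespace1", "namespace2"}),
    })
}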

arturobrzut commented 4 years ago

My team is also interested in this feature. We have a controller that watches a small set of secrets. However, overall the cluster has many (also larger) secrets, which increases our pod requests/limits drastically because they will all be cached (about 100 MB per 1k secrets). Is there any plan to fix/implement this in the near future?

estroz commented 4 years ago

Working on this. Will write up a doc and discuss at the first meeting after it is drafted.

/assign

estroz commented 4 years ago

I think a prototype with docs/tests is a good way to kick off the discussion on choosing one solution over the other, and would give me/us a chance to understand limitations of a particular solution. This problem is hard to reason about without some code to back up a solution.

This is more or less what everyone seems to be saying about informer creation (from https://github.com/kubernetes-sigs/controller-runtime/issues/244#issuecomment-451336202):

Have a list-options watch predicate, smartly detect the presence of that (in some generic way) and pass it down to GetInformer. If we already have a global informer, just use that. If we don't have a global informer, create a new filtered informer. We delay creation of informers to figure out if one person needs a global informer, and just use that, to allow splitting and combining seamlessly.

As for the cache Reader solution, I like the idea of smartly detecting which cache to use based on selectors/namespaces supplied to Get()/List(), like @alvaroaleman suggested:

make the cache Reader smartly detect options that could be used to filter informers (namespace, labelSelector, fieldSelector), check if we have a cache for those already, if yes use it, if not fall back to unfiltered cache

If anyone would like to expand/comment on the above two parts of the prototype, or voice any objections, feel free.

camilamacedo86 commented 4 years ago

I really like the idea proposed in the comment https://github.com/kubernetes-sigs/controller-runtime/issues/244#issuecomment-572978846. +1

        controller.Watch(source.NewKind(&v1.Pod{}, filter.InNamespace("namespace1"),
            filter.MatchingLabels(labelMap)), &handler.EnqueueRequestForObject{})

taylormgeorge91 commented 4 years ago

@estroz Is this still progressing? We are keen to see this feature enabled.

estroz commented 4 years ago

@taylormgeorge91 I haven't had as much time as I'd like to work on this, but I did just post a rough prototype (#1178). Feel free to critique.

pagarwal-tibco commented 4 years ago

I also have a use case of watching only the pods running on a node rather than all the pods in the cluster. This would limit the number of Kubernetes resources that need to be cached/watched and processed. My use case is a daemonset watching pods on the current node, hence I am looking for a solution like having a fieldSelector for the resources being watched/cached and reconciled.

I see that the Kubernetes client-go library provides similar functionality with func NewListWatchFromClient(c Getter, resource string, namespace string, fieldSelector fields.Selector) *ListWatch in the tools/cache package. Is there a similar way to achieve this using kubebuilder?

It would be very useful to have this feature, which I think is already supported by the underlying client-go library. This would also help reduce the cache size, memory footprint, and CPU requirements when there is a huge number of pods in the cluster.
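
For reference, a minimal client-go sketch of that kind of node-scoped list watch using the function mentioned above (assuming a plain clientset; reading the node name from NODE_NAME is an assumption, e.g. injected via the downward API):

import (
    "os"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/fields"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
)

func nodeLocalPodListWatch(clientset *kubernetes.Clientset) *cache.ListWatch {
    nodeName := os.Getenv("NODE_NAME") // assumed to be set, e.g. via the downward API
    // spec.nodeName is a supported field selector for pods, so the API server
    // only returns pods scheduled onto this node.
    return cache.NewListWatchFromClient(
        clientset.CoreV1().RESTClient(),
        "pods",
        metav1.NamespaceAll,
        fields.OneTermEqualSelector("spec.nodeName", nodeName),
    )
}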

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

StevenACoffman commented 3 years ago

/remove-lifecycle stale

pagarwal-tibco commented 3 years ago

Any updates on this?

qinqon commented 3 years ago

I have opened a PR to specify a field selector at manager build time: https://github.com/kubernetes-sigs/controller-runtime/pull/1404.

timebertt commented 3 years ago

New PR: #1435

invidian commented 3 years ago

#1435 is now merged, so is this resolved?

taylormgeorge91 commented 3 years ago

It is currently being worked into a release, @invidian.

Looking at the tags there is a v0.9.0-beta available if you want to test.

invidian commented 3 years ago

Looking at the tags there is a v0.9.0-beta available if you want to test.

Thanks, will do!
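
For anyone testing the v0.9.0-beta mentioned above, a minimal sketch of how the selector support merged in #1435 can be configured, assuming the v0.9.x API shape (cache.BuilderWithOptions and cache.SelectorsByObject; exact names may differ between releases):

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/labels"
    "k8s.io/client-go/rest"
    "sigs.k8s.io/controller-runtime/pkg/cache"
    "sigs.k8s.io/controller-runtime/pkg/manager"
)

func newFilteredManager(cfg *rest.Config) (manager.Manager, error) {
    // Only pods with the app=test label are listed, watched, and cached; other
    // object types keep the default unfiltered behaviour.
    return manager.New(cfg, manager.Options{
        NewCache: cache.BuilderWithOptions(cache.Options{
            SelectorsByObject: cache.SelectorsByObject{
                &corev1.Pod{}: {
                    Label: labels.SelectorFromSet(labels.Set{"app": "test"}),
                },
            },
        }),
    })
}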