kubernetes-sigs / controller-runtime

Repo for the controller-runtime subproject of kubebuilder (sig-apimachinery)
Apache License 2.0

Proposal: Add reconciliation buffer to reduce load #857

Open toonsevrin opened 4 years ago

toonsevrin commented 4 years ago

Some of my controllers receive bursts of reconciliation requests. However, there's no improvement in controller quality if I handle them all individually compared to handling them only once in a while.

I propose optionally adding a new controller option alongside these existing options: https://github.com/kubernetes-sigs/controller-runtime/blob/2361b0515224bcc1cd05d63503c6cf85b80cfd85/pkg/controller/controller.go#L33-L44

I'm definitely not an expert, but I think the option could have the same interface as the rate-limiting interface, except that instead of applying only to retries, it would apply to all requests.

vincepri commented 4 years ago

/priority awaiting-more-evidence

Can you clarify your use case a little bit more?

vincepri commented 4 years ago

/kind design

toonsevrin commented 4 years ago

/priority awaiting-more-evidence

Can you clarify your use case a little bit more?

This is generally useful when a CRD may be changed extremely often (eg multiple times per second) while you do not care about the reconciliation happening for each of these changes. Instead, you are fine with it happening once every now and then on the latest change.

toonsevrin commented 4 years ago

My personal use case is a controller which listens to all Secret, Namespace, and ServiceAccount changes. The quality of the controller would, however, not really change if the reconciliation were done only once every, say, 10 seconds, even if 100 events were observed in that period.

vincepri commented 4 years ago

ah yes, that's what we initially understood when chatting at the community meeting. It does seem like an interesting use case.

cc @DirectXMan12

toonsevrin commented 4 years ago

I think so too! Another use case is where there is asymmetry: e.g. when changing one resource (for example a Deployment) results in the controller changing hundreds of other resources. Adding a buffer protects these controllers from being DoSed.

It becomes even more useful when you create a circular reconciliation loop by accident: let's say controller A watches Secret changes and updates the relevant Deployments whenever a Secret is changed, and controller B updates relevant Secrets whenever a Deployment is updated. Although this setup is horrible in general, a buffer on B would at least prevent it from completely blowing up :)

DirectXMan12 commented 4 years ago

aah, that makes sense. I think we'd have to heavily document this to avoid footguns, but this seems like a reasonable use case.

@toonsevrin can you write up a short design in a teensy bit more detail? Mainly interested in "how do we teach this to avoid confusion with the backoff rate limiter".

DirectXMan12 commented 4 years ago

Here's our (short-and-sweet) template: https://raw.githubusercontent.com/kubernetes-sigs/controller-runtime/master/designs/template.md

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

vincepri commented 4 years ago

/lifecycle frozen

vincepri commented 4 years ago

/priority important-longterm

alvaroaleman commented 4 years ago

FWIW, we've built something like this as a Reconciler wrapper downstream: https://github.com/openshift/ci-tools/blob/e847c7d352263b7415732d4309c041d3afb8ab71/pkg/controller/util/request_coalescer.go#L13

negz commented 3 years ago

Just came here with what appears to be another use case for this (after being surprised that the RateLimiter added per https://github.com/kubernetes-sigs/controller-runtime/issues/631 only rate limits requeues, not queues). In my case there's a strong correlation between reconciles and API calls that may cost money, so we'd like to offer users an option to rate limit reconciles. It seems that we can do this with something like @alvaroaleman's approach, but baked-in support would be nice.

negz commented 3 years ago

Er, just ended up back here re-reading today and realised what I want isn't quite the same as what this issue is about, but it still seems like a 'middleware' Reconciler (and/or a custom event Source) could do the trick.

CecileRobertMichon commented 2 years ago

Here is what we did in cluster-api-provider-azure to solve this: https://github.com/devigned/cluster-api-provider-azure/commit/b6b38b0e6f81b57e98f9eb933e3e6a9cbf2f5936

It would be great to include something like it in controller-runtime so all projects can use it.

cc @devigned

damdo commented 1 year ago

This would really be a useful addition for certain use cases, especially considering this is something that various projects are implementing as a wrapper.

Would this be something we are still interested in seeing upstreamed here in controller-runtime?

JoelSpeed commented 1 year ago

What's the best way to progress this issue? Should we discuss this on a community call? Is the design template still used within this community? Should we just copy one of the existing implementations (I reviewed both, the CAPZ one looks preferable IMO) into a PR and see how we go?

alvaroaleman commented 1 year ago

I meant to respond here, sorry. A couple of points to consider: first off, how much demand for this actually exists? And will we be able to find a solution that makes everyone who commented on this issue happy?

If this turns out to be a more niche problem where we cannot easily find a solution that works well for everyone, I'd prefer not to add anything in c-r for it. Controller-runtime is intentionally factored in a way where you can solve problems like this downstream (correct me if you see reasons why that wouldn't be possible here).

JoelSpeed commented 1 year ago

First off, how much demand for this actually exists?

In terms of the demand, I think we have a few folks who've certainly hit issues like this, so there's some want from the community at a minimum. I can think of a few cases where issues might be solved by something in this space:

I appreciate some of these cases may be solved in other ways

If this turns out to be a more niche problem where we can not easily find a solution that works well for everyone, I'd prefer not to add anything in c-r for it. Controller-Runtime is intentionally factored in a way where you can solve problems like this downstream (correct me if you see reasons why that wouldn't be possible here).

The proposal of a middleware would mean that this wouldn't really affect the existing scope of the manager or reconciler interfaces, but instead just provide a utility that folks could opt into if they decide they need it. That fits the theme of people solving the issue downstream, while still providing a common utility for what is a common issue. I think this warrants further discussion about what we do and don't want to support within controller-runtime; it would be good to discuss this at a community call in the new year.

vincepri commented 1 year ago

Where you are creating objects that you watch, we've seen examples where we "double create" because the second reconcile of the object happens so quickly the API isn't returning the newly created object (even with an uncached client)

This should probably be fixed by using a live client, what do you think? There is some opportunity for Controller Runtime built-in client to expose generally safer mechanics around GetOrCreate/Patch/etc

vincepri commented 1 year ago

The use cases above make sense to me, we could explore some different options where:

thoughts?

alvaroaleman commented 1 year ago

Where you are creating objects that you watch, we've seen examples where we "double create" because the second reconcile of the object happens so quickly the API isn't returning the newly created object (even with an uncached client)

That is an unrelated issue, and just waiting is not a great approach to solving it. A better one is to use deterministic names so that the second create will fail if the object already exists.

The first step could be a simple coalescing function, which specifies the time window to coalesce by GVK

Sounds fine to me. Can we outline the external API and the expected behavior here though, so people can check whether it works for them?

JoelSpeed commented 1 year ago

This should probably be fixed by using a live client, what do you think? There is some opportunity for Controller Runtime built-in client to expose generally safer mechanics around GetOrCreate/Patch/etc

So yeah, that's what I thought. We put in a patch to "double check" the result just before the create, making 100% sure to use a live/uncached client, and we are still able to reproduce the bug, which is super weird and has me pretty confused. It's been reproduced several times even with the uncached client in place; each time, a super hot reconcile, sub-10ms apart, has been seen.

That is an unrelated issue and just waiting is not a great approach to solve it. A better one is to use deterministic names so that the second create will fail if the object already exists.

That's not always possible, and it isn't in the particular case I've been working on recently. Think about a controller like the ReplicaSet controller: it maintains a number of Pods, and if it were observing this bug, it would create too many Pods, but it can't use deterministic names. Our use case is similar; we have no option to use deterministic names, and I bet there are others out there that have no choice either (the cluster-api project for example, where creating Machines can't use deterministic names).

alvaroaleman commented 1 year ago

That's not always possible, and not possible in the particular case I've been working on recently. Think about a controller like the replicaset controller, it maintains a number of pods, if it were observing this bug, it would create too many pods, but it can't use deterministic names.

That sounds like a good use case for expectations, just like the ReplicaSet controller uses them. The basic idea is to track your create/delete actions and only sync once your cache has observed all of them. This is something controller-runtime doesn't help you with today, but there is another issue for that.

justinsb commented 1 year ago

I encountered something similar. The challenge I see with the approaches that involve calling RequeueAfter is that I believe RequeueAfter does not combine results by key, so we still end up with the same number of reconciliations, just spread out over more time.

My reasoning:

When we call RequeueAfter, that results in a call to AddAfter on DelayingInterface, which lands at https://github.com/kubernetes/client-go/blob/1517ffb8d37c99e6a3a2842bcdee0aa271f0332b/util/workqueue/delaying_queue.go#L287 . That is a simple list, not a map by types.NamespacedName.

Maybe I am missing something though?

I'm wondering if we need a better Queue implementation, one that is able to coalesce the same requests in the queue and/or debounce. AFAICT there's no way to set Queue/MakeQueue though - maybe a good first step would be to expose that so that controllers could prove that this is a win.

So maybe a +1 on https://github.com/kubernetes-sigs/controller-runtime/issues/857#issuecomment-1360324347 , and specifically a request to let controllers experiment with replacing the Queue.

sbueringer commented 1 year ago

cc @fabriziopandini @ykakarap @killianmuldoon (potentially interesting for ClusterAPI)

alvaroaleman commented 1 year ago

I'm wondering if we need a better Queue implementation, one that is able to coalesce the same requests in the queue and/or debounce. AFAICT there's no way to set Queue/MakeQueue though - maybe a good first step would be to expose that so that controllers could prove that this is a win.

Adding an option to replace the queue seems ok to me.

What exactly do you mean by "coalesce the same requests in the queue"? It should already de-duplicate identical items. Is that what you mean? Or are you referring to the AddAfter behavior? If so, IMHO a case can be made both for de-duplicating and for not de-duplicating, so it is unlikely the current behavior will be changed.

Debouncing should already be possible using RequeueAfter (admittedly not super elegant) or am I missing something? I linked something that does that in https://github.com/kubernetes-sigs/controller-runtime/issues/857#issuecomment-692998905

fabriziopandini commented 1 year ago

I don't know if it's feasible, but it would be ideal to have something like a priority queue, with resync events having a lower priority and other events having a higher priority. This could help when there are many objects of the same type and every resync period brings a storm of events being added to the queue.

alvaroaleman commented 1 year ago

@fabriziopandini agree about priority queue, we also have some use cases where that would be useful. However, that issue is IMHO orthogonal to this and has some added challenges like the fact that there is currently no upstream priority queue implementation. Would you mind opening a dedicated issue for a priority queue, please?