kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event-driven scaling for any container running in Kubernetes.
https://keda.sh
Apache License 2.0

Automatically pause downscaling based on defined metrics #5166

Open spereverziev opened 8 months ago

spereverziev commented 8 months ago

Proposal

Dynamically pause autoscaling based on a condition, e.g. pause when the requests-per-second metric is less than X.

Use-Case

I have a service A that calls service B. When A has an outage, it stops sending requests to B, so B scales down all the way to its minimum replica count. When A comes back online, it floods B with requests, and B doesn't have enough replicas to handle the load, which can take B down or significantly increase the time to recover.

So I want to be able to pause downscaling of B while service A has an outage. I could define the condition with a custom metric query, such as the RPS of B dropping below X.
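
To make the request concrete, here is a minimal sketch of the decision I have in mind. It is not KEDA code or an existing KEDA API: the function `decide_replicas`, the `ScaleDecision` type, and the parameters `guard_rps` and `pause_below_rps` are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScaleDecision:
    replicas: int           # replica count to apply
    downscale_paused: bool  # True when a scale-down was suppressed


def decide_replicas(current: int, desired: int,
                    guard_rps: float, pause_below_rps: float) -> ScaleDecision:
    """Hypothetical guard: hold the current replica count while the
    guard metric suggests that the upstream service is down."""
    outage_suspected = guard_rps < pause_below_rps
    if outage_suspected and desired < current:
        # The autoscaler wants to scale down, but traffic has collapsed,
        # so keep the capacity for when the upstream service recovers.
        return ScaleDecision(replicas=current, downscale_paused=True)
    # Scale-ups and scale-downs under normal traffic pass through unchanged.
    return ScaleDecision(replicas=desired, downscale_paused=False)
```

With a threshold of 50 RPS, `decide_replicas(10, 2, 5.0, 50.0)` keeps B at 10 replicas, while `decide_replicas(10, 2, 80.0, 50.0)` lets it scale down to 2. Scaling up is never blocked.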

Is this a feature you are interested in implementing yourself?

No

Anything else?

No response

SpiritZhou commented 8 months ago

Could you provide a more detailed use case? In my opinion, if the requests-per-second metric is less than X, the pods should be scaled down, right? Alternatively, if the metric has some extreme value, you can use maxReplicaCount.

spereverziev commented 8 months ago

So the use case is the following. I have a service A that calls service B. When A has an outage, it stops sending requests to B, so B scales down all the way to its minimum replica count. When A comes back online, it floods B with requests, and B doesn't have enough replicas to handle the load, which can take B down or significantly increase the time to recover.

So I want to be able to pause downscaling of B while service A has an outage. I could define the condition with a custom metric query, such as the RPS of B dropping below X.

I don't see how maxReplicaCount can help in my use case.

spereverziev commented 8 months ago

@SpiritZhou I added more context ^, thanks

spereverziev commented 8 months ago

This is a very common scenario that happens almost every month at my company.

stale[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.