knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Implement intelligent autoscaling #15068

Open skonto opened 3 months ago

skonto commented 3 months ago

Describe the feature

KPA gathers statistics via a moving average across pod replicas over a given time window. I am wondering whether we could provide something smarter that also addresses some cold-start issues, e.g., not scaling down to zero if a traffic burst is about to happen. scale-down-delay keeps the maximum desired pod count within a window, but we probably also need to look ahead in time to make sure we have enough capacity, because pods may take time to scale out (depending on the app), which affects latency.

This could be implemented as a Knative extension, since Knative services can be updated externally (no need to change KPA). There is a lot of history on the topic; see [1] for more. Cloud providers already offer this kind of feature, for example at the node level; see [2]. See also the related KEDA issue [3]. I am also creating this issue as a reference for future discussions in case there is interest from the community.
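To make the two mechanisms mentioned above concrete, here is a minimal sketch, not the actual KPA code. It assumes the samples are total observed concurrency across replicas, and the names `desired_pods`, `ScaleDownDelay`, and `target_per_pod` are hypothetical, chosen only for illustration.

```python
import math
from collections import deque


def desired_pods(concurrency_samples, target_per_pod):
    """Average the samples in the window and divide by the per-pod target.

    Assumption: each sample is the total concurrency observed across all
    replicas at one point in time. The real KPA logic is more involved.
    """
    if not concurrency_samples:
        return 0
    average = sum(concurrency_samples) / len(concurrency_samples)
    return math.ceil(average / target_per_pod)


class ScaleDownDelay:
    """Return the maximum desired pod count seen over the last `window` decisions."""

    def __init__(self, window):
        # A bounded deque drops the oldest decision automatically.
        self.recent = deque(maxlen=window)

    def apply(self, desired):
        self.recent.append(desired)
        return max(self.recent)


delay = ScaleDownDelay(window=3)
for samples in ([50, 50, 50], [10, 10, 10], [10, 10, 10], [10, 10, 10]):
    wanted = desired_pods(samples, target_per_pod=10)
    print(wanted, delay.apply(wanted))
```

Both pieces only look backward: the average reacts after traffic has changed, and the delay only holds on to capacity that was already needed. Neither can anticipate a burst, which is the gap a look-ahead approach would have to fill.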

Refs

[1] Lucia Schuler, Somaya Jamil, Niklas Kühl, AI-based Resource Allocation: Reinforcement Learning for Adaptive Auto-scaling in Serverless Environments.
[2] Predictive scaling for Amazon EC2 Auto Scaling
[3] https://github.com/kedacore/keda/issues/2401

cc @dprotaso @ReToCode

Hojland commented 2 months ago

We also experience the issues mentioned here. I was initially hoping to integrate some redundancy option, so that I could always add x pods to the deployment on top of what KPA predicts. But I would much prefer some form of predictive scaling, or options for handling cyclical workloads and similar patterns.

As a first step, could I implement this redundancy as a Knative extension and deploy it myself? Are there guides for doing that?
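The redundancy idea described above comes down to simple arithmetic on the autoscaler's output. The sketch below is only an illustration: `with_headroom` and its parameters are hypothetical names, not part of any Knative API, and whether extra pods should be kept when the prediction is zero is an assumption exposed as a flag.

```python
def with_headroom(predicted, extra, max_scale=None, keep_zero=True):
    """Add `extra` pods on top of the predicted pod count.

    predicted: pod count suggested by the autoscaler (hypothetical input).
    extra:     fixed number of redundant pods to add.
    max_scale: optional upper bound on the result.
    keep_zero: if True, a prediction of 0 stays 0 so scale-to-zero still works.
    """
    if predicted == 0 and keep_zero:
        return 0
    total = predicted + extra
    if max_scale is not None:
        total = min(total, max_scale)
    return total


print(with_headroom(3, extra=2))                # 5
print(with_headroom(0, extra=2))                # 0
print(with_headroom(9, extra=2, max_scale=10))  # 10
```

Setting `keep_zero=False` trades scale-to-zero for warm capacity, which avoids cold starts at the cost of always running at least `extra` pods.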

Help is much appreciated!

Lightxyz commented 2 months ago

@Hojland You can implement your own autoscaling algorithm in Knative, then recompile the Autoscaler and deploy the modified Autoscaler container image.