knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Delegate HPA management to KEDA #14877

Closed: skonto closed this issue 4 months ago

skonto commented 7 months ago

Describe the feature

Knative's autoscaler-hpa manages HPA objects on behalf of the user. For custom metrics, users need to set up Prometheus and the Prometheus Adapter to register the metrics with the corresponding K8s API (example). However, the Prometheus Adapter is in maintenance mode upstream, and KEDA, among others, supports HPA-based autoscaling. Given KEDA's popularity for pull-model, event-based autoscaling, delegating HPA management to KEDA would let users deal only with KEDA or Knative resources rather than low-level K8s resources.

In practice, the Knative autoscaler-hpa controller could create ScaledObjects on behalf of the user, each of which maps to a K8s HPA object managed by KEDA. This idea could be implemented as a Knative extension. Here is a PoC of how this looks in practice as part of the current autoscaler-hpa component.
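As a rough illustration of the mapping described above, the Go sketch below translates a Revision's autoscaling annotations into a KEDA `ScaledObject`. It is not the PoC's code. The `autoscaling.knative.dev/*` annotation keys, the `<revision>-deployment` naming and the `keda.sh/v1alpha1` ScaledObject fields are real. The trimmed-down `Revision` type, the Prometheus server address, the PromQL query template and the metric name are hypothetical assumptions. Once such an object is applied, KEDA creates and owns the HPA itself (by default named `keda-hpa-<scaledobject-name>`).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// Knative Serving autoscaling annotation keys.
const (
	classKey  = "autoscaling.knative.dev/class"
	metricKey = "autoscaling.knative.dev/metric"
	targetKey = "autoscaling.knative.dev/target"
	minKey    = "autoscaling.knative.dev/min-scale"
	maxKey    = "autoscaling.knative.dev/max-scale"
	hpaClass  = "hpa.autoscaling.knative.dev"
)

// Revision is a hypothetical, trimmed-down stand-in for a Knative Revision.
type Revision struct {
	Name, Namespace string
	Annotations     map[string]string
}

// Minimal subset of the KEDA ScaledObject (keda.sh/v1alpha1) schema.
type ScaleTargetRef struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Name       string `json:"name"`
}

type Trigger struct {
	Type       string            `json:"type"`
	MetricType string            `json:"metricType,omitempty"`
	Metadata   map[string]string `json:"metadata"`
}

type ScaledObjectSpec struct {
	ScaleTargetRef  ScaleTargetRef `json:"scaleTargetRef"`
	MinReplicaCount *int32         `json:"minReplicaCount,omitempty"`
	MaxReplicaCount *int32         `json:"maxReplicaCount,omitempty"`
	Triggers        []Trigger      `json:"triggers"`
}

type ObjectMeta struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type ScaledObject struct {
	APIVersion string           `json:"apiVersion"`
	Kind       string           `json:"kind"`
	Metadata   ObjectMeta       `json:"metadata"`
	Spec       ScaledObjectSpec `json:"spec"`
}

// parseReplicas returns nil when the annotation is absent or not a valid int32.
func parseReplicas(v string) *int32 {
	n, err := strconv.ParseInt(v, 10, 32)
	if err != nil {
		return nil
	}
	r := int32(n)
	return &r
}

// toScaledObject returns (nil, false) for revisions that do not use the HPA class.
func toScaledObject(rev Revision, promAddr string) (*ScaledObject, bool) {
	a := rev.Annotations
	if a[classKey] != hpaClass {
		return nil, false
	}
	m := a[metricKey]
	if m == "" {
		m = "cpu" // Knative's HPA class defaults to the CPU metric.
	}
	var trig Trigger
	if m == "cpu" || m == "memory" {
		// KEDA's built-in cpu/memory scalers.
		trig = Trigger{Type: m, MetricType: "Utilization", Metadata: map[string]string{"value": a[targetKey]}}
	} else {
		// Hypothetical mapping: treat any other metric name as a Prometheus series.
		trig = Trigger{Type: "prometheus", Metadata: map[string]string{
			"serverAddress": promAddr,
			"query":         fmt.Sprintf("sum(%s{namespace=%q})", m, rev.Namespace),
			"threshold":     a[targetKey],
		}}
	}
	return &ScaledObject{
		APIVersion: "keda.sh/v1alpha1",
		Kind:       "ScaledObject",
		Metadata:   ObjectMeta{Name: rev.Name, Namespace: rev.Namespace},
		Spec: ScaledObjectSpec{
			// Each Knative Revision is backed by a Deployment named "<revision>-deployment".
			ScaleTargetRef:  ScaleTargetRef{APIVersion: "apps/v1", Kind: "Deployment", Name: rev.Name + "-deployment"},
			MinReplicaCount: parseReplicas(a[minKey]),
			MaxReplicaCount: parseReplicas(a[maxKey]),
			Triggers:        []Trigger{trig},
		},
	}, true
}

func main() {
	rev := Revision{Name: "llm-00001", Namespace: "default", Annotations: map[string]string{
		classKey: hpaClass, metricKey: "tokens_per_second", // hypothetical custom metric
		targetKey: "500", minKey: "1", maxKey: "10",
	}}
	so, ok := toScaledObject(rev, "http://prometheus.monitoring:9090") // hypothetical address
	if !ok {
		fmt.Println("revision does not use the HPA autoscaler class; skipping")
		return
	}
	out, err := json.MarshalIndent(so, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The controller only emits the ScaledObject and leaves HPA creation, metric registration and reconciliation to KEDA, which is the point of the proposal: users interact with Knative and KEDA resources instead of a Prometheus Adapter configuration.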

cc @dprotaso @ReToCode

yuzisun commented 7 months ago

We are interested in this too, as request-driven scaling is not necessarily a good fit for LLM inference services; we need to scale based on token metrics.

skonto commented 7 months ago

I moved the code to a separate repo here; it now runs as a standalone component. I will iterate on it and probably move it to knative-extensions.

github-actions[bot] commented 4 months ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

skonto commented 4 months ago

This has been moved into a new repo; closing.