Closed skonto closed 4 months ago
We are interested in this too, as request-driven autoscaling is not necessarily a good fit for scaling LLM inference services; we need to scale based on token metrics.
I moved the code in a separate repo here, now it runs a standalone component. I will iterate on it and probably move it to knative-extensions.
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.
This has been moved into a new repo; closing.
Describe the feature
Knative's autoscaler-hpa manages HPA objects on behalf of the user. For custom metrics, users need to set up Prometheus and the Prometheus Adapter to register the metrics with the corresponding K8s API (example). However, the Prometheus Adapter is in maintenance mode upstream, and KEDA, among others, supports HPA-based autoscaling. Given KEDA's popularity for pull-model, event-based autoscaling, delegating HPA management to KEDA would let users deal only with KEDA or Knative resources rather than low-level K8s resources. That means the Knative autoscaler-hpa controller could instead create ScaledObjects on behalf of the user that map to a K8s HPA object managed by KEDA. This idea could be implemented as a Knative extension. Here is a PoC of how this looks in practice as part of the current autoscaler-hpa component.
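To illustrate, here is a minimal sketch of the kind of ScaledObject such a controller might create per revision. The revision name, namespace, Prometheus address, query, and threshold are all hypothetical placeholders, not taken from the PoC; KEDA would then create and manage the underlying HPA from this object.

```yaml
# Hypothetical sketch: a ScaledObject the autoscaler-hpa controller could
# generate for a revision's deployment instead of an HPA object directly.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-revision-deployment        # illustrative revision deployment name
  namespace: default
spec:
  scaleTargetRef:
    name: my-revision-deployment      # the deployment KEDA scales via its HPA
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus                # custom metric served by Prometheus
      metadata:
        serverAddress: http://prometheus.default.svc:9090   # assumed address
        query: sum(rate(http_requests_total{app="my-revision"}[1m]))
        threshold: "50"               # illustrative target value per replica
```

With this delegation, the user (or the controller acting for them) only touches the KEDA resource; the low-level HPA object becomes an implementation detail owned by KEDA.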
cc @dprotaso @ReToCode