knative-extensions / autoscaler-keda

KEDA-based Knative HPA Autoscaler
Apache License 2.0

Explore scale to/from zero #15

Open skonto opened 2 months ago

skonto commented 1 month ago

The big issue here is going from zero to one when the metrics come from the pod itself. Since we don't have the activator, we will have to utilize ingress metrics and probably the combination metrics feature provided by KEDA. I will explore. For the case where the metrics come from an external source, this should work out of the box?

skonto commented 1 month ago

I explored this. Knative requires detecting private endpoints before it can mark the SKS endpoints ready. If there are no endpoints, the traffic configuration fails and no child resources such as ingresses are created. Without ingresses we can't get any metrics from Kourier or Istio, since there is no networking set up for the targeted service. We can bypass that with some minor changes in Serving (needs a proper PR and testing) and by setting the PA status to targetInitialized from within autoscaler-keda when the HPA's current replica count is 0. The PA status is propagated to the revision, and the revision needs to be ready for things to work.

After we apply the above changes we can easily use Kourier metrics, or any other metrics external to the app pod (you need a ServiceMonitor to expose them), such as sum(avg_over_time(envoy_cluster_upstream_rq_active{envoy_cluster_name='test/metrics-test-00001'}[1m])), to detect incoming requests and trigger scaling from zero to N. I tested this on minikube and it worked fine (treating a ksvc like a raw deployment at the end of the day)! Just to be clear, scaling from zero works with custom metrics (KEDA + HPA + Prometheus), but not with cpu/memory (due to HPA restrictions, afaik).
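
As a concrete illustration, this corresponds to the kind of KEDA ScaledObject sketched below. It is only a sketch, not something this repo generates as-is; the Prometheus server address and threshold are placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: metrics-test-00001
  namespace: test
spec:
  scaleTargetRef:
    name: metrics-test-00001-deployment       # the revision's deployment
  minReplicaCount: 0                           # allow scale to zero
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc:9090   # placeholder address
        # In-flight requests seen by Kourier (envoy) for the revision's cluster.
        query: "sum(avg_over_time(envoy_cluster_upstream_rq_active{envoy_cluster_name='test/metrics-test-00001'}[1m]))"
        threshold: "5"            # average value per pod once scaled up
        activationThreshold: "0"  # scale 0 -> 1 as soon as the metric goes above 0
```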

In the future this could be combined with multiple triggers (see issue #20) and metrics from the targeted service to scale the latter effectively: one trigger would be the signal for scaling from zero and the rest would drive the normal 1->N scaling. The downside is that you don't get any backpressure or request holding, so until an instance is up you get failures. That could be ok for apps that have retry logic. Note that for Istio there is no exposed metric measuring in-flight requests, as istio_requests_total only counts completed ones (https://github.com/istio/istio/issues/23672). cc @ReToCode @rhuss This might be an interesting path to explore further in the future, although for now we should stick to minReplicas>=1.
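
A rough sketch of what that multi-trigger variant could look like is below. KEDA activates the workload when any trigger is active, and once active the HPA scales on the highest of the per-trigger metrics; recent KEDA versions can also combine named triggers through a scaling-modifiers formula. Server address, queries, and thresholds here are placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: metrics-test-00001
  namespace: test
spec:
  scaleTargetRef:
    name: metrics-test-00001-deployment
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    # 0 -> 1: ingress-side signal, available even when no app pod is running.
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc:9090   # placeholder address
        query: "sum(avg_over_time(envoy_cluster_upstream_rq_active{envoy_cluster_name='test/metrics-test-00001'}[1m]))"
        threshold: "5"
    # 1 -> N: metric scraped from the app pods themselves.
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc:9090   # placeholder address
        query: "sum(rate(http_requests_total{namespace='test'}[1m]))"
        threshold: "5"
```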

skonto commented 1 month ago

cc @dprotaso wdyt about the changes needed to support scale from zero for HPA with custom metrics (via KEDA)? Would that be too intrusive?

skonto commented 1 month ago

I managed to make this work without modifying Serving for the scenario where we don't rely on pod metrics.

$k get hpa -n test
NAME                   REFERENCE                                    TARGETS             MINPODS   MAXPODS   REPLICAS   AGE
metrics-test-00001     Deployment/metrics-test-00001-deployment     <unknown>/5 (avg)   1         10        0          8m
metrics-test-2-00001   Deployment/metrics-test-2-00001-deployment   7433m/5 (avg)       1         10        1          4m43s
$k get hpa -n test
NAME                   REFERENCE                                    TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
metrics-test-00001     Deployment/metrics-test-00001-deployment     7433m/5 (avg)   1         10        1          8m2s
metrics-test-2-00001   Deployment/metrics-test-2-00001-deployment   7433m/5 (avg)   1         10        1          4m45s
$k get hpa -n test
NAME                   REFERENCE                                    TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
metrics-test-00001     Deployment/metrics-test-00001-deployment     7433m/5 (avg)   1         10        1          8m3s
metrics-test-2-00001   Deployment/metrics-test-2-00001-deployment   7433m/5 (avg)   1         10        1          4m46s
$k get po -n test
NAME                                              READY   STATUS    RESTARTS   AGE
metrics-test-00001-deployment-78fcf58794-5mdsf    2/2     Running   0          9s
metrics-test-00001-deployment-78fcf58794-szr58    2/2     Running   0          8s
metrics-test-2-00001-deployment-8d44b677f-dh7rx   2/2     Running   0          4m52s
metrics-test-2-00001-deployment-8d44b677f-fm2pb   2/2     Running   0          21s
$k get ksvc -n test
NAME             URL                                      LATESTCREATED          LATESTREADY            READY   REASON
metrics-test     http://metrics-test.test.example.com     metrics-test-00001     metrics-test-00001     True    
metrics-test-2   http://metrics-test-2.test.example.com   metrics-test-2-00001   metrics-test-2-00001   True    

$k get hpa -n test
NAME                   REFERENCE                                    TARGETS             MINPODS   MAXPODS   REPLICAS   AGE
metrics-test-00001     Deployment/metrics-test-00001-deployment     <unknown>/5 (avg)   1         10        0          14m
metrics-test-2-00001   Deployment/metrics-test-2-00001-deployment   0/5 (avg)           1         10        1          10m
$k get po -n test
NAME                                              READY   STATUS        RESTARTS   AGE
metrics-test-00001-deployment-78fcf58794-287rv    1/2     Terminating   0          5m44s
metrics-test-00001-deployment-78fcf58794-2wzld    1/2     Terminating   0          5m44s
metrics-test-00001-deployment-78fcf58794-5mdsf    1/2     Terminating   0          6m
metrics-test-00001-deployment-78fcf58794-6qpnf    1/2     Terminating   0          5m29s
metrics-test-00001-deployment-78fcf58794-7pm45    1/2     Terminating   0          5m29s
metrics-test-00001-deployment-78fcf58794-j6wb2    1/2     Terminating   0          5m29s
metrics-test-00001-deployment-78fcf58794-q5n7h    1/2     Terminating   0          5m29s
metrics-test-00001-deployment-78fcf58794-rmxxr    1/2     Terminating   0          5m14s
metrics-test-00001-deployment-78fcf58794-szr58    1/2     Terminating   0          5m59s
metrics-test-00001-deployment-78fcf58794-vkl7s    1/2     Terminating   0          5m14s
metrics-test-2-00001-deployment-8d44b677f-2xxcl   1/2     Terminating   0          5m41s
metrics-test-2-00001-deployment-8d44b677f-4x52t   1/2     Terminating   0          5m11s
metrics-test-2-00001-deployment-8d44b677f-5gck2   1/2     Terminating   0          5m42s
metrics-test-2-00001-deployment-8d44b677f-cntm5   2/2     Terminating   0          5m26s
metrics-test-2-00001-deployment-8d44b677f-dh7rx   2/2     Running       0          10m
metrics-test-2-00001-deployment-8d44b677f-fm2pb   1/2     Terminating   0          6m12s
metrics-test-2-00001-deployment-8d44b677f-kvlrp   2/2     Terminating   0          5m26s
metrics-test-2-00001-deployment-8d44b677f-ssf6t   2/2     Terminating   0          5m26s
metrics-test-2-00001-deployment-8d44b677f-v4qhd   2/2     Terminating   0          5m26s
metrics-test-2-00001-deployment-8d44b677f-x2gb5   1/2     Terminating   0          5m11s

Above I am using sum(rate(http_requests_total{namespace="test"}[1m])) as the Prometheus query for both ksvcs. metrics-test-2 has minReplicas=1 while metrics-test has minReplicas=0. I am using metrics-test-2 with minReplicas equal to 1 so that I can run requests against it (a ksvc is ready when minReplicas=1).

metrics-test is not ready, as there is no replica and consequently no ingress is set up. Once http_requests_total increases due to the requests sent to metrics-test-2, the metrics-test ksvc also gets scaled up (Knative does the networking setup). I did this on purpose to emulate external metrics triggering scale from zero. So we can do scaling from zero with KEDA. We can also scale back down (KEDA does the trick by making the HPA inactive when sum(rate(http_requests_total...)) returns 0). This also solves the issue of inactive revisions not being scaled down when using the core Serving HPA autoscaler.
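
For context, the ksvc side of this test looks roughly like the sketch below. The class/metric/target/min-scale/max-scale annotations are standard Knative autoscaling annotations; the Prometheus query annotation key and the container image are assumptions made for illustration, so check this repo's README for the exact keys it expects:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: metrics-test
  namespace: test
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
        autoscaling.knative.dev/metric: http_requests_total      # custom metric name
        autoscaling.knative.dev/target: "5"
        autoscaling.knative.dev/min-scale: "0"                    # the scale-to-zero case above
        autoscaling.knative.dev/max-scale: "10"
        # Assumed annotation key for the query driving the KEDA trigger:
        autoscaling.knative.dev/prometheus-query: 'sum(rate(http_requests_total{namespace="test"}[1m]))'
    spec:
      containers:
        - image: example.com/metrics-test-app:latest   # hypothetical app exposing http_requests_total
          ports:
            - containerPort: 8080
```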

I will do a PR.