rrrrover opened this issue 5 years ago
In the picture I shared above, we can see the curl function being scaled up every 40 seconds, according to the default AlertManager settings. And after I stopped calling the function, its replicas dropped to 1 immediately.
Hi @rrrrover, thanks for your interest in the auto-scaling.
I think you've described how the AlertManager option works reasonably well. It's not the only option and this is customisable.
If you are not satisfied with the default auto-scaling for your use-case, you can edit it:
1) OpenFaaS has an open REST API which you could use to implement your own autoscaling algorithm or controller
2) You can use the HPAv2 rules in Kubernetes.
HPAv2 would allow you to use CPU, memory, or custom metrics, e.g. QPS (see the metrics gathered from the watchdog / function for this option)
3) You could edit the AlertManager rules for scaling up
As you identified, scaling down to min replicas corresponds to a resolved alert from AlertManager. I am not sure how much you can expect to edit that experience whilst retaining that semantic.
You can edit the AlertManager rules for scaling up, and that's something I've seen other users doing too. I would suggest you try out your sample PromQL and report back on how it compares for your use-case.
Looking forward to hearing from you soon,
Alex
-- Join Slack to connect with the community https://docs.openfaas.com/community
Hi @alexellis , thanks for the reply and your patient guidance.
My use case was inspired by the HPAv2 rules in k8s. An HPAv2 rule ensures each function pod uses only a limited share of the cluster's resources; in my understanding, each function pod should likewise only handle a limited number of requests per second.
That's why I observe QPS per pod rather than total QPS in Prometheus.
I've tried my new PromQL, which fires an alert when each pod handles over 5 requests per second:
sum by(function_name) (rate(gateway_function_invocation_total{code="200"}[10s]) / ignoring(code) gateway_service_count) > 5
I send 6 requests per second to the function, so it scales up to 5 pods to resolve the alert.
And I found that when the replicas finally reach the desired number, the alert resolves and the pods are scaled down to 1. Then the alert fires again.
So my proposal to scale down via a new Prometheus alert is meant to break this infinite loop.
We would still observe QPS per pod, but this time we should pick the threshold carefully so that, after scaling down, QPS per pod does not trigger a scale-up again.
In the example above, we could scale down in steps of 4 pods (20% * maxReplicas) when QPS per pod is less than 1. Since QPS (6) / replicas (5) > 1, no scale-down is triggered and the replicas stay stable.
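A minimal Go sketch of that per-pod check, using the thresholds from the example above (the names scaleUpQPS and scaleDownQPS are mine, not OpenFaaS settings):

package main

import "fmt"

// Hypothetical thresholds for illustration: scale up when a replica handles
// more than 5 req/s, scale down when it handles less than 1 req/s.
const (
	scaleUpQPS   = 5.0
	scaleDownQPS = 1.0
)

func decide(totalQPS float64, replicas int) string {
	perReplica := totalQPS / float64(replicas)
	switch {
	case perReplica > scaleUpQPS:
		return "scale up"
	case perReplica < scaleDownQPS:
		return "scale down"
	default:
		return "stable"
	}
}

func main() {
	// 6 req/s across 5 replicas => 1.2 req/s per replica => stable.
	fmt.Println(decide(6, 5))
}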
OpenFaaS has an open REST API which you could use to implement your own autoscaling algorithm or controller
By this, do you mean the /system/scale-function/{functionname} API? This API seems helpful; I can build my own controller that calls it to scale up/down.
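Roughly what such a call might look like from a custom controller; the gateway URL, credentials, and the exact request body shape here are my assumptions, so double-check them against the gateway's swagger spec:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// scaleRequest mirrors the gateway's scale request body as I understand it;
// verify the exact field names against the gateway API docs.
type scaleRequest struct {
	ServiceName string `json:"serviceName"`
	Replicas    uint64 `json:"replicas"`
}

func scaleFunction(gatewayURL, user, pass, fn string, replicas uint64) error {
	body, _ := json.Marshal(scaleRequest{ServiceName: fn, Replicas: replicas})
	req, err := http.NewRequest(http.MethodPost,
		fmt.Sprintf("%s/system/scale-function/%s", gatewayURL, fn),
		bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.SetBasicAuth(user, pass) // gateway basic-auth credentials
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}

func main() {
	// Placeholder values: scale the "curl" function to 3 replicas.
	if err := scaleFunction("http://127.0.0.1:8080", "admin", "password", "curl", 3); err != nil {
		fmt.Println("scale failed:", err)
	}
}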
My use case is not a real-world requirement; I was just studying OpenFaaS and thinking about its auto-scaling. If this is not an OpenFaaS main focus right now, I can close this issue.
BTW I joined the community days ago, very willing to contribute :D
Hi @rrrrover,
I think you have a valid point and I'd like to see how far you can push AlertManager. It may require a separate Go process, similar to faas-idler, to make sure that scale-up and scale-down don't work against each other.
What's your name on Slack?
Hi @alexellis , my name is also rrrrover on slack
@rrrrover would you also be interested in working on this issue? https://github.com/openfaas/faas-netes/issues/483
@alexellis thank you for your trust, I'd like to work on that issue too.
Hi @alexellis , I've created a project, faas-autoscaler, to do autoscaling for OpenFaaS. Would you mind taking some time to have a look at it? It has some problems with secret binding, but for autoscaling it works just fine; I'll keep improving it.
Currently I use two Prometheus rules, one for scale-up and one for scale-down. Each time a scale-up/down alert fires, the replica count increases/decreases by deltaReplica until it reaches the limit:
deltaReplica = maxReplicas * scalingFactor
Now faas-autoscaler can scale functions up and down normally. I'll do some math later to find a proper QPS threshold for scale-up/down.
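For reference, a small Go sketch of that step calculation; clamping the result to [minReplicas, maxReplicas] is my assumption about how the limit is applied:

package main

import (
	"fmt"
	"math"
)

// stepScale moves the replica count by deltaReplica = ceil(maxReplicas * scalingFactor),
// clamped to [minReplicas, maxReplicas]. up is true for a scale-up alert, false for scale-down.
func stepScale(current, minReplicas, maxReplicas int, scalingFactor float64, up bool) int {
	delta := int(math.Ceil(float64(maxReplicas) * scalingFactor))
	next := current - delta
	if up {
		next = current + delta
	}
	if next > maxReplicas {
		next = maxReplicas
	}
	if next < minReplicas {
		next = minReplicas
	}
	return next
}

func main() {
	// With maxReplicas=20 and scalingFactor=0.2, each step is 4 replicas.
	fmt.Println(stepScale(1, 1, 20, 0.2, true))  // 5
	fmt.Println(stepScale(5, 1, 20, 0.2, false)) // 1
}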
Hi @alexellis , it's been a while since our last talk. I've updated my faas-autoscaler project. Now faas-autoscaler can control replicas with only one Prometheus rule:
- alert: APIInvoke
  expr: rate(gateway_function_invocation_total[10s]) / ignoring(code) gateway_service_count >= 0
  for: 5s
  labels:
    service: gateway
    severity: major
    action: auto-scale
    target: 2
    value: "{{ $value }}"
  annotations:
    description: Function invoke on {{ $labels.function_name }}
    summary: Function invoke on {{ $labels.function_name }}
With this rule set, faas-autoscaler knows the desired metric for each function replica, defined by the label target: 2. faas-autoscaler also knows the current metric, i.e. value: "{{ $value }}".
Then faas-autoscaler will calculate the desired replicas:
desiredReplicas = ceil[currentReplicas * ( value / target )]
As the rule's expr is always true, the alert keeps firing, so faas-autoscaler effectively checks function replicas periodically (every 40 seconds).
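For illustration, a Go sketch of that calculation; the min/max clamping is my own addition rather than something I've confirmed in faas-autoscaler:

package main

import (
	"fmt"
	"math"
)

// desiredReplicas implements desiredReplicas = ceil(currentReplicas * (value / target)),
// the same shape as the HPAv2 formula, clamped to [minReplicas, maxReplicas].
func desiredReplicas(current, minReplicas, maxReplicas int, value, target float64) int {
	desired := int(math.Ceil(float64(current) * (value / target)))
	if desired > maxReplicas {
		desired = maxReplicas
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	return desired
}

func main() {
	// 1 replica seeing 6 req/s per replica with a target of 2 req/s => 3 replicas.
	fmt.Println(desiredReplicas(1, 1, 20, 6, 2))
}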
How about simply scaling down from the current replicas to "currentReplicas - math.Ceil(currentReplicas * scalingFactor)" when a resolved event is received? Then we would need no scale-down endpoint.
Hi @lmxia , thanks for the tip. I've improved faas-autoscaler a little; it now uses only one endpoint, /system/auto-scale. Because we know the desired metric for the function and the current value, we can easily calculate the desired replicas using:
desiredReplicas = ceil[currentReplicas * ( value / target )]
I'm still keeping the "old" faas-autoscaler endpoints /system/scale-up and /system/scale-down.
If anyone would like to use the old way, they should use both of them to make autoscaling work; I'll provide an example for you.
Let's assume we need to autoscale functions according to the RPS (requests per second) for each replica, and we want RPS in the range [50, 100].
When the system receives 1000 function calls per second, the optimal replica count is 10. With the old config set, we scale up step by step, bringing the replicas to 10. And when the system RPS drops to only 100, we should scale down to 1 replica, step by step, driven by the scale-down alert.
If we only scale down when the scale-up alert resolves, then we either scale down to minReplicas, which is what led me to open this issue, or, as you suggest, scale down by a small step of math.Ceil(currentReplicas x scalingFactor), which leads to a waste of resources.
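For the arithmetic behind those numbers, a tiny Go sketch (maxRPSPerReplica is just the upper bound of the range above):

package main

import (
	"fmt"
	"math"
)

// replicasFor returns the smallest replica count that keeps per-replica RPS
// at or below the upper bound of the desired range.
func replicasFor(totalRPS, maxRPSPerReplica float64) int {
	return int(math.Ceil(totalRPS / maxRPSPerReplica))
}

func main() {
	fmt.Println(replicasFor(1000, 100)) // 10 replicas at 1000 req/s
	fmt.Println(replicasFor(100, 100))  // 1 replica once load drops to 100 req/s
}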
I think this would be a good topic for the next community call, would you be interested in presenting your scaler @rrrrover ?
Hi @alexellis , thanks for this opportunity. But first, when is the community call? I'm in China, and with the 9-hour time difference I might not be able to join.
Thank you for your work on this
So when I invoke a long-running process and it takes a few seconds to give a response (only then incrementing gateway_function_invocation_total), autoscaling currently increases my replica count, but only upon completion (and therefore lags behind the currently queued workload).
Similarly, once a burst of invocations completes, the function scales up and then back down (by the looks of it, while the function is still running), because not enough have completed in the last 5 seconds.
My initial thought is to alter the alert rule to take gateway_function_invocation_started into account, and then compare it to gateway_function_invocation_total.
That said, it might simply be more appropriate to calculate a new metric specifically for currently running invocations, and then provide (or calculate) the number of invocations that a particular pod should be able to handle concurrently.
As it stands, autoscaling doesn't really appear to work for longer-running functions (on the order of a minute or two per invocation), because it rubber-bands the scaling size based on recently completed invocations, not current invocations.
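To make that concrete, a rough Go sketch of sizing on in-flight invocations; perPodConcurrency is a made-up parameter for illustration, not an existing OpenFaaS setting:

package main

import (
	"fmt"
	"math"
)

// replicasForInFlight sizes a function on currently running invocations rather
// than completed ones: in-flight = started - completed, divided by how many
// invocations one pod is expected to handle concurrently.
func replicasForInFlight(started, completed, perPodConcurrency float64, minReplicas, maxReplicas int) int {
	inFlight := started - completed
	desired := int(math.Ceil(inFlight / perPodConcurrency))
	if desired > maxReplicas {
		desired = maxReplicas
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	return desired
}

func main() {
	// 30 invocations started, 6 finished, each pod can run 4 at once => 6 replicas.
	fmt.Println(replicasForInFlight(30, 6, 4, 1, 20))
}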
I'm currently experimenting with a slightly altered alert rule of something like:
sum by (function_name) (
  gateway_function_invocation_started -
  ignoring (code) gateway_function_invocation_total{code="200"} -
  ignoring (code) gateway_function_invocation_total{code="500"}
)
Apologies for the less-than-optimal query; I'm not super experienced with PromQL.
I see there's some documentation about this being configurable via a ConfigMap, but I'm not really sure what that example should look like. Digging around for that.
Is this issue still active? There hasn't been any activity in a year.
My actions before raising this issue
OpenFaaS uses Prometheus to monitor function calls, and when function QPS is higher than some threshold, autoscaling is triggered.
But after functions are scaled up, the QPS won't go down, so functions will keep being scaled up until maxReplicas is reached.
In my opinion, when we scale up functions, the QPS for each function replica will go down, which means the load on each replica will go down.
So when we scale a function to X replicas where QPS/X is relatively small, we can stop scaling up.
Also, when the alert stops, replicas will be set to minReplicas, so QPS per replica will rise again, probably higher than we'd expect.
Expected Behaviour
When the APIHighInvocationRate alert is fired, the function should only scale up to some level, not to maxReplicas.
When APIHighInvocationRate stops, we should scale the function down gracefully just like we scale it up, little by little, to finally reach a safe QPS per replica.
Current Behaviour
When APIHighInvocationRate alert keeps firing (function QPS is high), function replicas will soon reach maxReplicas (default 20)
When the APIHighInvocationRate alert stops, function replicas will drop to minReplicas (default 1)
Possible Solution
sum by(function_name) (rate(gateway_function_invocation_total{code="200"}[10s]) / ignoring(code) gateway_service_count) > 5
Steps to Reproduce (for bugs)
hey -m POST -q 6 -c 1 -d http://some-test-service:8080/ -z 30m http://192.168.99.100:31112/function/curl
kubectl logs -f deploy/gateway -c gateway -n openfaas | grep Scale
to watch the scale up/down logs