openfaas / faas

OpenFaaS - Serverless Functions Made Simple
https://www.openfaas.com

Question about scaling with AlertManager #1271

Open rrrrover opened 5 years ago

rrrrover commented 5 years ago

My actions before raising this issue

OpenFaaS uses Prometheus to monitor function calls, and when a function's QPS is higher than some threshold, autoscaling is triggered.

But after the functions are scaled up, the total QPS won't go down, so the functions will keep being scaled up until maxReplicas is reached.

In my opinion, when we scale a function up, the QPS handled by each replica goes down, which means the load on each replica goes down.

So once we have scaled a function to X replicas where QPS/X is reasonably small, we can stop scaling up.

Also, when the alert resolves, the replicas are set back to minReplicas, so the QPS per replica rises again and will probably be higher than we'd expect.

Expected Behaviour

  1. When the APIHighInvocationRate alert is fired, the function should only scale up to the number of replicas actually needed, not all the way to maxReplicas.

  2. When the APIHighInvocationRate alert resolves, we should scale the function down gracefully, just like we scale up, little by little, until it finally reaches a safe QPS per replica.

Current Behaviour

  1. When the APIHighInvocationRate alert keeps firing (function QPS is high), the function replicas will soon reach maxReplicas (default 20)

  2. When the APIHighInvocationRate alert stops, the function replicas will drop to minReplicas (default 1)

Possible Solution

  1. To solve the scale-up issue, we could change the Prometheus alert rule to use QPS per replica. In my local test I use:

sum by(function_name) (rate(gateway_function_invocation_total{code="200"}[10s]) / ignoring(code) gateway_service_count) > 5

  2. To solve the scale-down issue, we could add a new scale-down endpoint in the gateway and a new Prometheus rule that invokes the scale-down API when there are more replicas than we need (a rough sketch of the per-replica scaling decision follows below).
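
To make the per-replica idea concrete, here is a minimal Go sketch of the kind of decision logic described in the two points above. The lowerQPS/upperQPS thresholds and the direct replica calculation are assumptions for illustration; this is not how the gateway's AlertManager handler actually works.

package main

import (
	"fmt"
	"math"
)

// decideReplicas illustrates the per-replica QPS idea: scale up so that each
// replica handles no more than upperQPS, and scale down when each replica is
// handling less than lowerQPS, keeping the result within [minReplicas, maxReplicas].
func decideReplicas(totalQPS float64, current, minReplicas, maxReplicas int, lowerQPS, upperQPS float64) int {
	perReplica := totalQPS / float64(current)
	desired := current
	switch {
	case perReplica > upperQPS:
		// enough replicas to bring the per-replica QPS back under the upper threshold
		desired = int(math.Ceil(totalQPS / upperQPS))
	case perReplica < lowerQPS:
		// fewer replicas, but each still stays below the scale-up threshold
		desired = int(math.Ceil(totalQPS / lowerQPS))
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// 6 requests/s on 1 replica with a >5 QPS-per-replica scale-up rule
	// (the numbers used in this issue) gives 2 replicas at 3 QPS each.
	fmt.Println(decideReplicas(6, 1, 1, 20, 1, 5))
}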

Steps to Reproduce (for bugs)

  1. Start minikube, deploy faas-netes and deploy some functions for later testing.
  2. Invoke the function 5+ times per second; I use hey to invoke the curl function 6 times per second:

hey -m POST -q 6 -c 1 -d http://some-test-service:8080/ -z 30m http://192.168.99.100:31112/function/curl

  3. Run kubectl logs -f deploy/gateway -c gateway -n openfaas | grep Scale to watch the scale up/down logs.
rrrrover commented 5 years ago

[Screenshot from 2019-07-24 12-32-21]

rrrrover commented 5 years ago

In the picture I shared above, we can see the curl function being scaled up every 40 seconds, according to the default AlertManager settings. And after I stopped calling the function, the replicas dropped to 1 immediately.

alexellis commented 5 years ago

Hi @rrrrover, thanks for your interest in the auto-scaling.

I think you've described how the AlertManager option works reasonably well. It's not the only option and this is customisable.

If you are not satisfied with the default auto-scaling for your use-case, you can edit it:

1) OpenFaaS has an open REST API which you could use to implement your own autoscaling algorithm or controller

2) You can use the HPAv2 rules in Kubernetes.

HPAv2 would allow you to use either CPU, memory, or custom metrics i.e. QPS (see the metrics gathered from the watchdog / function for this option)

3) You could edit the AlertManager rules for scaling up

As you identified, scaling down to min replicas corresponds to a resolved alert from AlertManager. I am not sure how much you can expect to edit that experience whilst retaining that semantic.

You can edit the AlertManager rules for scaling up, and that's something I've seen other users doing too. I would suggest you try out your sample PromQL and report back on how it compares for your use-case.

Looking forward to hearing from you soon,

Alex

alexellis commented 5 years ago

-- Join Slack to connect with the community https://docs.openfaas.com/community

rrrrover commented 5 years ago

Hi @alexellis , thanks for the reply and the patient guidance.

My use case was inspired by the HPAv2 rules in Kubernetes. An HPAv2 rule ensures each function pod only uses a limited share of the cluster's resources; in my understanding, each function pod should likewise only handle a limited number of requests per second.

That's why I observe the QPS per pod, not the total QPS, in Prometheus.

I've tried my new PromQL, which fires an alert when each pod handles more than 5 requests per second:

sum by(function_name) (rate(gateway_function_invocation_total{code="200"}[10s]) / ignoring(code) gateway_service_count) > 5

I send 6 requests to the function every second, so it scales up to 5 pods to resolve the alert.

[Screenshot from 2019-07-31 09-27-37]

And I found that when the replicas finally reached the desired number, the alert resolved and the pods were scaled down to 1. Then the alert fired again.

[Screenshot from 2019-07-31 09-27-43]

So my proposal to scale down via a new Prometheus alert is meant to break this infinite loop.

We could still observe the QPS per pod, but this time we should pick the threshold carefully, so that after scaling down the QPS per pod does not trigger a scale-up again.

In the example above, we could scale down in steps of 4 pods (20% * maxReplicas) when the QPS per pod is less than 1. Since QPS(6) / replicas(5) > 1, no scale-down is triggered and the replicas stay stable.
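
For concreteness, a tiny Go check of the numbers above, treating the 1 QPS-per-pod scale-down threshold as a hypothetical setting:

package main

import "fmt"

func main() {
	qps, replicas := 6.0, 5.0
	scaleDownThreshold := 1.0 // hypothetical: below this QPS per pod we would step down by 4 pods (20% of maxReplicas)

	perPod := qps / replicas                 // 1.2 QPS per pod
	fmt.Println(perPod > scaleDownThreshold) // true: no scale-down is triggered, so the 5 replicas stay stable
}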

rrrrover commented 5 years ago

OpenFaaS has an open REST API which you could use to implement your own autoscaling algorithm or controller

By this, do you mean the /system/scale-function/{functionname} API? This API seems helpful; I could build my own controller that calls it to scale up/down.
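
As a rough illustration of building such a controller on the gateway API, here is a minimal Go sketch that posts a desired replica count to that endpoint. The gateway URL, the credentials and the serviceName/replicas body shape are assumptions for this sketch and should be checked against the gateway's API documentation.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// scaleFunction asks the gateway to set a function's replica count via the
// /system/scale-function/{functionName} endpoint mentioned above.
func scaleFunction(gatewayURL, user, password, functionName string, replicas uint64) error {
	body, err := json.Marshal(map[string]interface{}{
		"serviceName": functionName,
		"replicas":    replicas,
	})
	if err != nil {
		return err
	}

	req, err := http.NewRequest(http.MethodPost,
		fmt.Sprintf("%s/system/scale-function/%s", gatewayURL, functionName),
		bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.SetBasicAuth(user, password) // assumes basic auth is enabled on the gateway
	req.Header.Set("Content-Type", "application/json")

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer res.Body.Close()

	if res.StatusCode >= 300 {
		return fmt.Errorf("unexpected status %d from gateway", res.StatusCode)
	}
	return nil
}

func main() {
	// Hypothetical usage: scale the curl function from the earlier test to 5 replicas.
	if err := scaleFunction("http://127.0.0.1:8080", "admin", "password", "curl", 5); err != nil {
		fmt.Println("scale failed:", err)
	}
}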

My use case is not a real-world requirement; I was just studying OpenFaaS and thinking about its auto-scaling. If this is not OpenFaaS's main focus right now, I can close this issue.

BTW, I joined the community a few days ago and am very willing to contribute :D

alexellis commented 5 years ago

Hi @rrrrover,

I think you have a valid point and I'd like to see how far you can push AlertManager. It may require a separate Go process, similar to faas-idler, to make sure that scale-up and scale-down don't work against each other.

What's your name on Slack?

rrrrover commented 5 years ago

Hi @alexellis, my name is also rrrrover on Slack.

alexellis commented 5 years ago

@rrrrover would you also be interested in working on this issue? https://github.com/openfaas/faas-netes/issues/483

rrrrover commented 5 years ago

@alexellis thank you for your trust, I'd like to work on that issue too.

rrrrover commented 5 years ago

Hi @alexellis, I've created a project, faas-autoscaler, to do autoscaling for OpenFaaS. Would you mind taking some time to have a look at it? It has some problems with secret binding, but for autoscaling it works just fine, and I'll keep improving it.

Currently I use two Prometheus rules, one for scale-up and one for scale-down. Each time it scales up or down, the replica count increases or decreases by deltaReplica until it reaches the limit:

deltaReplica = maxReplicas * scalingFactor

Now faas-autoscaler can scale functions up and down normally. I'll do some math later to find a proper QPS threshold for scaling up and down.
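
A minimal Go sketch of this step-based behaviour, using scalingFactor, minReplicas and maxReplicas as assumed parameter names; it is not the faas-autoscaler code itself:

package main

import (
	"fmt"
	"math"
)

// stepScale moves the replica count by deltaReplica = maxReplicas * scalingFactor
// per scaling event, clamped to [minReplicas, maxReplicas].
func stepScale(current, minReplicas, maxReplicas int, scalingFactor float64, up bool) int {
	delta := int(math.Ceil(float64(maxReplicas) * scalingFactor))
	next := current
	if up {
		next += delta
	} else {
		next -= delta
	}
	if next > maxReplicas {
		next = maxReplicas
	}
	if next < minReplicas {
		next = minReplicas
	}
	return next
}

func main() {
	// With maxReplicas=20 and a 20% scaling factor, each step moves the count by 4.
	fmt.Println(stepScale(1, 1, 20, 0.2, true))  // scale up: 1 -> 5
	fmt.Println(stepScale(5, 1, 20, 0.2, false)) // scale down: 5 -> 1
}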

rrrrover commented 5 years ago

Hi @alexellis, it's been a while since our last talk. I've updated my faas-autoscaler project. Now faas-autoscaler can control replicas with only one Prometheus rule:

- alert: APIInvoke
  expr: rate(gateway_function_invocation_total[10s]) / ignoring(code) gateway_service_count >= 0
  for: 5s
  labels:
    service: gateway
    severity: major
    action: auto-scale
    target: 2
    value: "{{ $value }}"
  annotations:
    description: Function invoke on {{ $labels.function_name }}
    summary: Function invoke on {{ $labels.function_name }}

With this rule set, faas-autoscaler knows the desired metric value for each function replica, defined by the label target: 2. faas-autoscaler also knows the current metric value, i.e. value: "{{ $value }}". Then faas-autoscaler calculates the desired replicas:

desiredReplicas = ceil[currentReplicas * ( value / target )]

As the rule expression is always true, the alert keeps firing, so faas-autoscaler effectively checks the function replicas periodically (every 40 seconds).
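
A minimal Go sketch of that calculation, with clamping to minReplicas/maxReplicas added as an assumption; it is not the faas-autoscaler implementation itself:

package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the formula above:
// desiredReplicas = ceil(currentReplicas * value / target),
// where value is the current per-replica metric and target the desired one.
func desiredReplicas(current, minReplicas, maxReplicas int, value, target float64) int {
	desired := int(math.Ceil(float64(current) * value / target))
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// With target=2 (from the alert label) and a current per-replica rate of 6
	// on a single replica, the function would be scaled to 3 replicas.
	fmt.Println(desiredReplicas(1, 1, 20, 6, 2))
}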

lmxia commented 5 years ago

How about simply scaling down from the current replicas to currentReplicas - math.Ceil(currentReplicas * scalingFactor) when the resolved event is received? Then we would need no scale-down endpoint.

rrrrover commented 5 years ago

Hi @lmxia, thanks for the tip. I've improved faas-autoscaler a little; it now uses only one endpoint, /system/auto-scale. Because we know the desired metric for the function and the current value, we can easily calculate the desired replicas using:

desiredReplicas = ceil[currentReplicas * ( value / target )]

I'm still keeping the "old" faas-autoscaler endpoints /system/scale-up and /system/scale-down. If anyone wants to use the old way, they should use both of them to make autoscaling work; here is an example.

Let's assume we need to autoscale functions according to the RPS (requests per second) for each replica, and we want the RPS to stay in the range [50, 100].

When the system receives 1000 function calls per second, the optimal replica count is 10. With the old config, we scale up step by step and bring the replicas to 10. And when the system RPS drops to only 100, we should scale down to 1 replica, step by step, as scale-down alerts are received.

If we only scale down when the scale-up alert resolves, then we either scale down to minReplicas, which is what led me to open this issue, or, as you suggest, scale down by the small step math.Ceil(currentReplicas * scalingFactor), which leaves resources wasted.
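
For illustration, the arithmetic of this scenario with a direct proportional calculation, assuming we size for the upper bound of the [50, 100] band (a sketch, not the step-by-step alert behaviour described above):

package main

import (
	"fmt"
	"math"
)

func main() {
	const targetRPSPerReplica = 100.0 // upper bound of the desired [50, 100] band

	// 1000 requests/s: ceil(1000/100) = 10 replicas, each handling 100 RPS.
	fmt.Println(math.Ceil(1000 / targetRPSPerReplica))

	// Load drops to 100 requests/s: ceil(100/100) = 1 replica at 100 RPS,
	// still inside the band, so no further scaling is needed.
	fmt.Println(math.Ceil(100 / targetRPSPerReplica))
}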

alexellis commented 5 years ago

I think this would be a good topic for the next community call, would you be interested in presenting your scaler @rrrrover ?

rrrrover commented 5 years ago

Hi @alexellis, thanks for this opportunity. But first I'd like to know when the community call is. I'm in China, so there's a 9-hour time difference and I might not be able to join.

alexellis commented 5 years ago

Thank you for your work on this

kevin-lindsay-1 commented 4 years ago

So when I invoke a long-running function and it takes a few seconds to give a response (and only then increments gateway_function_invocation_total), autoscaling currently increases my replica count, but only upon completion, and therefore lags behind the currently queued workload.

Similarly, once a burst of invocations completes, the function scales up and then back down (by the looks of it, while the function is still running), because not enough invocations have completed in the last 5 seconds.

My initial thought is to alter the alert rule to take gateway_function_invocation_started into account and compare it to gateway_function_invocation_total.

That said, it might simply be more appropriate to calculate a new metric specifically for currently running invocations, and then provide (or calculate) the number of invocations that a particular pod should be able to handle concurrently.

As it stands, autoscaling doesn't really appear to work for longer-running functions (on the order of a minute or two per invocation), because it rubber-bands the scaling size based on recently completed invocations, not current invocations.

I'm currently experimenting with a slightly altered alert rule of something like:

sum by (function_name) (
  gateway_function_invocation_started - 
  ignoring (code) gateway_function_invocation_total{code="200"} -
  ignoring (code) gateway_function_invocation_total{code="500"}
)

Apologies for the less-than-optimal query; I'm not super experienced with PromQL.

I see there's some documentation about this being configurable via a ConfigMap, but I'm not really sure what that example should look like. Digging around for that.
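
A minimal Go sketch of the concurrency-based sizing idea from this comment, assuming the in-flight count comes from started minus completed invocations and that perPodConcurrency is a hypothetical per-pod limit:

package main

import (
	"fmt"
	"math"
)

// replicasForInFlight sizes a function on currently running invocations rather
// than recently completed ones. inFlight would come from something like
// gateway_function_invocation_started minus the completed counters, and
// perPodConcurrency is how many invocations one pod is assumed to handle at once.
func replicasForInFlight(inFlight, perPodConcurrency, minReplicas, maxReplicas int) int {
	desired := int(math.Ceil(float64(inFlight) / float64(perPodConcurrency)))
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// 12 invocations currently running, each pod able to handle 4 concurrently.
	fmt.Println(replicasForInFlight(12, 4, 1, 20)) // 3
}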

kevin-lindsay-1 commented 3 years ago

Is this issue still active? There hasn't been any activity in a year.