openfaas / faas

OpenFaaS - Serverless Functions Made Simple
https://www.openfaas.com
MIT License
25.02k stars 1.93k forks

Question on how to scale the queue-worker #1550

Closed hwuerz closed 4 years ago

hwuerz commented 4 years ago

My actions before raising this issue

Hi,

I have many requests to an async function, more than can be processed in real time. As a result, the queue fills up. Now I want the function and especially the queue-workers to scale with the size of the queue.

As background: my function has a fixed max_inflight value and is only called asynchronously (/async-function/my-function). By scaling the queue-worker and the function together I can avoid unnecessary HTTP 429 errors. If combined scaling is not possible, independent scaling of the queue-worker based on the queue length would help me too.

Is this possible with OpenFaaS on Kubernetes?

What I found so far:

- https://github.com/openfaas/faas/issues/671: a combination of function scaling and queue-worker scaling was mentioned but not elaborated on.
- https://github.com/openfaas/faas/issues/1479: the conclusion was that the queue-worker should be used for many requests; its scaling was not discussed.
- https://github.com/openfaas/workshop/blob/master/lab7.md: Lab 7 in the workshop only covers the queue-worker logs, not its scaling.
- https://docs.openfaas.com/reference/async/: the docs say that the queue workers should be scaled for higher concurrency, but not how.

Thanks for OpenFaaS, it's a great framework!

Expected Behaviour

OpenFaaS scales the queue workers for async function calls based on the queue size. Ideally, the function is scaled accordingly.

Current Behaviour

The queue workers are not scaled at all: neither automatically nor based on the queue size.

Possible Solution

Steps to Reproduce (for bugs)

  1. Create a function that takes a few seconds to finish.
  2. Call the function via /async-function/my-function faster than the requests can be processed.
  3. The NATS queue fills.
  4. The queue workers and the function do not scale as a result of the increasing queue size.
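The steps above can be sketched as a small load script. This is a hedged example, not part of the original report: the gateway URL, the function name my-function, and the request count are all assumptions, and it requires a running OpenFaaS gateway with the function deployed.

```shell
#!/bin/sh
# Flood the async endpoint faster than the function can drain the queue.
# Assumes a gateway at 127.0.0.1:8080 and a deployed function named
# "my-function" that takes a few seconds per invocation.
GATEWAY=http://127.0.0.1:8080

for i in $(seq 1 500); do
  # Async invocations return 202 Accepted immediately;
  # the request body is queued in NATS for the queue-worker.
  curl -s -o /dev/null -w "%{http_code}\n" \
    -d "payload-$i" \
    "$GATEWAY/async-function/my-function" &
done
wait
```

While this runs, the NATS queue grows, which can be observed in the queue-worker logs; neither the queue-worker nor the function scales in response.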

Context

I have many function calls that each take some time; together they should finish as soon as possible. Therefore OpenFaaS should scale the queue worker and the function based on the queue size.

Your Environment

```
Server: Docker Engine - Community
 Engine:
  Version:      19.03.12
  API version:  1.40 (minimum version 1.12)
  Go version:   go1.13.10
  Git commit:   48a66213fe
  Built:        Mon Jun 22 15:44:07 2020
  OS/Arch:      linux/amd64
  Experimental: false
 containerd:
  Version:      1.2.13
  GitCommit:    7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:      1.0.0-rc10
  GitCommit:    dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:      0.18.0
  GitCommit:    fec3683
```



alexellis commented 4 years ago

Hi, are you part of the Cognite team? This sounds very similar to what I've been helping @andeplane set up.

Alex

alexellis commented 4 years ago

/set title: Question on how to scale the queue-worker

hwuerz commented 4 years ago

Hi Alex, thanks for your prompt response!

No, unfortunately I have nothing to do with Cognite. Is there any way I can find the results of your project?

Thanks, Hendrik

alexellis commented 4 years ago

So there are a few ways this has been solved before, or that it could be approached:

1) Add an autoscaling rule on the 429 HTTP code - this is easy to do, just update the Prometheus / Alertmanager config. Cognite did this - https://github.com/openfaas/faas-netes/blob/master/chart/openfaas/templates/prometheus-cfg.yaml#L65
2) Add an HPAv2 rule for the queue-worker - base it on CPU or memory, whichever you prefer - and it will autoscale (see the example we provide for functions; you can copy it)
3) Increase the parallelism of the queue-worker - it can be set to a very large number for a single process, so there is no longer any need to scale the queue-worker - https://docs.openfaas.com/reference/async/#parallelism
4) If you have slow functions that block a queue, you can use multiple named queues - https://docs.openfaas.com/reference/async/#multiple-queues
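Option 2 could look roughly like the following HPAv2 manifest. This is a sketch, not an official chart resource: the min/max replica counts and the CPU threshold are placeholder assumptions, and it assumes the queue-worker Deployment is named queue-worker in the openfaas namespace (the default in the Helm chart).

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker
  namespace: openfaas
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker   # assumes the default Deployment name from the Helm chart
  minReplicas: 1
  maxReplicas: 5         # placeholder ceiling, tune for your workload
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # placeholder threshold
```

Note that CPU is only a rough proxy for queue depth; a queue-worker that is mostly waiting on slow functions may stay idle on CPU even while the queue grows, which is why option 3 (raising parallelism) is often the simpler fix.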

Re: "scaling functions" - this is automatic and built-in, so there is no need to request this. If however you are finding the scaling rule is not working for your use-case, you can:

1) Use HPAv2 instead with memory or CPU - we provide a tutorial for that: https://docs.openfaas.com/tutorials/kubernetes-hpa/
2) Use HPAv2 with custom metrics that you emit: https://docs.openfaas.com/tutorials/kubernetes-hpa-custom-metrics/

If you need retries, you could get them by using Linkerd - it can configure a retry budget for services - https://github.com/openfaas-incubator/openfaas-linkerd2 - https://linkerd.io/2/tasks/configuring-retries/
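The Linkerd approach uses a ServiceProfile with a retry budget. The sketch below is an assumption-heavy illustration, not taken from the linked repo: the function name my-function, the openfaas-fn namespace, the route, and all budget numbers are placeholders.

```yaml
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  # ServiceProfiles are named after the FQDN of the service they describe.
  name: my-function.openfaas-fn.svc.cluster.local
  namespace: openfaas-fn
spec:
  routes:
    - name: "POST /"
      condition:
        method: POST
        pathRegex: "/"
      # Only mark a route retryable if the function is idempotent.
      isRetryable: true
  retryBudget:
    # Retries may add at most 20% extra load on top of original requests.
    retryRatio: 0.2
    minRetriesPerSecond: 10
    ttl: 10s
```

The budget is the important part: it caps how much extra load retries can generate, so a flood of 429s cannot snowball into a retry storm against an already saturated function.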

hwuerz commented 4 years ago

Scaling the function based on the 429 HTTP errors sounds like a good idea, I'll do that. And if the queue-worker can handle that many concurrent connections, I won't scale it either. I will simply set its parallelism to the function's max_inflight multiplied by the function's maximum scale. That way, even at maximum scale-out of the function, no unnecessary 429s should come back.
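The sizing rule above is simple enough to write down explicitly. The numbers below are hypothetical, chosen only to illustrate the arithmetic; substitute your own max_inflight and scaling ceiling.

```python
def queue_worker_parallelism(max_inflight: int, max_replicas: int) -> int:
    """Size the queue-worker's parallelism so that, even at full
    function scale-out, the worker never submits more concurrent
    requests than the function fleet can accept. This keeps the
    function from answering with unnecessary HTTP 429 responses."""
    return max_inflight * max_replicas

# Hypothetical numbers: each function replica accepts 10 in-flight
# requests, and the function scales out to at most 20 replicas.
print(queue_worker_parallelism(max_inflight=10, max_replicas=20))  # 200
```

One caveat worth noting: this value assumes the function is always at maximum scale. Before the autoscaler catches up, the fleet accepts fewer in-flight requests, so some 429s can still occur during the ramp-up, which is exactly what the 429-based scaling rule helps with.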

Linkerd sounds exciting; I will have a look at it.

Thanks again, you're great!

alexellis commented 4 years ago

Glad to help.

You can get insights like this regularly from my Insiders Updates emails, which include updates on OpenFaaS, Kubernetes and cloud computing: https://insiders.openfaas.io - some tiers include the complete set of back issues, which cover the queue-worker in more detail.

You may also want to join Slack to chat with the community. Link in the docs.