[💡FEATURE REQUEST]: Adaptive scaling of the workers

stefanos82 commented 5 years ago

With PHP-FPM we have three options: static, dynamic, and ondemand.

Can we accomplish such a thing with rr? I don't think it makes sense to waste resources when a website is more or less idle; it should be able to limit its workers to the lowest level possible, for obvious reasons.

Thoughts and / or suggestions?

Alex-Bond commented 5 years ago

@stefanos82 hi! The idea of RR is that we keep as much code as possible in memory and don't have to bootstrap the system on each request. PHP-FPM works differently: it spawns workers, but they don't do anything until you make a request, and afterwards it destroys the worker. From my perspective, it doesn't make sense to create a dynamic number of workers because it would defeat the main purpose of the system.

wolfy-j commented 5 years ago

This is possible and has been planned from the beginning (that's why the worker pool is named StaticPool and is used through an interface in Server).

The only trick in this feature is to properly define the scaling logic to push/pull workers from the allocation channel. We are very actively discussing this feature internally, because if it's done wrong, the effect on the application can be very harmful.

stefanos82 commented 5 years ago

Hmmm, I see.

Well, since we care about performance, maybe we should figure out the number of workers based on CPU cores? That's also an option, I would say.

package main

import (  
    "fmt"
    "runtime"
)

func main() {
    fmt.Println(runtime.NumCPU())
}

wolfy-j commented 5 years ago

I can only set it as the default value for the pool.numWorkers option.
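A minimal sketch of such a default, assuming a hypothetical `PoolConfig` struct (a stand-in for illustration, not RoadRunner's actual configuration type): when the worker count is left unset, it falls back to the number of logical CPUs.

```go
package main

import (
	"fmt"
	"runtime"
)

// PoolConfig is a hypothetical stand-in for the pool settings; it is not
// RoadRunner's actual configuration type.
type PoolConfig struct {
	NumWorkers int
}

// withDefaults returns a copy of cfg in which an unset (zero or negative)
// NumWorkers is replaced by the number of logical CPUs.
func withDefaults(cfg PoolConfig) PoolConfig {
	if cfg.NumWorkers <= 0 {
		cfg.NumWorkers = runtime.NumCPU()
	}
	return cfg
}

func main() {
	fmt.Println(withDefaults(PoolConfig{}).NumWorkers)              // number of logical CPUs
	fmt.Println(withDefaults(PoolConfig{NumWorkers: 8}).NumWorkers) // explicit value is kept
}
```

An explicit value always wins; the CPU count is only a starting point and says nothing about how the load changes over time.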

stefanos82 commented 5 years ago

OK, what if I would like to increase or decrease the number of CPU workers on the fly? Is there any hotkey for this option?

wolfy-j commented 5 years ago

Currently this API is not exposed, but it is possible to reconfigure the pool with a different configuration: https://github.com/spiral/roadrunner/blob/master/server.go#L113

stefanos82 commented 5 years ago

You mean to use Reconfigure...when exactly? I'm referring to runtime.

For instance, say I run rr but realize after a while that my traffic needs more workers.

Am I forced to stop it, increase the number of workers in .rr.yaml, and then restart it?

wolfy-j commented 5 years ago

Currently yes. Reconfigure is what is used in http:reset, which you can call at runtime (without stopping the server). I guess I can add a flag to alter the number of workers for this function.

stefanos82 commented 5 years ago

That would be more than awesome.

wolfy-j commented 5 years ago

After a couple of intense internal discussions (thank you @ValeryPiashchynski, Andrew M, Alexei N, and @vvval), we have come up with a plan to add a basic balancing mechanism based on 2 derivative metrics: allocation time and processing time. Though more metrics can be added in the future, these two should cover a lot of possible use cases, such as many fast queries, a few large queries, and so on.

If anyone has anything to share regarding adaptive scaling algorithms, we are glad to listen.

stefanos82 commented 5 years ago

It would be very helpful if you could expand on this, much like a case study: what led you to choose these two derivative metrics, and so forth.

I could investigate it and see whether there is a better alternative that could be applied.

wolfy-j commented 5 years ago

Well, because they are both derivative :) Each metric depends on CPU load, number of connections, processing time, etc.

allocation time = how long you have to wait to get a free worker (window average).
processing time = how long you have to wait to get your job done (window average).

In theory, even one of these metrics should include enough information to scale the system up and down.
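As an illustration of what "window average" could mean here, the following is a minimal sketch that assumes a fixed-size window over the most recent samples. The `Window` type and its methods are hypothetical names, not part of RoadRunner; a time-based window would work as well.

```go
package main

import (
	"fmt"
	"time"
)

// Window keeps the most recent samples in a ring buffer and reports their mean.
type Window struct {
	samples []time.Duration
	next    int // index that the next sample will overwrite
	count   int // number of valid samples stored so far
	sum     time.Duration
}

// NewWindow creates a window holding up to size samples; size must be positive.
func NewWindow(size int) *Window {
	return &Window{samples: make([]time.Duration, size)}
}

// Add records a sample, evicting the oldest one when the window is full.
func (w *Window) Add(d time.Duration) {
	if w.count == len(w.samples) {
		w.sum -= w.samples[w.next]
	} else {
		w.count++
	}
	w.samples[w.next] = d
	w.sum += d
	w.next = (w.next + 1) % len(w.samples)
}

// Avg returns the mean of the stored samples, or 0 if there are none.
func (w *Window) Avg() time.Duration {
	if w.count == 0 {
		return 0
	}
	return w.sum / time.Duration(w.count)
}

func main() {
	alloc := NewWindow(3)
	for _, ms := range []int{10, 20, 30, 100} {
		alloc.Add(time.Duration(ms) * time.Millisecond)
	}
	// Only the last three samples (20ms, 30ms, 100ms) are averaged.
	fmt.Println(alloc.Avg()) // 50ms
}
```

One such window per metric (allocation time and processing time) would be enough to feed a scaling decision.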

I will try to explain a couple of scenarios (in the charts, green = processing time, orange = allocation time):

1) Processing time is high but allocation time is low: the system is accepting heavy requests at a low/medium rate, so there is no need to scale.

2) Processing time is high and allocation time is growing: the system is accepting more heavy requests than before, so it's a good time to scale up.

3) Processing time is high and allocation time is high: the system is accepting heavy requests at a high rate. It can only scale here if CPU/memory is available; otherwise the system is saturated.

-- please do not consider this whole chart as the timeline for the app, it's an example --

4) Processing time is low and allocation time is high: the system is accepting a lot of small requests at a high rate; we can scale up if CPU/memory is available.

5) Low processing time and low allocation time: you are running a hello world application. :)

This is not final; we are still discussing it and are open to suggestions. This is our first shot (before the implementation). I'm ready to accept that there is a fatal flaw in this logic; however, these metrics are easy to retrieve and process, so they look promising for the first version of an adaptive scaling mechanism.

Clearly, both metrics should be used in combination with CPU and memory stats, min/max boundaries, and proper hysteresis logic. We also have to consider the cost of worker creation.
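One possible reading of the scenarios above, written as a rough sketch rather than the planned implementation. All names and thresholds (`Limits`, `Sample`, `Decide`, the `Headroom` flag) are hypothetical. The sketch reacts to the level of the allocation time rather than its trend, and shrinking the pool in scenario 5 is an assumption, since the scenarios only say when to grow or hold.

```go
package main

import (
	"fmt"
	"time"
)

// Action is the scaling decision for one evaluation tick.
type Action int

const (
	Hold Action = iota
	ScaleUp
	ScaleDown
)

func (a Action) String() string {
	return [...]string{"hold", "scale up", "scale down"}[a]
}

// Limits holds hypothetical tuning knobs; none of these names come from RoadRunner.
type Limits struct {
	AllocHigh  time.Duration // above this, requests wait too long for a worker
	AllocLow   time.Duration // below this, workers are readily available
	ProcLow    time.Duration // below this, requests count as "light"
	MinWorkers int
	MaxWorkers int
}

// Sample is one reading of the two window averages plus host state.
type Sample struct {
	Alloc, Proc time.Duration
	Workers     int
	Headroom    bool // assumed signal: spare CPU/memory for another worker
}

// Decide maps a sample onto the scenarios above. The gap between AllocLow
// and AllocHigh is the hysteresis band in which the pool is left alone.
func Decide(s Sample, l Limits) Action {
	switch {
	case s.Alloc > l.AllocHigh:
		if s.Headroom && s.Workers < l.MaxWorkers {
			return ScaleUp // scenarios 2-4 with resources available
		}
		return Hold // no resources or already at max: saturated
	case s.Alloc < l.AllocLow && s.Proc < l.ProcLow && s.Workers > l.MinWorkers:
		return ScaleDown // scenario 5 (assumption: shrink an idle pool)
	default:
		return Hold // scenario 1, or inside the hysteresis band
	}
}

func main() {
	l := Limits{
		AllocHigh: 200 * time.Millisecond, AllocLow: 20 * time.Millisecond,
		ProcLow: 50 * time.Millisecond, MinWorkers: 2, MaxWorkers: 8,
	}
	fmt.Println(Decide(Sample{Alloc: 5 * time.Millisecond, Proc: time.Second, Workers: 4, Headroom: true}, l))          // hold
	fmt.Println(Decide(Sample{Alloc: 300 * time.Millisecond, Proc: time.Second, Workers: 4, Headroom: true}, l))        // scale up
	fmt.Println(Decide(Sample{Alloc: 300 * time.Millisecond, Proc: time.Second, Workers: 4, Headroom: false}, l))       // hold
	fmt.Println(Decide(Sample{Alloc: 5 * time.Millisecond, Proc: 10 * time.Millisecond, Workers: 4, Headroom: true}, l)) // scale down
}
```

The cost of worker creation is not modeled here; a cooldown between consecutive decisions would be one way to account for it.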

wolfy-j commented 5 years ago

I believe we can also calculate the 2nd derivative to build better prediction logic, but this is not the type of rabbit hole I would like to jump into... yet.

stefanos82 commented 5 years ago

Very informative. Now I have a clearer view about the whole thing, thank you.

stefanos82 commented 5 years ago

This article could be used as a source of inspiration: Building a Worker Pool in Golang

It does not mean it demonstrates 100% what I have suggested, but the concept around dynamism is demonstrated in it.

Please bear in mind that there is a high possibility that I'm wrong about the article's content and that I most probably have misunderstood its concept.

Nevertheless, it's a very informative article that makes you appreciate the use of channels.

wolfy-j commented 4 years ago

https://books.google.by/books?id=OoX0BwAAQBAJ&pg=PA146&lpg=PA146&dq=realtime+balancing+algos&source=bl&ots=OSCB1TbdJt&sig=ACfU3U0T39kPXUUCvb-usC1nco7nwWSDZw&hl=en&sa=X&ved=2ahUKEwi07MX2jeDoAhVOUZoKHdiXBHkQ6AEwAnoECAsQLw#v=onepage&q&f=false

stefanos82 commented 4 years ago

@wolfy-j It's not visible to me, I'm afraid.

Can you take a screenshot and paste it here please?

rustatian commented 4 years ago

@stefanos82 Book: Principles of Distributed Systems. Chapter: A Lower-Bound algorithm for Load Balancing in Real-time

rustatian commented 3 years ago

Idea:

Measure the time in the ServeHTTP function. That is the moment the request arrived.
Measure the exit time, when the worker is released.

The difference between those two timestamps tells us how long the request waited for actual execution (we may also subtract the execution time, during which the worker was busy processing the request). In the configuration, the user will be able to set a threshold value along with the max number of workers, a cooldown timeout, and a step. For example:

Request arrived -> start the timer.
WorkerWatcher released a worker for the request.
Actual execution in the Pool <-- this time should be subtracted, because we need RR's own processing time.
Worker returned to the WorkerWatcher -> stop the timer.

Results (example): 3 seconds waiting for the worker, 500ms of actual work in the PHP worker. 3 - 0.5 = 2.5s (data science here), threshold = 1s --> decision: allocate a worker (or 2, 3, or 5, according to the step, but no more than max). We also need to handle a negative result, in case the wait for the worker after scaling becomes smaller than the actual execution time.

Next request: 2 seconds waiting for the worker (the time has dropped, for example), 500ms execution --> decision: allocate a worker. Next request: 1 second. Smaller than the configured threshold, skip. Next request: 500ms. Smaller than the configured threshold, cooldown timeout expired --> decision: deallocate a worker (step).
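The walk-through above can be sketched roughly as follows. This is an illustration under stated assumptions, not RoadRunner code: the `Scaler` type, its fields, and the choice to measure the cooldown from the last pool change are all hypothetical, and `now` is passed in as an offset from an arbitrary start so the example stays deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// Scaler is a hypothetical threshold/step/cooldown scaler.
type Scaler struct {
	Threshold  time.Duration // wait time above which workers are added
	Cooldown   time.Duration // quiet period required before removing workers
	Step       int           // workers added or removed per decision
	Min, Max   int           // pool size boundaries
	Workers    int           // current pool size
	lastChange time.Duration // when the pool size last changed
}

// Observe takes the total time a request spent in RR and the time the PHP
// worker spent executing it, and returns the pool size after the decision.
func (s *Scaler) Observe(total, exec, now time.Duration) int {
	wait := total - exec
	if wait < 0 {
		wait = 0 // guard against a negative result
	}
	switch {
	case wait > s.Threshold:
		if s.Workers < s.Max {
			s.Workers += s.Step
			if s.Workers > s.Max {
				s.Workers = s.Max
			}
		}
		s.lastChange = now
	case s.Workers > s.Min && now-s.lastChange >= s.Cooldown:
		s.Workers -= s.Step
		if s.Workers < s.Min {
			s.Workers = s.Min
		}
		s.lastChange = now
	}
	return s.Workers
}

func main() {
	s := &Scaler{Threshold: time.Second, Cooldown: 10 * time.Second, Step: 1, Min: 4, Max: 8, Workers: 4}
	exec := 500 * time.Millisecond
	fmt.Println(s.Observe(3*time.Second, exec, 0))                          // 2.5s wait: allocate, 5
	fmt.Println(s.Observe(2*time.Second, exec, 1*time.Second))              // 1.5s wait: allocate, 6
	fmt.Println(s.Observe(1*time.Second, exec, 2*time.Second))              // 0.5s wait, cooldown running: 6
	fmt.Println(s.Observe(500*time.Millisecond, exec, 12*time.Second))      // cooldown expired: deallocate, 5
}
```

Measuring the cooldown from the last change means the pool shrinks by at most one step per cooldown period, which keeps it from oscillating.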

Kaspiman commented 1 year ago

Hello! How is work progressing on this feature? I see that the label "v2023.1.0" has been removed. Is the feature still planned?

rustatian commented 1 year ago

Not planned ATM. Still not sure RR should have this feature in the modern era of K8s and other orchestration tools that can scale pods on demand. RR, for its part, provides metrics (like queue size) that orchestration tools can use to make scaling decisions.

rustatian commented 4 hours ago

Reopening: in the next release (v2024.3), RR will have a DynamicPool configuration in addition to the static pool. With this configuration, you can specify a maximum number of dynamic workers, an idle timeout (if the no-free-workers condition isn't triggered during this timeout, all dynamically allocated workers are gracefully deallocated), and an allocation step (spawn rate).

How it works: Currently, the pool.allocate_timeout option controls how long a request waits for a worker. After this timeout, the request is dropped with the NoFreeWorkers error. With the dynamic pool, RR instead allocates additional workers: some of them go to the pool (up to max_workers) and one handles the request. Any feedback on how this feature should work is highly appreciated.
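To make the described behaviour concrete, here is a small model of the bookkeeping only, written as a sketch under assumptions. The `DynamicPool` struct, its field names, and both methods are hypothetical and do not reflect the actual v2024.3 implementation or its configuration keys. No real workers are spawned, and `now` is an offset from an arbitrary start.

```go
package main

import (
	"fmt"
	"time"
)

// DynamicPool is a hypothetical model of the described behaviour.
type DynamicPool struct {
	Base        int           // statically configured workers
	MaxDynamic  int           // cap on additionally allocated workers
	SpawnRate   int           // workers added per no-free-workers event
	IdleTimeout time.Duration // quiet period after which dynamic workers go away
	dynamic     int           // currently allocated dynamic workers
	lastTrigger time.Duration // time of the last no-free-workers event
}

// OnNoFreeWorkers models a request that hit the allocate timeout: instead of
// failing, the pool grows by SpawnRate, capped at MaxDynamic.
func (p *DynamicPool) OnNoFreeWorkers(now time.Duration) int {
	p.lastTrigger = now
	p.dynamic += p.SpawnRate
	if p.dynamic > p.MaxDynamic {
		p.dynamic = p.MaxDynamic
	}
	return p.Base + p.dynamic
}

// Tick releases all dynamic workers once no event has fired for IdleTimeout.
func (p *DynamicPool) Tick(now time.Duration) int {
	if p.dynamic > 0 && now-p.lastTrigger >= p.IdleTimeout {
		p.dynamic = 0
	}
	return p.Base + p.dynamic
}

func main() {
	p := &DynamicPool{Base: 4, MaxDynamic: 3, SpawnRate: 2, IdleTimeout: time.Minute}
	fmt.Println(p.OnNoFreeWorkers(0))           // 6: two dynamic workers added
	fmt.Println(p.OnNoFreeWorkers(time.Second)) // 7: capped at MaxDynamic
	fmt.Println(p.Tick(30 * time.Second))       // 7: idle timeout not reached
	fmt.Println(p.Tick(61 * time.Second))       // 4: all dynamic workers released
}
```

Releasing all dynamic workers at once follows the description above; a gradual scale-down would be a different design choice.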

Kaspiman commented 49 minutes ago

Wow, what a gift!
