stefanos82 opened this issue 5 years ago
@stefanos82 hi! The idea of RR is that we keep as much code in memory as possible and don't have to bootstrap the system on each request. PHP-FPM works differently: it spawns workers that do nothing until you make a request, and destroys the worker afterwards. From my perspective, it doesn't make sense to create a dynamic number of workers because it would kill the main purpose of the system.
This is possible and has been planned from the beginning (that's why the worker pool is named `StaticPool` and is used through an interface in `Server`).
The only tricky part of this feature is properly defining the scaling logic that pushes workers to and pulls them from the allocation channel. We are very actively discussing this feature internally, because if it's done wrong, the effect on the application can be very harmful.
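To make "push/pull workers from the allocation channel" concrete, here is a minimal sketch, assuming idle workers sit in a buffered Go channel whose capacity is the hard upper bound. The `pool`, `worker`, `grow` and `shrink` names are hypothetical and are not RoadRunner's `StaticPool` API; the sketch is single-goroutine only.

```go
package main

import "fmt"

// worker is a hypothetical stand-in for a PHP worker process.
type worker struct{ id int }

// pool keeps idle workers in a buffered channel (the "allocation channel").
// The channel capacity is the upper bound on pool size. Not safe for
// concurrent grow/shrink calls; this is a single-goroutine sketch.
type pool struct {
	free   chan *worker
	size   int // workers alive, idle or busy
	nextID int
}

func newPool(n, maxWorkers int) *pool {
	p := &pool{free: make(chan *worker, maxWorkers)}
	p.grow(n)
	return p
}

// grow pushes up to n new workers into the channel, stopping at capacity.
func (p *pool) grow(n int) int {
	added := 0
	for ; added < n && p.size < cap(p.free); added++ {
		p.nextID++
		p.size++
		p.free <- &worker{id: p.nextID}
	}
	return added
}

// shrink pulls up to n idle workers out of the channel and drops them;
// workers currently serving a request are never touched.
func (p *pool) shrink(n int) int {
	removed := 0
	for removed < n {
		select {
		case <-p.free:
			p.size--
			removed++
		default:
			return removed // no more idle workers
		}
	}
	return removed
}

func main() {
	p := newPool(2, 4)
	fmt.Println(p.grow(3), p.size) // 2 4
	busy := <-p.free               // a request takes one worker
	fmt.Println(p.shrink(10), p.size) // 3 1
	p.free <- busy                    // request finished, worker returned
}
```

Pulling only from the channel means shrinking can never kill a worker mid-request, which is one reason a channel-based pool makes scale-down comparatively safe.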
Hmmm, I see.
Well, since we care about performance, maybe we should determine the number of workers from the number of CPU cores? That's also an option, I would say.
```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Print the number of logical CPUs usable by this process.
	fmt.Println(runtime.NumCPU())
}
```
I can only set it as the default value for the `pool.numWorkers` option.
OK, what if I want to increase or decrease the number of workers on the fly? Is there a hotkey for this option?
Currently this API is not exposed, but it is possible to reconfigure the pool with a different configuration: https://github.com/spiral/roadrunner/blob/master/server.go#L113
You mean use `Reconfigure`... when exactly? I'm referring to runtime.
If, for instance, I run `rr` but realize after a while that my traffic needs more workers, am I forced to stop it, increase the number of workers in `.rr.yaml`, and then restart it?
Currently, yes. `Reconfigure` is what `http:reset` uses, and you can call that at runtime (without stopping the server). I guess I can add a flag to this function to alter the number of workers.
That would be more than awesome.
After a couple of intense internal discussions (thank you @ValeryPiashchynski, Andrew M, Alexei N, and @vvval), we have come up with a plan to add a basic balancing mechanism based on 2 derivative metrics: allocation time and processing time. Though more metrics can be added in the future, these two should cover a lot of possible use cases, such as a lot of fast queries, a small number of large queries, and so on.
If anyone has anything to share regarding adaptive scaling algorithms, we are glad to listen.
It would be very helpful if you could expand on this, much like a case study: what led you to choose these two derivative metrics, and so forth.
I could investigate it and see whether there is a better alternative that could be applied.
Well, because they are both derivative :) Each metric depends on CPU load, number of connections, processing time, etc.
In theory, even one of these metrics should include enough information to scale the system up and down.
I will try to explain a couple of scenarios (green = processing time, orange = allocation time):
1) Processing time is high, but allocation time is low: the system is accepting heavy requests at a low/medium rate. No need to scale.
2) Processing time is high, allocation time is growing: the system is accepting more heavy requests than before. It's a good time to scale up.
3) Processing time is high, allocation time is high: the system is accepting heavy requests at a high rate. The system can only scale here if CPU/memory is available; otherwise, the system is saturated.
-- please do not read this whole chart as a timeline for the app, it's an example --
4) Processing time is low, allocation time is high: the system is accepting a lot of small requests at a high rate. We can scale up if CPU/memory is available.
5) Low processing time, low allocation time: you are running a hello world application. :)
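The five scenarios above can be read as a small decision table. A sketch under stated assumptions: the `Level` bucketing, the `classify` function and the `hasCapacity` flag are hypothetical, and the capacity check is applied to scenario 2 as well, following the later remark that both metrics should be combined with CPU and memory stats.

```go
package main

import "fmt"

// Level is a coarse, hypothetical bucketing of a metric's trend.
type Level int

const (
	Low Level = iota
	Growing
	High
)

// Decision is the suggested scaling action.
type Decision string

const (
	Hold      Decision = "hold"
	ScaleUp   Decision = "scale up"
	Saturated Decision = "saturated"
)

// classify maps (processing time, allocation time) onto scenarios 1-5 from
// the list above. Allocation time drives the action; processing time only
// tells heavy traffic from light. hasCapacity stands in for a CPU/memory
// check. Combinations the list does not cover return scenario 0 and Hold.
func classify(processing, alloc Level, hasCapacity bool) (int, Decision) {
	var scenario int
	switch {
	case processing == High && alloc == Low:
		scenario = 1
	case processing == High && alloc == Growing:
		scenario = 2
	case processing == High && alloc == High:
		scenario = 3
	case processing == Low && alloc == High:
		scenario = 4
	case processing == Low && alloc == Low:
		scenario = 5
	default:
		return 0, Hold
	}
	if alloc == Low {
		return scenario, Hold // requests are not waiting for workers
	}
	if !hasCapacity {
		return scenario, Saturated
	}
	return scenario, ScaleUp
}

func main() {
	fmt.Println(classify(High, Growing, true)) // 2 scale up
	fmt.Println(classify(High, High, false))   // 3 saturated
}
```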
This is not final; we are still having discussions and are open to suggestions. This is our first shot (before the implementation). I'm ready to accept that there may be a fatal flaw in this logic; however, these metrics are easy to retrieve and process, so they look promising for the first version of an adaptive scaling mechanism.
Clearly, both metrics should be used in combination with CPU and memory stats, min/max boundaries, and proper hysteresis logic. We also have to consider the cost of worker creation.
I believe we can also calculate the 2nd derivative to build better prediction logic, but that is not the type of rabbit hole I would like to jump into... yet.
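For reference, the first and second derivatives of a sampled metric such as allocation time can be approximated with finite differences. A minimal sketch, assuming samples taken at a fixed interval `dt`; the input values in `main` are illustrative, not measurements.

```go
package main

import "fmt"

// derivatives returns backward finite differences of samples taken every dt
// seconds: d1 approximates the rate of change, d2 the change of that rate.
func derivatives(samples []float64, dt float64) (d1, d2 []float64) {
	for i := 1; i < len(samples); i++ {
		d1 = append(d1, (samples[i]-samples[i-1])/dt)
	}
	for i := 1; i < len(d1); i++ {
		d2 = append(d2, (d1[i]-d1[i-1])/dt)
	}
	return d1, d2
}

func main() {
	// Illustrative allocation times in ms, sampled once per second.
	d1, d2 := derivatives([]float64{10, 12, 16, 24}, 1)
	fmt.Println(d1, d2) // [2 4 8] [2 4]
}
```

A positive and rising `d2` says allocation time is accelerating, which is the kind of signal a predictive scaler would act on before the first derivative alone crosses a threshold.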
Very informative. Now I have a clearer view about the whole thing, thank you.
This article could be used as a source of inspiration: Building a Worker Pool in Golang
It does not demonstrate exactly what I have suggested, but it does demonstrate the concept of dynamism.
Please bear in mind that there is a good chance I'm wrong about the article's content and have misunderstood its concept.
Nevertheless, it's a very informative article that makes you appreciate the use of channels.
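The channel-based worker pool pattern referred to above looks roughly like this in Go. This is a generic sketch of the pattern, not code from the linked article; `runPool` is a hypothetical name, and `n` is the knob a dynamic pool would adjust at runtime (here it is fixed per call).

```go
package main

import (
	"fmt"
	"sync"
)

// runPool fans jobs out to n worker goroutines over an unbuffered channel
// and sums the results collected over a second channel.
func runPool(n int, jobs []int) int {
	if n < 1 {
		n = 1 // avoid deadlock with zero workers
	}
	in := make(chan int)
	out := make(chan int, len(jobs)) // buffered so workers never block on send
	var wg sync.WaitGroup
	for w := 0; w < n; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range in {
				out <- j * j // stand-in for real work
			}
		}()
	}
	for _, j := range jobs {
		in <- j
	}
	close(in) // lets every worker's range loop finish
	wg.Wait()
	close(out)
	sum := 0
	for r := range out {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(runPool(3, []int{1, 2, 3, 4, 5})) // 55
}
```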
@wolfy-j It's not visible to me, I'm afraid.
Can you take a screenshot and paste it here please?
@stefanos82 link. Book: Principles of Distributed Systems; chapter: A Lower-Bound Algorithm for Load Balancing in Real-time
Idea: … the `ServeHTTP` function. That will be the time when the request arrived. The difference between those timers will tell us how long the request waited before actual execution (we may also subtract the execution time, i.e. the time the worker spent executing the request).
In the configuration, the user will be able to set a threshold value as well as the `max` workers number, `cooldown` timeout, and `step`.
For example:
Results (example): 3 seconds waiting for the worker, 500ms of actual work in the PHP worker. 3 - 0.5 = 2.5s (data science here), threshold 1s --> decision: allocate a worker (or 2, 3, 5 according to the `step`, but no more than `max`). We also need to handle negative time in the case where, after scaling, waiting for the worker takes less time than the actual execution.
Next request: 2 seconds waiting for the worker (time reduced, for example), 500ms execution --> decision: allocate a worker.
Next request: 1 second. Smaller than specified in the configuration, skip.
Next request: 500ms. Smaller than specified in the configuration, cooldown timeout expired --> decision: deallocate a worker (by `step`).
Hello! How is work on this feature progressing? I see that the label "v2023.1.0" has been removed. Is the feature still planned?
Not planned at the moment. We're still not sure RR should have this feature in the modern era of K8s and other orchestration tools that can scale pods on demand. On the RR side, it provides metrics (like queue size) that orchestration tools can use to make scaling decisions.
Reopening: in the next release (v2024.3), RR will have a `DynamicPool` configuration in addition to the static pool. With this configuration, you can specify a maximum dynamic workers count, an idle timeout (if the no-free-workers condition is not triggered during this timeout, all dynamically allocated workers are gracefully deallocated), and an allocation step (spawn rate).
How it works:
Currently, the `pool.allocate_timeout` option is responsible for the worker waiting timeout. After this timeout, the request is dropped with the `NoFreeWorkers` error. With the dynamic pool, RR instead allocates additional workers: some of them go to the pool (up to `max_workers`) and one handles the request. Any feedback on how this feature should work is highly appreciated.
Wow, what a gift!
With PHP-FPM we have three options: `static`, `dynamic`, and `ondemand`. Can we accomplish something similar with rr? I don't think it makes sense to waste resources when a website is more or less idle; it should be able to limit its workers to the lowest possible level, for obvious reasons.
Thoughts and / or suggestions?