Closed: Meldiron closed this issue 1 month ago
👀 Sooo this is why functions are timing out all the time?
```php
<?php
require "./vendor/autoload.php";

use GuzzleHttp\Client;

$http = new Swoole\Http\Server('127.0.0.1', 9501);
$http->set([
    'worker_num' => 1,
    'enable_coroutine' => true,
    'hook_flags' => SWOOLE_HOOK_ALL
]);
$http->on('request', function ($request, $response) {
    if ($request->server['request_uri'] == '/test') {
        $response->end('Hello World');
        echo 456;
    } else {
        $client = new Client(['base_uri' => 'http://127.0.0.1:9501/']);
        $client->request('GET', 'test');
        $response->end('Hello World');
    }
});
$http->start();
```
You can use coroutines to solve the problem of dependent requests.
```php
<?php
require "./vendor/autoload.php";

use GuzzleHttp\Client;

$http = new Swoole\Http\Server('127.0.0.1', 9501, SWOOLE_PROCESS);
$http->set([
    'worker_num' => 4,
    'enable_coroutine' => false,
    'dispatch_func' => function ($server, $fd, $type, $data = null) {
        // The last process only handles the /health request.
        if ($data && str_starts_with($data, 'GET /health HTTP/1.1')) {
            return 3;
        }
        return rand(0, 2);
    }
]);
$http->on('request', function ($request, $response) {
    if ($request->server['request_uri'] == '/health') {
        $response->end('Hello World');
    } else {
        $client = new Client(['base_uri' => 'http://127.0.0.1:9501/']);
        $client->request('GET', 'health');
        $response->end('Hello World');
    }
});
$http->start();
```
In `SWOOLE_PROCESS` mode, by configuring `dispatch_func`, the last process handles only the `/health` request and never the `/version` request, while the remaining processes handle only the `/version` request and never `/health`.
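Stripped of the server around it, the routing rule inside `dispatch_func` is plain PHP and can be sanity-checked in isolation (a small extraction; `pickWorker` is just an illustrative name):

```php
<?php
// Mirror of the dispatch rule above: /health always goes to the last
// worker (id 3); everything else goes to a random worker among 0..2.
function pickWorker(string $data): int
{
    if (str_starts_with($data, 'GET /health HTTP/1.1')) {
        return 3;
    }
    return rand(0, 2);
}

var_dump(pickWorker("GET /health HTTP/1.1\r\nHost: 127.0.0.1:9501\r\n\r\n"));  // int(3)
var_dump(pickWorker("GET /version HTTP/1.1\r\nHost: 127.0.0.1:9501\r\n\r\n")); // int(0), int(1) or int(2)
```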
@NathanFreeman Thanks for your insights 🙌
Sadly we cannot easily rewrite our server to coroutine-style due to the use of some stateful variables.
I will take a look at `dispatch_func` and `dispatch_mode`, that could be a great solution ✨
```php
<?php
$http = new Swoole\Http\Server('127.0.0.1', 9501);
$http->set([
    'worker_num' => 4,
    'task_worker_num' => 4,
    'enable_coroutine' => false
]);
$http->on('request', function ($request, $response) use ($http) {
    $http->task('health');
    $response->end('Hello World');
});
$http->on('task', function ($serv, $task_id, $reactor_id, $data) {
    echo "receive {$data}";
});
$http->start();
```
Or you can use the `task` event to handle the business logic related to the `/health` request in the task process. @Meldiron
@NathanFreeman Thanks for all the insights 🙌
I first tried all the dispatch modes blindly, without knowing how they behave. By running the same benchmarks, I only got worse results.
Next, I tried `dispatch_func`, which separates calls to `/version` and `/health`. That solved the issue fully but made the server slower overall. I believe it became slower because our usual traffic doesn't split those dependent and independent requests in half, so some workers idled for long stretches while others worked nonstop.
I had an idea for an "auto-scaling" solution that would analyze incoming requests and, instead of splitting half-and-half, make an informed decision to adjust the split. Sadly, I could not find an easy metric to track, so I decided to avoid this solution as it felt like overkill.
Finally, I decided to continue with `dispatch_func`, but I need to make it smarter. Sadly, from the incoming request information alone I can't tell whether a request will cause a dependency. If I knew that, I could track the state of each worker and its current job, and avoid workers holding possibly dependent requests. Thankfully, the worker knows a request is about to become dependent just before it sends the internal second request, which is the key to my solution here. I believe keeping the state on the master process is the right path, but the worker needs the ability to report to the master: "Hey, I will now be a risky worker, please don't send requests to me." Later, when the worker finishes that request, it reports: "I am available for all requests again." This will likely slow the server down, but only by a small margin. If I only apply this logic when there are 0 idle workers, I keep the overhead very low while fully avoiding the freezing issue.
Considering the above solution, is there some channel for sending a message from the worker process to the master process in Swoole?
I'll also take a look at Swoole tasks and see if they can help in my use case. I'll keep you posted on this topic as well 🙏
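One channel that might fit: a `Swoole\Table` allocated before `start()` lives in shared memory mapped by every process, including the master where `dispatch_func` runs. Below is a minimal sketch of the "risky worker" bookkeeping under that assumption; the table layout and the `risky` column name are made up for illustration:

```php
<?php
// Shared-memory table, created BEFORE the server starts so every
// process (workers and the master) maps the same memory.
$busy = new Swoole\Table(64);
$busy->column('risky', Swoole\Table::TYPE_INT);
$busy->create();

$http = new Swoole\Http\Server('127.0.0.1', 9501);
$http->set([
    'worker_num' => 4,
    'enable_coroutine' => false,
    // Runs in the master process; prefer workers not flagged as risky.
    'dispatch_func' => function ($server, $fd, $type, $data = null) use ($busy) {
        $candidates = [];
        for ($i = 0; $i < 4; $i++) {
            if (!$busy->exist((string) $i) || $busy->get((string) $i, 'risky') === 0) {
                $candidates[] = $i;
            }
        }
        // If every worker is risky, fall back to any worker.
        return $candidates ? $candidates[array_rand($candidates)] : rand(0, 3);
    },
]);

$http->on('request', function ($request, $response) use ($http, $busy) {
    $id = (string) $http->worker_id;
    // "Hey, I will now be a risky worker, please don't send requests to me."
    $busy->set($id, ['risky' => 1]);
    // ... send the internal dependent request here ...
    // "I am available for all requests again."
    $busy->set($id, ['risky' => 0]);
    $response->end('Hello World');
});
$http->start();
```

Note this is only advisory: a worker flags itself after dispatch may already have routed packets to it, so it narrows the race window rather than eliminating it.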
I configured an async-style HTTP server to have two workers. I made an endpoint with another request as a dependency. The flow is as follows:

1. A request arrives at `/version`
2. It sends a request to `127.0.0.1/health`. Timeout for that is 60 seconds
3. `/health` simply returns OK
4. Finish the `/version` request

TLDR: I expected delays that resolve in a few seconds.
All requests are properly processed. If I receive 2 or more concurrent requests for a while, they will be delayed, but only for a few seconds at most. This would be expected because I configured the server with `worker_num` set to 2. Considering the above example, 1 request to `/version` (creating another to `/health`) is stable, but a second concurrent request creates a queue and starts a delay.

TLDR: I got enormous delays that remain indefinitely.
With 1 concurrent request to `/version`, everything is perfect. When doing 2 concurrent requests, even for just a few seconds, everything freezes for 60 seconds, coming from the internal cURL timeout. I explain my assumptions about why this happens below in the additional insights.
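For reference, the setup described above roughly corresponds to the sketch below. This is a reconstruction, not the original code; Guzzle, the endpoint bodies, and the timeout placement are assumptions:

```php
<?php
require "./vendor/autoload.php";

use GuzzleHttp\Client;

$http = new Swoole\Http\Server('127.0.0.1', 9501);
$http->set([
    'worker_num' => 2,           // two blocking workers, as described
    'enable_coroutine' => false  // async-style code without coroutines
]);
$http->on('request', function ($request, $response) {
    if ($request->server['request_uri'] == '/health') {
        $response->end('OK');
    } else {
        // /version depends on an internal blocking call to /health.
        $client = new Client([
            'base_uri' => 'http://127.0.0.1:9501/',
            'timeout' => 60, // matches the 60-second freeze observed
        ]);
        $client->request('GET', 'health');
        $response->end('1.0.0');
    }
});
$http->start();
```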
✨ Additional insights
I believe Swoole hands new requests to workers that are idle. This is expected.
I also believe that when there are 0 idle workers, Swoole gives the request to a random worker, which keeps it in its queue. When that worker finishes its previous jobs, it takes the request from its queue and processes it. Is that the case? If so, this is the cause of the problem explained above, because requests can be randomly assigned to the same worker, and if 2 requests depend on each other, they create an infinite zombie worker.
A better solution would be to keep the "queue of requests for processing" on the master process instead of per worker. This way we don't force a specific request onto a specific worker. By assigning randomly, a request can sometimes land on the same worker that is currently processing the request that triggered it, causing an infinite zombie worker (until cURL times out).
Is there a configuration that could prevent this problem?
🧠 Step-by-step scenario
1. Request to `/version`, arriving to worker A
2. While processing `/version`, send request to `/health`
3. `/health` arriving to worker B
4. Another request to `/version`, going to worker B
5. Request from step 3 returns OK, worker B finishes it
6. Worker B starts to work on the request from step 4
7. While processing `/version` on worker B, send request to `/health`
8. `/health` arriving to worker B
9. Deadlock (until `/health` times out, 60 seconds in my case)

This is the state of the workers in this situation:

- Worker A: Processing a `/version` request
- Worker B: Processing a `/version` request, waiting for the `/health` cURL request to finish. In worker B's queue there is a `/health` request, but it won't start to be processed before the current `/version` job finishes, which is never, as this just created an infinite dependency.
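The scenario above condenses into a toy model in plain PHP. This is not Swoole's actual scheduler; the queue layout and request names are illustrative, but it captures why co-locating a blocked `/version` with the `/health` it spawned freezes the worker:

```php
<?php
// Toy model: each worker runs the first request in its FIFO queue, and
// 'version:N' blocks its worker until the matching 'health:N' finishes.
function deadlocks(array $queues): bool
{
    foreach ($queues as $queue) {
        $running = $queue[0] ?? null;
        if ($running !== null && str_starts_with($running, 'version:')) {
            $id = substr($running, strlen('version:'));
            // The matching health request sits BEHIND the blocked version
            // request on the same worker: it can never start running.
            if (in_array('health:' . $id, array_slice($queue, 1), true)) {
                return true;
            }
        }
    }
    return false;
}

// Worker B runs version:2 while health:2 waits in its own queue: frozen.
var_dump(deadlocks([
    'A' => ['version:1'],
    'B' => ['version:2', 'health:2'],
])); // bool(true)

// The same health:2 dispatched to idle worker A resolves normally.
var_dump(deadlocks([
    'A' => ['health:2'],
    'B' => ['version:2'],
])); // bool(false)
```

A master-side queue, as suggested above, removes the `true` case because `health:2` would never be pinned behind `version:2` on the same worker.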