swoole / swoole-src

🚀 Coroutine-based concurrency library for PHP
https://www.swoole.com
Apache License 2.0

Configurable strategy for picking async-style http worker #5274

Closed Meldiron closed 1 month ago

Meldiron commented 2 months ago

Please answer these questions before submitting your issue.

  1. What did you do? If possible, provide a simple script for reproducing the error.

I configured an async-style HTTP server with two workers:

$http
    ->set([
        'worker_num' => 2
    ]);

I made an endpoint that has another request as a dependency. The flow is described in the step-by-step scenario below.

  2. What did you expect to see?

TLDR: I expected delays that resolve within a few seconds.

All requests are processed properly. If I receive 2 or more concurrent requests for a while, they may be delayed, but only by a few seconds at most. This would be expected because I configured the server with worker_num = 2. In the example above, 1 request to /version (which creates another to /health) is stable, but a second concurrent request creates a queue and introduces delay.

  3. What did you see instead?

TLDR: I got enormous delays that persist indefinitely.

With 1 concurrent request to /version, everything is perfect. When doing 2 concurrent requests, even for just a few seconds, everything freezes for 60 seconds, which comes from the internal CURL timeout. I explain my assumptions about why this happens in the additional insights below.

  4. What version of Swoole are you using (show your php --ri swoole)?
Version => 5.1.2
Built => Feb 23 2024 10:25:58
  5. What is your machine environment used (show your uname -a & php -v & gcc -v)?
Linux c568e83c7872 6.5.0-9-generic #9-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct  7 01:35:40 UTC 2023 x86_64 Linux

PHP 8.3.3 (cli) (built: Feb 16 2024 21:25:21) (NTS)

gcc version 13.2.1 20231014 (Alpine 13.2.1_git20231014) 

✨ Additional insights

I believe Swoole gives new requests for processing to workers which are idle. This is expected.

I also believe that when there are 0 idle workers, Swoole gives the request to a random worker, which then keeps it in its queue. When that worker finishes its previous jobs, it takes the request from its queue and processes it. Is that the case? If so, this is the cause of the problem explained above, because a request can be randomly assigned to the same worker that is processing the request that triggered it - and if 2 requests depend on each other, they produce a worker that is stuck indefinitely.

A better solution would be to keep the "queue of requests for processing" on the master process instead of per worker. This way we don't force a specific request onto a specific worker. With random assignment, a request can sometimes land on the same worker that is currently processing the previous request that triggered it, leaving that worker stuck until CURL times out.

Is there a configuration that could prevent this problem?
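For reference, Swoole does expose a dispatch_mode server setting (from my reading of the docs: 1 = round-robin, 2 = fixed per connection, 3 = deliver to an idle worker first). As a hedged suggestion only: mode 3 at least avoids queueing a new request behind a busy worker while an idle one exists, though it cannot help once every worker is blocked.

```php
$http->set([
    'worker_num'    => 2,
    // 3 = "preemptive" dispatch: prefer an idle worker when one exists,
    // instead of fixed/round-robin assignment to a possibly-busy worker.
    'dispatch_mode' => 3,
]);
```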

🧠 Step-by-step scenario

  1. Start the HTTP server with worker_num = 2; call the workers A and B
  2. Send a request to /version, arriving at worker A
  3. From inside /version, send a request to /health, arriving at worker B

After step 3, there is no idle worker.

  4. Send another request to /version, going to worker B

The request in step 4 found no idle workers, so Swoole just picked randomly between A and B. It picked B.

  5. The request from step 3 on worker B finishes

  6. Worker B starts to work on the request from step 4

After step 6, there is no idle worker.

  7. From inside /version on worker B, send a request to /health, arriving at worker B

The request in step 7 found no idle workers, so Swoole just picked randomly between A and B. It picked B.

  8. Worker B is frozen forever (until the CURL request for /health times out, 60 seconds in my case)

This is the state of the workers in this situation:

Worker A: processing a /version request. Worker B: processing a /version request, waiting for its /health CURL request to finish. In worker B's queue there is a /health request, but it won't start being processed before the current /version job finishes - which never happens, as this has just created a circular dependency.
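The freeze described above can be reproduced outside Swoole with a toy model (all names hypothetical; this is not the Swoole scheduler): each worker has a private FIFO queue, and a /version job only completes after the /health job it spawned has run. If the spawned /health lands behind its blocker on the same worker's queue, neither can ever finish.

```php
<?php
// Toy model of per-worker FIFO queues (hypothetical, not Swoole internals).
// A 'version:N' job finishes only after its dependent 'health:N' job runs.
// Each worker executes its own queue strictly in order, so if 'health:N'
// is queued *behind* 'version:N' on the same worker, both are stuck.

function stuckJobs(array $queues): array {
    // Position of every job: [worker, index within that worker's queue].
    $pos = [];
    foreach ($queues as $w => $queue) {
        foreach ($queue as $i => $job) {
            $pos[$job] = [$w, $i];
        }
    }

    $stuck = [];
    foreach ($pos as $job => [$w, $i]) {
        if (!str_starts_with($job, 'version:')) {
            continue;
        }
        $dep = 'health:' . substr($job, strlen('version:'));
        if (!isset($pos[$dep])) {
            continue; // dependency not dispatched in this model
        }
        [$dw, $di] = $pos[$dep];
        // Dependency queued behind the blocker on the same worker: deadlock.
        if ($dw === $w && $di > $i) {
            $stuck[] = $job;
            $stuck[] = $dep; // the dependency never even starts
        }
    }
    return $stuck;
}

// Steps 4-8 of the scenario: worker B runs 'version:2' while its internal
// 'health:2' request was dispatched to B as well.
$queues = [
    'A' => ['version:1'],
    'B' => ['version:2', 'health:2'],
];
$frozen = stuckJobs($queues); // ['version:2', 'health:2']
```

Had 'health:2' been dispatched to worker A instead, stuckJobs() would return an empty array, matching the intuition that only same-worker self-dependencies freeze.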

gewenyu99 commented 2 months ago

👀 Sooo this is why functions are timing out all the time?

NathanFreeman commented 2 months ago
<?php
require "./vendor/autoload.php";
use GuzzleHttp\Client;

$http = new Swoole\Http\Server('127.0.0.1', 9501);
$http->set([
    'worker_num' => 1,
    'enable_coroutine' => true,
    'hook_flags' => SWOOLE_HOOK_ALL
]);

$http->on('request', function ($request, $response) {
    if ($request->server['request_uri'] == '/test') {
        $response->end('Hello World');
        echo 456;
    } else {
        $client = new Client(['base_uri' => 'http://127.0.0.1:9501/']);
        $client->request('GET', 'test');
        $response->end('Hello World');
    }
});

$http->start();

You can use coroutines to solve the problem of dependent requests.
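The reason a single worker suffices in the coroutine example above is that the hooked curl call suspends the current coroutine instead of blocking the whole process, so the same worker can serve the /test request in the meantime. A rough plain-PHP analogy using Fibers (an illustration of suspend/resume only, not Swoole's scheduler):

```php
<?php
// Illustration only: a handler that suspends while "waiting" lets another
// handler run in the same process, then resumes where it left off.

$log = [];

$version = new Fiber(function () use (&$log) {
    $log[] = 'version: start';
    Fiber::suspend();          // "waiting" on the internal /health request
    $log[] = 'version: got /health response';
});

$health = new Fiber(function () use (&$log) {
    $log[] = 'health: handled';
});

$version->start();   // runs until the suspend (the would-be blocking point)
$health->start();    // the dependent request is served in the meantime
$version->resume();  // the original request can now finish

// $log: ['version: start', 'health: handled', 'version: got /health response']
```

With blocking workers, the equivalent of Fiber::suspend() never happens, which is exactly why the dependent request has to land on a different worker.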

NathanFreeman commented 2 months ago
<?php
require "./vendor/autoload.php";
use GuzzleHttp\Client;

$http = new Swoole\Http\Server('127.0.0.1', 9501, SWOOLE_PROCESS);
$http->set([
    'worker_num' => 4,
    'enable_coroutine' => false,
    'dispatch_func' => function ($server, $fd, $type, $data = null) {
        // The last process only handles the /health request.
        if ($data && str_starts_with($data, 'GET /health HTTP/1.1')) {
            return 3;
        }

        return rand(0, 2);
    }
]);

$http->on('request', function ($request, $response) {
    if ($request->server['request_uri'] == '/health') {
        $response->end('Hello World');
    } else {
        $client = new Client(['base_uri' => 'http://127.0.0.1:9501/']);
        $client->request('GET', 'health');
        $response->end('Hello World');
    }
});

$http->start();

In SWOOLE_PROCESS mode, dispatch_func can be configured so that the last worker handles only /health requests while the remaining workers handle only /version requests, so a /version request can never be queued behind the /health request it is waiting on.
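The routing decision inside dispatch_func above can be factored into a plain function, which also makes it easy to sanity-check in isolation (same partitioning as the snippet: with 4 workers, the last one is reserved for /health and the others share everything else):

```php
<?php
// Same partitioning as the dispatch_func above, extracted for clarity:
// the last of $workerNum workers is reserved for /health requests,
// everything else goes to a random worker among the rest.

function pickWorker(?string $data, int $workerNum = 4): int {
    $last = $workerNum - 1;
    if ($data !== null && str_starts_with($data, 'GET /health HTTP/1.1')) {
        return $last; // reserved worker
    }
    return rand(0, $last - 1); // any non-reserved worker
}
```

In the real server this value is simply what dispatch_func returns; testing it as a pure function avoids having to exercise the dispatch path end to end.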

Meldiron commented 2 months ago

@NathanFreeman Thanks for your insights 🙌

Sadly, we cannot easily rewrite our server in coroutine style due to our use of some stateful variables.

I will take a look at dispatch_func and dispatch_mode; that could be a great solution ✨

NathanFreeman commented 2 months ago
<?php
$http = new Swoole\Http\Server('127.0.0.1', 9501);
$http->set([
    'worker_num' => 4,
    'task_worker_num' => 4,
    'enable_coroutine' => false
]);

$http->on('request', function ($request, $response) use ($http) {
    $http->task('health');
    $response->end('Hello World');
});

$http->on('task', function ($serv, $task_id, $reactor_id, $data) {
    echo "receive {$data}";
});

$http->start();

Or you can use the task event to handle the business logic related to /health request in the task process. @Meldiron

Meldiron commented 2 months ago

@NathanFreeman Thanks for all the insights 🙌

I first tried all the dispatch modes blindly, without knowing how they behave. Running the same benchmarks, I only got worse results.

Next, I tried a dispatch_func that separates calls to /version and /health, which solved the issue fully but made the server slower overall. I believe it became slower because our usual workload doesn't split dependent and independent requests in half, so some workers idled for long stretches while others worked nonstop.

I had an idea for an "auto-scaling" solution that would analyze incoming requests and, instead of splitting half-and-half, make an informed decision to adjust the split. Sadly, I could not find an easy metric to track, so I decided to avoid this solution as it felt like overkill.

Finally, I decided to continue with dispatch_func, but I need to make it smarter. Sadly, from the incoming request information alone, I can't tell whether a request will cause a dependency. If I knew that, I could track the state of each worker and its current job to avoid workers with possibly dependent requests. Thankfully, the worker knows a request is about to become dependent right before it sends the internal second request, which is the key to my solution here. I believe keeping the state on the master process is the right path, but the worker needs the ability to report to the master, saying "Hey, I will now be a risky worker, please don't send requests to me." Later, when the worker finishes that request, it reports "I am available for all requests again." This will likely slow down the server, but only by a small margin. If I apply this logic only when there are 0 idle workers, the overhead stays very low while fully avoiding the freezes.

Considering the above solution, is there some channel for sending a message from the worker process to the master process in Swoole?
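One plausible building block here (hedged; worth verifying against the Swoole docs) is a Swoole\Table created before start(), since it is shared memory visible both to the workers and to the dispatch_func running in the master, so a worker could flag itself "risky" without an explicit message channel. The selection rule itself can be sketched as a plain function over a per-worker risky flag (hypothetical names, not a Swoole API):

```php
<?php
// Sketch of the proposed dispatch rule (hypothetical, not a Swoole API):
// prefer any worker not currently flagged "risky" (i.e. mid-dependency);
// fall back to random assignment only when every worker is flagged.

function pickSafeWorker(array $risky, int $workerNum): int {
    $safe = [];
    for ($w = 0; $w < $workerNum; $w++) {
        if (empty($risky[$w])) {
            $safe[] = $w;
        }
    }
    if ($safe !== []) {
        return $safe[array_rand($safe)];
    }
    // Every worker is mid-dependency; random fallback (still deadlock-prone,
    // but this branch is exactly the case the flags are meant to shrink).
    return rand(0, $workerNum - 1);
}
```

In the two-worker scenario from this issue, worker B would flag itself before its internal /health request, so the dispatcher would route that /health to worker A and the freeze would not occur.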

Meldiron commented 2 months ago

I'll also take a look at Swoole tasks and see if they can help my use case. I'll keep you posted on this topic as well 🙏