swoole / swoole-src

🚀 Coroutine-based concurrency library for PHP
https://www.swoole.com
Apache License 2.0

Using package_length_func #5300

Open karevan opened 3 weeks ago

karevan commented 3 weeks ago
  1. What did you do? If possible, provide a simple script for reproducing the error.

    I want to use package_length_func instead of these parameters:

    // 'package_length_type'      => 'l', //see php pack()
    // 'package_length_offset'    => 28,
    // 'package_body_offset'      => 32,

    use Swoole\Server;

    $server = new Server('127.0.0.1', 9501, SWOOLE_PROCESS, SWOOLE_SOCK_TCP);
    $server->set([
        'open_length_check'        => true,
        'package_max_length'       => 81920,
        // 'package_length_type'      => 'l', //see php pack()
        // 'package_length_offset'    => 28,
        // 'package_body_offset'      => 32,
        'package_length_func'      => function ($data) {
            // Return 0 until the full 32-byte header has arrived.
            if (strlen($data) < 32) {
                return 0;
            }
            try {
                $length = intval(unpack('l', substr($data, 28, 4))[1]);
                if ($length <= 0 or $length > 1024) {
                    return -1; // invalid body length
                } elseif ($length > (strlen($data)) - 32) {
                    return 0; // body not complete yet
                } else {
                    return $length + 32;
                }
            } catch (Throwable $throwable) {
                return -1;
            }
        },
    ]);
  2. What did you expect to see? I expected this code:

    'package_length_func'      => function ($data) {
        if (strlen($data) < 32) {
            return 0;
        }
        try {
            $length = intval(unpack('l', substr($data, 28, 4))[1]);
            if ($length <= 0 or $length > 1024) {
                return -1;
            } elseif ($length > (strlen($data)) - 32) {
                return 0;
            } else {
                return $length + 32;
            }
        } catch (Throwable $throwable) {
            return -1;
        }
    }

    to behave the same way as these parameters:

    // 'package_length_type'      => 'l', //see php pack()
    // 'package_length_offset'    => 28,
    // 'package_body_offset'      => 32,
  3. What did you see instead? With a large number of connection requests, this code does not work properly and causes all workers to restart. The log shows: WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 50 bytes of malformed data from the client[172.19.0.2:45162]
  4. What version of Swoole are you using (show your php --ri swoole)?

    Swoole => enabled
    Author => Swoole Team team@swoole.com
    Version => 5.0.3
    Built => Aug 3 2023 12:19:58
    coroutine => enabled with boost asm context
    epoll => enabled
    eventfd => enabled
    signalfd => enabled
    cpu_affinity => enabled
    spinlock => enabled
    rwlock => enabled
    http2 => enabled
    json => enabled
    pcre => enabled
    zlib => 1.2.11
    mutex_timedlock => enabled
    pthread_barrier => enabled
    futex => enabled
    async_redis => enabled

    Directive => Local Value => Master Value
    swoole.enable_coroutine => On => On
    swoole.enable_library => On => On
    swoole.enable_preemptive_scheduler => Off => Off
    swoole.display_errors => On => On
    swoole.use_shortname => On => On
    swoole.unixsock_buffer_size => 8388608 => 8388608

  5. What is your machine environment used (show your uname -a & php -v & gcc -v)?

    [1] 19287
    [2] 19288
    Using built-in specs.
    COLLECT_GCC=gcc
    COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
    OFFLOAD_TARGET_NAMES=nvptx-none:hsa
    OFFLOAD_TARGET_DEFAULT=1
    Target: x86_64-linux-gnu
    Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.3.0-17ubuntu1~20.04' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
    Thread model: posix
    gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
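
The length callback from the report can be pulled out into a named function so that its three outcomes (total length, 0 for "need more data", -1 for "invalid") can be checked without a running server. This is only a sketch of the same logic: the function name `packageLength` and the constants are hypothetical and do not appear in the issue, and the frame layout (32-byte header, signed 32-bit body length at offset 28, body capped at 1024 bytes) is taken from the posted closure.

```php
<?php
// Hypothetical names for the frame layout used in the issue.
const HEADER_SIZE   = 32;   // same value as package_body_offset
const LENGTH_OFFSET = 28;   // same value as package_length_offset
const MAX_BODY      = 1024; // upper bound used by the posted closure

/**
 * Returns the total frame length, 0 when more data is needed,
 * or -1 when the length field is invalid.
 */
function packageLength(string $data): int
{
    if (strlen($data) < HEADER_SIZE) {
        return 0; // header not complete yet
    }
    // 'l' is a signed 32-bit integer in machine byte order, as in the issue.
    $length = unpack('l', $data, LENGTH_OFFSET)[1];
    if ($length <= 0 || $length > MAX_BODY) {
        return -1; // invalid body length
    }
    if ($length > strlen($data) - HEADER_SIZE) {
        return 0; // body not complete yet
    }
    return $length + HEADER_SIZE;
}
```

Under these assumptions, a 50-byte buffer whose length field is zero or out of range yields -1, which matches the size reported in the warning. The function could be passed to the server as `'package_length_func' => 'packageLength'` or wrapped in a closure.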
NathanFreeman commented 3 weeks ago

There may be a problem with the data being sent. The warning indicates that a 50-byte packet was received that the custom function could not handle, so it returned -1 and the connection was closed. Check whether an exception is being thrown.
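
One way to check why a given buffer is rejected is to log a short description of it before returning -1. The helper below is a hypothetical sketch, not part of the posted code; `describeFrame` is an invented name, and the layout (32-byte header, signed 32-bit length at offset 28, 1024-byte body limit) is copied from the closure in the report.

```php
<?php
/**
 * Describes why a buffer would be accepted, deferred, or rejected.
 * Mirrors the checks of the posted length callback.
 */
function describeFrame(string $data): string
{
    $size = strlen($data);
    if ($size < 32) {
        return "incomplete header ({$size} of 32 bytes)";
    }
    // Signed 32-bit body length in machine byte order at offset 28.
    $length = unpack('l', $data, 28)[1];
    if ($length <= 0 || $length > 1024) {
        return "invalid body length {$length} in a {$size}-byte buffer";
    }
    if ($length > $size - 32) {
        $have = $size - 32;
        return "incomplete body ({$have} of {$length} bytes)";
    }
    $total = $length + 32;
    return "complete frame of {$total} bytes";
}
```

Inside the callback, something like `error_log(describeFrame($data))` placed just before `return -1;` would show which length values the rejected 50-byte and 254-byte packets actually carry.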

karevan commented 3 weeks ago

I checked this, and no exception is thrown. When many invalid packets arrive and the function returns -1 for them, the workers restart after some time.

Like this:

[2024-04-23 11:58:10 #18985.3]  WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 50 bytes of malformed data from the client[172.19.0.2:39558]
[2024-04-23 11:58:10 #18985.3]  WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 254 bytes of malformed data from the client[172.19.0.2:39560]
[2024-04-23 11:58:10 #18985.3]  WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 50 bytes of malformed data from the client[172.19.0.2:39562]
[2024-04-23 11:58:10 #18985.3]  WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 50 bytes of malformed data from the client[172.19.0.2:39564]
[2024-04-23 11:58:10 #18985.3]  WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 50 bytes of malformed data from the client[172.19.0.2:39566]
[2024-04-23 11:58:10 #18985.3]  WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 50 bytes of malformed data from the client[172.19.0.2:39568]
[2024-04-23 11:58:10 #18985.3]  WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 254 bytes of malformed data from the client[172.19.0.2:39570]
[2024-04-23 11:58:10 #18985.3]  WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 254 bytes of malformed data from the client[172.19.0.2:39572]
[2024-04-23 11:58:10 #18985.3]  WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 50 bytes of malformed data from the client[172.19.0.2:39574]
[2024-04-23 11:58:10 #18985.3]  WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 254 bytes of malformed data from the client[172.19.0.2:39576]
[2024-04-23 11:58:10 #18985.3]  WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 50 bytes of malformed data from the client[172.19.0.2:39578]
[2024-04-23 11:58:10 #18985.3]  WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 50 bytes of malformed data from the client[172.19.0.2:39580]
[2024-04-23 11:58:10 #18985.3]  WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 254 bytes of malformed data from the client[172.19.0.2:39582]
[2024-04-23 11:58:10 #18985.3]  WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 254 bytes of malformed data from the client[172.19.0.2:39584]
[2024-04-23 11:58:10 #18985.3]  WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 50 bytes of malformed data from the client[172.19.0.2:39586]
[2024-04-23 11:58:11 #18985.3]  WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 50 bytes of malformed data from the client[172.19.0.2:39592]
[2024-04-23 11:58:11 #18985.3]  WARNING Protocol::recv_with_length_protocol() (ERRNO 1204): received 50 bytes of malformed data from the client[172.19.0.2:39594]

WORKER: 2 IS EXIT
WORKER: 0 IS EXIT
WORKER: 4 IS EXIT
WORKER: 2 IS EXIT
WORKER: 0 IS EXIT
WORKER: 4 IS EXIT
WORKER: 6 IS EXIT
WORKER: 6 IS EXIT
WORKER: 8 IS EXIT
WORKER: 1 IS EXIT
WORKER: 8 IS EXIT
WORKER: 1 IS EXIT
WORKER: 7 IS EXIT
WORKER: 7 IS EXIT
WORKER: 5 IS EXIT
WORKER: 5 IS EXIT
WORKER: 3 IS EXIT
WORKER: 3 IS EXIT
WORKER: 9 IS EXIT
WORKER: 9 IS EXIT
WORKER: 0 IS EXIT
WORKER: 1 IS EXIT
WORKER: 2 IS EXIT

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Note: I run this server with 10 workers

NathanFreeman commented 3 weeks ago

Do you have the configuration for "max_request" set?

karevan commented 3 weeks ago

No, I just used these configs:

$server->set([
 // Server
 'worker_num'               => 10,
 // Tcp server
 'heartbeat_idle_time'      => 510,
 'heartbeat_check_interval' => 10,
 // TCP Parser
 'open_length_check'        => true,
 'package_max_length'       => 81920,
 'package_length_func'      => function ($data) {
    //...
 },
 // Coroutine
 'enable_coroutine'         => true,
 'hook_flags'               => SWOOLE_HOOK_ALL,
 'max_coroutine'            => 200000,
]);
karevan commented 2 weeks ago

Can anyone help me with this issue?

karevan commented 1 week ago

I reviewed my code to find out exactly when this error occurs. I came to the conclusion that it happens when I use Timer::tick outside of the workers. I had to use "APCu" instead of "package_length_func" 🥲
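
If the timer outside the workers is indeed the trigger, as reported above, one option might be to create it inside the WorkerStart callback instead, so that it is registered only after the worker processes exist. The sketch below assumes this is acceptable for the metrics being collected; `attachStatsTimer` is a hypothetical helper, and the thread does not confirm that this resolves the crash.

```php
<?php
/**
 * Registers the periodic stats timer from inside WorkerStart rather than
 * before the server starts. Only worker 0 creates the timer, so the stats
 * are collected once instead of once per worker.
 * Hypothetical helper; it is not part of the code posted in the issue.
 */
function attachStatsTimer(Swoole\Server $server): void
{
    $server->on('WorkerStart', function (Swoole\Server $server, int $workerId): void {
        // Skip task workers and every event worker except the first.
        if ($workerId !== 0 || $server->taskworker) {
            return;
        }
        Swoole\Timer::tick(5000, function () use ($server): void {
            $stats = $server->stats();
            // ... store or publish $stats here
        });
    });
}
```

Note that `on('WorkerStart', ...)` holds a single callback, so in a server that already registers one, the guarded `Timer::tick` call would go inside the existing handler rather than in a second registration.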

NathanFreeman commented 1 week ago

Could you please provide the code on how to use Swoole\Timer?

karevan commented 1 week ago

Explanation of the PHP code:

HamidServer class: This class is responsible for creating and running a Swoole server.

class HamidServer
{
    public $server;

    public function __construct(string $host, int $port)
    {
        $this->server = new Server($host, $port, SWOOLE_PROCESS, SWOOLE_SOCK_TCP);
    }

    public function start(): void
    {
        $this->server->start();
    }
}

SystemStatus class: This class is responsible for collecting system statistics and displaying them through an HTTP server.

$prometheusServer = $server->addlistener('127.0.0.1', 8889, SWOOLE_SOCK_TCP);
$prometheusServer->set([
    'open_http_protocol' => true, // Enable HTTP protocol parsing
]);
$prometheusServer->on('request', [$this, 'on_request']);

Creating the SystemStatus instance in the HamidServer constructor:

class HamidServer
{
    // ...
    public $systemStatus;

    public function __construct(string $host, int $port)
    {
        // ...
        $this->systemStatus = new SystemStatus($this);
    }
    // ...
}

SystemStatus class constructor: In this section, Timer::tick is called and the necessary information for display in the output is stored.

class SystemStatus
{
    private $hamidContext;

    public function __construct(HamidServer $hamidContext)
    {
        $this->hamidContext = $hamidContext;

        Timer::tick(5000, function() {
            $serverState = $this->hamidContext->server->stats();
            // ...
        });
    }

    // ...
}

realMetric method: This method is called for each worker and performs tasks similar to those of the class constructor.

class SystemStatus
{
    // ...

    public function realMetric(int $workerId): void
    {
        Timer::tick(5000, function () use ($workerId) {
            // ...
        });
    }

    // ...
}

Calling realMetric in on_worker_start:

class HamidServer
{
    // ...

    public function __construct(string $host, int $port)
    {
        // ...
        $this->server->on('WorkerStart', [$this, 'on_worker_start']);
    }

    public function on_worker_start(Server $server, int $workerId): void
    {
        if (!$server->taskworker) {
            $this->systemStatus->realMetric($workerId);
        }
    }

    // ...
}
karevan commented 1 week ago

I used APCu functions instead of package_length_func. Now I have run into another problem: the workers are restarted automatically. The error log shows something like this: malloc(): unaligned tcache chunk detected. I guess it is caused by APCu.