swoole / swoole-src

🚀 Coroutine-based concurrency library for PHP
https://www.swoole.com
Apache License 2.0
18.25k stars 3.16k forks source link

postgresql在服务端主动断开连接时swoole无法捕获异常 #5304

Closed lincepro closed 3 weeks ago

lincepro commented 3 weeks ago

Please answer these questions before submitting your issue.

  1. What did you do? If possible, provide a simple script for reproducing the error. 大概就是线上业务非常偶尔会发生数据库执行失败但是用户收不到提示的情况 最后追到是连接长时间无活动,postgres服务端主动断开了连接,业务再次使用这个连接就会触发bug 要重现此bug需要在启动如下代码后,sleep过程中重启postgres服务,代码如下:
    
    <?php

use Swoole\Coroutine; use Swoole\Coroutine\System;

Coroutine\run(function () { Coroutine::create(function () { $obj = new Coroutine\PostgreSQL; $conn = $obj->connect("host=postgres port=5432 dbname=postges user=postgres password=postgres", 3); $stmt = $obj->query('SELECT 1');

    // 这里重启postgres服务, 模拟故障发生
    System:sleep(30);

    try {
        $stmt = $obj->query('SELECT 1');
        echo 'others1' . PHP_EOL; // 这一行会执行
    } catch (\Throwable $e) {
        // 无法捕获错误, 控制台会打印如下输出
        // WARNING ReactorImpl::after_removal_failure(): failed to delete events[fd=5#0, type=25, events=512], Error: Bad file descriptor[9]
        // Warning: Swoole\Coroutine\PostgreSQL::query(): swoole_event_del failed in /var/www/test.php on line 16
        var_dump($e);
    }

    echo 'others' . PHP_EOL; // 这里也会执行

    // 执行到这里协程无法正常退出, 命令行会一直阻塞
});

});



3. What did you expect to see?
能捕获异常

4. What did you see instead?
无法捕获异常,协程无法退出

5. What version of Swoole are you using (show your `php --ri swoole`)?
swoole

Swoole => enabled
Author => Swoole Team <team@swoole.com>
Version => 5.1.2
Built => Apr 24 2024 10:35:38
coroutine => enabled with boost asm context
epoll => enabled
eventfd => enabled
signalfd => enabled
spinlock => enabled
rwlock => enabled
sockets => enabled
openssl => OpenSSL 3.1.4 24 Oct 2023
dtls => enabled
http2 => enabled
json => enabled
curl-native => enabled
pcre => enabled
zlib => 1.2.13
brotli => E16777225/D16777225
mutex_timedlock => enabled
pthread_barrier => enabled
futex => enabled
mysqlnd => enabled
async_redis => enabled
coroutine_pgsql => enabled

Directive => Local Value => Master Value
swoole.enable_coroutine => On => On
swoole.enable_library => On => On
swoole.enable_fiber_mock => Off => Off
swoole.enable_preemptive_scheduler => Off => Off
swoole.display_errors => On => On
swoole.use_shortname => On => On
swoole.unixsock_buffer_size => 8388608 => 8388608

6. What is your machine environment used (show your `uname -a` & `php -v` & `gcc -v`) ?
docker  5.1.2-php8.3-alpine

uname a
Linux a1e1753f8b0b 3.10.0-1160.90.1.el7.x86_64 #1 SMP Thu May 4 15:21:22 UTC 2023 x86_64 Linux

php -v
PHP 8.3.2 (cli) (built: Jan 27 2024 04:38:08) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.3.2, Copyright (c) Zend Technologies

gcc -v 
无
jingjingxyk commented 3 weeks ago

添加断线重连代码即可

参考 MySQL server has gone away ,添加断线重连代码

解决办法:

Try to reconnect to the database. reference

  1. https://github.com/swoole/swoole-src/issues/4131#issuecomment-815353916
  2. https://github.com/swoole/swoole-src/issues/2041
  3. https://github.com/swoole/swoole-wiki/blob/master/doc/15.11%20-%20MySQL%E7%9A%84%E8%BF%9E%E6%8E%A5%E6%B1%A0%E3%80%81%E5%BC%82%E6%AD%A5%E3%80%81%E6%96%AD%E7%BA%BF%E9%87%8D%E8%BF%9E.md
  4. https://github.com/swoole/swoole-src/issues/4141
  5. https://github.com/swoole/swoole-src/issues/5143

最佳的方案是进行断线重连

lincepro commented 3 weeks ago

添加断线重连代码即可

参考 MySQL server has gone away ,添加断线重连代码

解决办法:

Try to reconnect to the database. reference

  1. MySQL server has gone away - how to deal with it #4131 (comment)
  2. Swoole 4.2.3 Task中使用协程MySQL报错 #2041
  3. https://github.com/swoole/swoole-wiki/blob/master/doc/15.11%20-%20MySQL%E7%9A%84%E8%BF%9E%E6%8E%A5%E6%B1%A0%E3%80%81%E5%BC%82%E6%AD%A5%E3%80%81%E6%96%AD%E7%BA%BF%E9%87%8D%E8%BF%9E.md
  4. PDOPool does not reconnect on connection timeout. #4141
  5. php8.2版本 swoole 5.0.3 使用mysqlipool 会报错 mysql server has gone away in @swoole/library/core/Database/MysqliProxy.php(51) #5143

最佳的方案是进行断线重连

无法捕获异常,如何知道断线了呢? pdo断线了会抛出 PDOEXCEPTION

jingjingxyk commented 3 weeks ago

执行sql 之前,需要判断 $conn 是否有效


      $reconnect_count=0
     A:
     $conn = $obj->connect("host=postgres port=5432 dbname=postges user=postgres password=postgres", 3);
    if (!$conn) {
        var_dump($pg->error);
       $reconnect_count++
       if($reconnect_count<4){
           goto A;
       }

        return;
    }
NathanFreeman commented 3 weeks ago

或者用Swoole\Database\PDOPool代替PostgreSQL协程客户端,这里已经封装了断线重连的逻辑了

lincepro commented 3 weeks ago

执行sql 之前,需要判断 $conn 是否有效

$conn当然是有效的, 第一次的sql都能正常执行,是等了30秒后的第二次sql执行出问题

lincepro commented 3 weeks ago

或者用Swoole\Database\PDOPool代替PostgreSQL协程客户端,这里已经封装了断线重连的逻辑了

Coroutine\PostgreSQL 之后是不再维护了么? 我们线上客户环境都没有安装pdo_pgsql扩展 另外这个真的不是断线重连的问题,只要抛出异常让用户知道故障了就行

NathanFreeman commented 3 weeks ago

或者你可以判断$stmt = $obj->query('SELECT 1');是不是返回false,是的话将$obj->error作为错误信息返回出去

lincepro commented 3 weeks ago

或者你可以判断$stmt = $obj->query('SELECT 1');是不是返回false,是的话将$obj->error作为错误信息返回出去

漏说了一句, 因为我有注册 set_error_handler, 这个注册的函数能拿到这个错误: Swoole\Coroutine\PostgreSQL::query(): swoole_event_del failed

但是try catch 却捕获不到, 导致断线之后想在局部捕获都捕获不了,一定会抛给全局的错误处理函数

NathanFreeman commented 3 weeks ago

Swoole\Coroutine\PostgreSQL::query(): swoole_event_del failed这个错误它不是异常来的,是swoole内核直接将错误信息写入终端

matyhtf commented 3 weeks ago

需要检查 query 方法的返回值是不是 false,并获取错误码。据此实现断线重连

建议使用 pdo_pgsql + Runtime Hook, Swoole\Coroutine\PostgreSQL 已废弃了