swoole / swoole-src

🚀 Coroutine-based concurrency library for PHP
https://www.swoole.com
Apache License 2.0

rwlock deadlock, Rwlock::unlock cross thread unlock not work #5024

Open AliceSync opened 1 year ago

AliceSync commented 1 year ago

Tips:

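This problem currently only exists in Swoole installed via PECL (including manual compilation). swoole-cli does not have it, and only the SWOOLE_RWLOCK lock type is affected. It reproduced on every combination tested so far: PHP 8.0-8.2 with Swoole 4.8-5.0. To rule out an extension compatibility problem, I also disabled all other extensions, and it still reproduces.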

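To make the report reliable, here are some more details. Note: this is a bug, not a feature enhancement. With Swoole installed via PECL, a cross-process unlock returns true, but the lock is not actually released and the program hangs. This only happens with read-write locks under php-cli. Tests with swoole-cli and with the other lock types do not trigger the bug.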

  1. What did you do? If possible, provide a simple script for reproducing the error.

    install

    ./configure --prefix=/test/php --disable-all --enable-pcntl --enable-swoole
    make
    make install

    <?php
    $lock = new \Swoole\Lock(SWOOLE_RWLOCK);
    for ($i = 0; $i < 5; $i++) {
        $pid = pcntl_fork();
        if ($pid == -1) {
            var_dump("error");
        } elseif ($pid == 0) {
            if ($i === 3) {
                // Child 3 waits, then unlocks a lock that child 0 still holds.
                sleep(3);
                var_dump("$i try unlock", $lock->unlock());
            }
            var_dump("$i try lock");
            if ($lock->lock()) {
                sleep(1); # keep the output from piling up and becoming unreadable
                var_dump("$i lock success");
            }
            if ($i === 0) {
                // Child 0 exits while still holding the lock.
                var_dump("$i not unlock, out");
                exit;
            }
            var_dump("$i lock close", $lock->unlock());
            exit;
        }
    }
    // The parent waits for all children to finish.
    while (pcntl_wait($status) != -1) {
    }
    var_dump("End");
  2. What did you expect to see?

The unlock succeeds in swoole-cli:

[root@centos8 test]# swoole test.php 
string(10) "0 try lock"
string(10) "1 try lock"
string(10) "2 try lock"
string(10) "4 try lock"
string(14) "0 lock success"
string(17) "0 not unlock, out"
string(12) "3 try unlock"
bool(true)
string(10) "3 try lock"
string(14) "1 lock success"
string(12) "1 lock close"
bool(true)
string(14) "2 lock success"
string(12) "2 lock close"
bool(true)
string(14) "3 lock success"
string(12) "3 lock close"
bool(true)
string(14) "4 lock success"
string(12) "4 lock close"
bool(true)
string(3) "End"
  3. What did you see instead?

In php-cli, unlock reports success but the lock is not actually released:

[root@centos8 test]#  php test.php 
string(10) "0 try lock"
string(10) "1 try lock"
string(10) "2 try lock"
string(10) "4 try lock"
string(14) "0 lock success"
string(17) "0 not unlock, out"
string(12) "3 try unlock"
bool(true)
string(10) "3 try lock"
^C  # hangs here indefinitely, so I pressed Ctrl+C
[root@centos8 test]# 
  4. What version of Swoole are you using (show your php --ri swoole)?
swoole

Swoole => enabled
Author => Swoole Team <team@swoole.com>
Version => 5.0.2
Built => Apr  9 2023 23:45:05
coroutine => enabled with boost asm context
epoll => enabled
eventfd => enabled
signalfd => enabled
cpu_affinity => enabled
spinlock => enabled
rwlock => enabled
http2 => enabled
json => enabled
pcre => enabled
zlib => 1.2.11
mutex_timedlock => enabled
pthread_barrier => enabled
futex => enabled
async_redis => enabled

Directive => Local Value => Master Value
swoole.enable_coroutine => On => On
swoole.enable_library => On => On
swoole.enable_preemptive_scheduler => Off => Off
swoole.display_errors => On => On
swoole.use_shortname => On => On
swoole.unixsock_buffer_size => 8388608 => 8388608
  5. What is your machine environment used (show your uname -a & php -v & gcc -v)?

centos8

NathanFreeman commented 1 year ago

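Swoole lock is designed for process synchronization and is backed by shared memory, so as long as one child process has not unlocked, all the remaining child processes have to block.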
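To make the shared-memory point concrete, here is a minimal sketch of a pthread read-write lock placed in anonymous shared memory and marked `PTHREAD_PROCESS_SHARED`, so that processes created by `fork()` operate on the same lock. It is not Swoole's actual implementation; `create_shared_rwlock` and `try_lock_after_child` are hypothetical helper names used only for illustration.

```cpp
#include <cerrno>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not Swoole's code): put a rwlock in anonymous shared
// memory so that parent and forked children see the same lock state.
pthread_rwlock_t *create_shared_rwlock() {
  void *mem = mmap(nullptr, sizeof(pthread_rwlock_t), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return nullptr;

  pthread_rwlockattr_t attr;
  pthread_rwlockattr_init(&attr);
  // Without PTHREAD_PROCESS_SHARED the lock is only valid inside one process.
  pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
  pthread_rwlock_t *lock = static_cast<pthread_rwlock_t *>(mem);
  int err = pthread_rwlock_init(lock, &attr);
  pthread_rwlockattr_destroy(&attr);
  if (err != 0) {
    munmap(mem, sizeof(pthread_rwlock_t));
    return nullptr;
  }
  return lock;
}

// Fork a child that takes the write lock and either releases it or exits
// while still holding it. Returns what pthread_rwlock_trywrlock() reports in
// the parent afterwards: 0 if the lock was free, EBUSY if it was left held.
int try_lock_after_child(pthread_rwlock_t *lock, bool child_unlocks) {
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    pthread_rwlock_wrlock(lock);
    if (child_unlocks) pthread_rwlock_unlock(lock);
    _exit(0);
  }
  int status = 0;
  waitpid(pid, &status, 0);
  int result = pthread_rwlock_trywrlock(lock);
  if (result == 0) pthread_rwlock_unlock(lock);
  return result;
}
```

Calling `try_lock_after_child(lock, false)` shows the behaviour described above: the lock state lives in shared memory, so it stays held after the child that locked it has exited.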

AliceSync commented 1 year ago

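> Swoole lock is designed for process synchronization and is backed by shared memory, so as long as one child process has not unlocked, all the remaining child processes have to block.

Please look at the code before replying. I am saying that cross-process unlocking of the read-write lock has a bug: the single-file swoole-cli build does not have this problem. The bug is only triggered with read-write locks, under regular PHP with Swoole installed from PECL.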

NathanFreeman commented 1 year ago

OK, I'll test it again 😁.

AliceSync commented 1 year ago

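> OK, I'll test it again 😁.

I've also added the output from the tests on my own machine. Take that script and run it once with swoole-cli and once with php-cli, and you will see the difference. Strangely, only the read-write lock has this bug; mutexes and the other lock types do not. I only found it because my automated coverage tests got stuck.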

NathanFreeman commented 1 year ago

OK, thanks.

matyhtf commented 1 week ago

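I suspect this is a bug in glibc's pthread implementation, so it cannot be resolved on our side for now. swoole-cli uses musl libc, which does not have this problem. We will need to ask the glibc maintainers for help.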
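Since the behaviour is suspected to depend on the C library, it can help to confirm which libc a given binary was built against. The following is a small sketch under stated assumptions: `libc_description` is a hypothetical helper name, `__GLIBC__` and `gnu_get_libc_version()` are glibc-specific, and musl deliberately defines no identifying macro, so the fallback branch can only say "not glibc".

```cpp
#include <cstdio>  // any libc header; on glibc this makes __GLIBC__ visible
#include <string>
#if defined(__GLIBC__)
#include <gnu/libc-version.h>  // glibc-only: declares gnu_get_libc_version()
#endif

// Hypothetical helper: describe the C library this binary was compiled for.
std::string libc_description() {
#if defined(__GLIBC__)
  return std::string("glibc ") + gnu_get_libc_version();
#else
  return "not glibc (possibly musl, which defines no identifying macro)";
#endif
}
```

Printing `libc_description()` from a small test program built with the same toolchain as the PHP extension gives a quick indication of which pthread implementation is in play.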

AliceSync commented 1 week ago

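> I suspect this is a bug in glibc's pthread implementation, so it cannot be resolved on our side for now. swoole-cli uses musl libc, which does not have this problem. We will need to ask the glibc maintainers for help.

The legendary Han has appeared! I thought nobody was looking after this bug. I don't actually need read-write locks myself; I just ran into the problem and reported it. An odd issue like this is unlikely to get fixed quickly upstream, and my C is shaky, so I'll leave it in your hands.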

matyhtf commented 6 days ago

I have reduced the reproduction to a single C++ file and reported it to the glibc maintainers.

Glibc bug report: https://sourceware.org/bugzilla/show_bug.cgi?id=32220

The code is as follows:

#include <iostream>
#include <pthread.h>
#include <system_error>
#include <thread>
#include <unistd.h>

struct RWLockImpl {
  pthread_rwlock_t _lock;
  pthread_rwlockattr_t attr;

  RWLockImpl() {
    pthread_rwlockattr_init(&attr);
    // pthread_rwlock_init() returns the error number; it does not set errno.
    int err = pthread_rwlock_init(&_lock, &attr);
    if (err != 0) {
      throw std::system_error(err, std::generic_category(),
                              "pthread_rwlock_init() failed");
    }
  }

  int lock() { return pthread_rwlock_wrlock(&_lock); }

  int unlock() { return pthread_rwlock_unlock(&_lock); }
};

struct MutexImpl {
  pthread_mutex_t lock_;
  pthread_mutexattr_t attr_;

  MutexImpl() {
    pthread_mutexattr_init(&attr_);
    // pthread_mutex_init() returns the error number; it does not set errno.
    int err = pthread_mutex_init(&lock_, &attr_);
    if (err != 0) {
      throw std::system_error(err, std::generic_category(),
                              "pthread_mutex_init() failed");
    }
  }

  int lock() { return pthread_mutex_lock(&lock_); }

  int unlock() { return pthread_mutex_unlock(&lock_); }
};

int main(int argc, char **argv) {
  RWLockImpl impl;
  // MutexImpl impl;

  std::thread threads[5];

  for (int i = 0; i < 5; i++) {
    threads[i] = std::thread([&impl, i]() {
      if (i == 3) {
        sleep(3);
        std::cout << i << " try unlock, result=" << impl.unlock() << std::endl;
      }

      std::cout << i << " try lock " << std::endl;

      if (impl.lock() == 0) {
        sleep(1);
        std::cout << i << " lock success " << std::endl;
      }
      if (i == 0) {
        std::cout << i << " no unlock, exit directly " << std::endl;
        pthread_exit(0);
        return;
      }
      std::cout << i << "lock close, result=" << impl.unlock() << std::endl;

      pthread_exit(0);
      return;
    });
  }

  for (int i = 0; i < 5; i++) {
    threads[i].join();
  }
}

Compile:

g++ rwlock.cpp -pthread -std=c++11 -g -o rwlock

With RWLock:

./rwlock 
0 try lock 
2 try lock 
1 try lock 
4 try lock 
0 lock success 
0 no unlock, exit directly 
3 try unlock, result=0
3 try lock 
^C
swoole@swoole-ThinkPad-T470p:~/workspace/debug$ 

After switching to Mutex:

swoole@swoole-ThinkPad-T470p:~/workspace/debug$ ./rwlock 
0 try lock 
1 try lock 
24 try lock  try lock 

0 lock success 
0 no unlock, exit directly 
3 try unlock, result=0
3 try lock 
3 lock success 
3lock close, result=0
4 lock success 
4lock close, result=0
2 lock success 
2lock close, result=0
1 lock success 
1lock close, result=0
swoole@swoole-ThinkPad-T470p:~/workspace/debug$ 
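The mutex run above only completes because thread 3 unlocks a mutex that thread 0 still owns, which POSIX leaves undefined for a default mutex. For the "owner exited while holding the lock" case, POSIX provides robust mutexes; there is no robust counterpart for read-write locks. The sketch below is not part of the report sent to glibc, and `RobustMutex` and `lock_after_owner_exit` are hypothetical names. It shows a thread exiting with the mutex held and the next locker being told about it through `EOWNERDEAD`.

```cpp
#include <cerrno>
#include <pthread.h>
#include <thread>

// Hypothetical wrapper around a mutex created with PTHREAD_MUTEX_ROBUST.
struct RobustMutex {
  pthread_mutex_t lock_;

  RobustMutex() {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&lock_, &attr);
    pthread_mutexattr_destroy(&attr);
  }

  ~RobustMutex() { pthread_mutex_destroy(&lock_); }
};

// A thread locks the mutex and terminates without unlocking. The caller then
// locks it and returns the code reported by pthread_mutex_lock().
int lock_after_owner_exit() {
  RobustMutex m;
  std::thread owner([&m]() { pthread_mutex_lock(&m.lock_); });
  owner.join();

  int result = pthread_mutex_lock(&m.lock_);
  if (result == EOWNERDEAD) {
    // We now hold the mutex; mark its state consistent before reusing it.
    pthread_mutex_consistent(&m.lock_);
  }
  if (result == 0 || result == EOWNERDEAD) {
    pthread_mutex_unlock(&m.lock_);
  }
  return result;
}
```

With a robust mutex the waiter learns explicitly that the previous owner died, instead of depending on another thread's unlock happening to succeed.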

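Summary

Thread 0 successfully calls lock() but exits without unlocking. Another thread's unlock() call then returns success, but it has no actual effect, and all remaining threads end up deadlocked.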
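One caveat when reading the return value of the cross-thread unlock: POSIX states that the result of `pthread_rwlock_unlock()` is undefined if the calling thread does not hold the lock, so a return value of 0 does not by itself show that the lock was released. For mutexes, the `PTHREAD_MUTEX_ERRORCHECK` type makes this ownership rule visible. The sketch below uses a hypothetical function name, `unlock_from_non_owner`, and is not part of the original reproduction. It locks such a mutex in one thread and tries to unlock it from another.

```cpp
#include <cerrno>
#include <pthread.h>
#include <thread>

// Lock an error-checking mutex in the calling thread, then try to unlock it
// from a second thread. Returns what pthread_mutex_unlock() reported there.
int unlock_from_non_owner() {
  pthread_mutexattr_t attr;
  pthread_mutexattr_init(&attr);
  pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

  pthread_mutex_t mutex;
  pthread_mutex_init(&mutex, &attr);
  pthread_mutexattr_destroy(&attr);

  pthread_mutex_lock(&mutex);  // the calling thread is now the owner

  int result = -1;
  std::thread other([&]() { result = pthread_mutex_unlock(&mutex); });
  other.join();

  pthread_mutex_unlock(&mutex);  // the real owner releases the mutex
  pthread_mutex_destroy(&mutex);
  return result;
}
```

An error-checking mutex rejects the foreign unlock with `EPERM` rather than reporting success, which is the kind of signal that was missing in the read-write lock case described in the summary.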