markparticle / WebServer

C++ Linux WebServer服务器
Apache License 2.0
3.49k stars 730 forks source link

单元测试异步Log时死锁 #68

Open yanksx233 opened 2 years ago

yanksx233 commented 2 years ago

主线程调用最后一次LOG_BASE后循环程序运行完毕,开始调用~Log():

if(writeThread_ && writeThread_->joinable()) {
        while(!deque_->empty()) {
            deque_->flush();
        };
        deque_->Close();
        writeThread_->join();    //    <----------------------------------
    }

若此时子线程正在写fp,则会出现死锁。

void Log::AsyncWrite_() {
    string str = "";
    while(deque_->pop(str)) {
        lock_guard<mutex> locker(mtx_);   //  <----------------------------------正在写fp
        fputs(str.c_str(), fp_);       //  <----------------------------------正在写fp
    }
}

分析如下: 在~Log()中,首先主线程判断deque是否为空然后deque->flush()唤醒deque_中的consumer,直到deque为空。 假设deque.size() == 1,子线程执行deque->pop(str)后deque.size()==0,然后执行fputs(str.cstr(), fp)。 主线程则析构Log执行deque->Close(), 并在writeThread->join()处等待。然后子线程fputs执行完毕,重新判断循环条件deque->pop(str),进入deque->pop后,由于deque.size()==0,于是停在 deque->pop 中的 condConsumer_.wait(locker)处。而此时主线程正在 join 等待子线程结束,从而出现死锁。

测试如下: 多次运行test,直到出现死锁

ubt@TvT:~/WebServer/test$ ./test

另起终端:

ubt@TvT:~/MyWebServer/unit_test$ ps -aux | grep test
ubt      12836  0.4  0.0  91048  5604 pts/5    Sl+  20:37   0:04 ./test

使用gdb查看函数调用栈:

ubt@TvT:~/MyWebServer/unit_test$ gdb attach 12836 -q
attach: No such file or directory.
Attaching to process 12836
[New LWP 12837]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__pthread_clockjoin_ex (threadid=140035354838784, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
145     pthread_join_common.c: No such file or directory.
(gdb) thread apply all bt

Thread 2 (Thread 0x7f5c8594c700 (LWP 12837)):
#0  futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x5606be2d33f4) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x5606be2d3398, cond=0x5606be2d33c8) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x5606be2d33c8, mutex=0x5606be2d3398) at pthread_cond_wait.c:647
#3  0x00007f5c8612ee30 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00005606bd583b2f in BlockDeque<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::pop (item="2022-09-12 20:37:44.005142 [error]: Test 222222222 99999 ", '=' <repeats 13 times>, " \n", this=0x5606be2d3340) at /usr/include/c++/9/bits/std_mutex.h:103
#5  Log::AsyncWrite_ (this=0x5606bd5b3480 <Log::Instance()::inst>) at ../code/log/log.cpp:180
#6  0x00007f5c86134de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f5c86028609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8  0x00007f5c85f4d133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f5c8594d740 (LWP 12836)):
#0  __pthread_clockjoin_ex (threadid=140035354838784, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
#1  0x00007f5c86135047 in std::thread::join() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00005606bd584135 in Log::~Log (this=0x5606bd5b3480 <Log::Instance()::inst>, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/unique_ptr.h:360
#3  0x00007f5c85e748a7 in __run_exit_handlers (status=0, listp=0x7f5c8601a718 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#4  0x00007f5c85e74a60 in __GI_exit (status=<optimized out>) at exit.c:139
#5  0x00007f5c85e5208a in __libc_start_main (main=0x5606bd583780 <main()>, argc=1, argv=0x7fff0f1bd7a8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff0f1bd798) at ../csu/libc-start.c:342
#6  0x00005606bd58380e in _start () at ../code/log/log.cpp:193

可以看到 Thread 1 正在join等待子线程结束,而 Thread 2 锁在pop中的condition_variable::wait处。

原因可能是Log中访问deque_时没有任何保护措施(同步操作)。由于本人技术有限,初步解决方案是pop的while中先判断 isclose 然后再cond_wait,可以解决此问题。

kktaozi commented 9 months ago

很帅,确实应该判断关闭信号,他其他用到关闭信号的函数也有这个问题

kerolt commented 6 months ago

Hello,我也碰到了这个问题,我的解决办法是这样的:

block_queue.h

template <class T>
bool BlockQueue<T>::Pop(T& data) {
    std::unique_lock<std::mutex> lock(mutex_);
    consume_cond_.wait(lock, [this] { return !queue_.empty() || is_close_; }); // 当队列中无数据时“消费者”阻塞
    if (is_close_) {
        return false;
    }
    data = queue_.front();
    // std::cout << data << "\n";
    queue_.pop_front();
    produce_cond_.notify_one(); // 通知“生产者”
    return true;
}

~Log()内不执行Log::Flush(),直接使用fflush的缓冲区

if (file_) {
    std::lock_guard<std::mutex> lock(mutex_);
    // Flush();
    fflush(file_);
    fclose(file_);
}

简单测试后好像并没有问题:)