google / sanitizers

AddressSanitizer, ThreadSanitizer, MemorySanitizer
Other
11.46k stars 1.03k forks source link

TSan does not detect a deadlock #1258

Open vasild opened 4 years ago

vasild commented 4 years ago

The following program causes a deadlock of the type A waits for B waits for C waits for A which remains undetected to TSan (tested with Clang 10 and 11).

Some condition variables are used to make the test deterministic so that the deadlock occurs every time. If those synchronizations are removed (remove the definition of DETERMINISTIC) then the deadlock does not happen everytime, but when it happens it is detected. So it must be something related to the condition variables that causes TSan to not detect the deadlock (when DETERMINISTIC is defined).

#include <array>
#include <condition_variable>
#include <cstdarg>
#include <cstdio>
#include <mutex>
#include <thread>

// This program creates a deadlock like A => B => C => A and prints:
//
// thread0 locked mutex0
// thread1 locked mutex1
// thread2 locked mutex2
// thread0 trying to lock mutex1
// thread1 trying to lock mutex2
// thread2 trying to lock mutex0 <-- deadlocks, undetected by TSan
//
// clang++ -std=c++11 -Wall -g -O0 -pthread -fsanitize=thread deadlock.cc -o deadlock
// TSAN_OPTIONS=detect_deadlocks=1:second_deadlock_stack=1 ./deadlock

#define DETERMINISTIC

#ifdef DETERMINISTIC
class Event
{
public:
    void signal()
    {
        std::unique_lock<std::mutex> lk(m_mutex);
        m_occurred = true;
        m_cond.notify_all();
    }
    void wait()
    {
        std::unique_lock<std::mutex> lk(m_mutex);
        m_cond.wait(lk, [&]() { return m_occurred; });
    }

private:
    bool m_occurred{false};
    std::condition_variable m_cond;
    std::mutex m_mutex;
};
#endif /* DETERMINISTIC */

std::mutex printf_mutex;
void printf_sync(const char* format, ...)
{
    va_list ap;
    va_start(ap, format);

    {
        std::unique_lock<std::mutex> lk(printf_mutex);
        vprintf(format, ap);
    }

    va_end(ap);
}

int main(int argc, char** argv)
{
    constexpr size_t n_threads = 3;

    std::array<std::mutex, n_threads> mutexes;
    std::array<std::thread, n_threads> threads;
#ifdef DETERMINISTIC
    std::array<Event, n_threads> locked_own;
    std::array<Event, n_threads> try_to_lock_next;
#endif /* DETERMINISTIC */

    auto thread = [&](size_t i) {
        std::unique_lock<std::mutex> lock1(mutexes[i]);
        printf_sync("thread%zu locked mutex%zu\n", i, i);
#ifdef DETERMINISTIC
        locked_own[i].signal();

        try_to_lock_next[i].wait();
#endif /* DETERMINISTIC */

        const size_t next = (i + 1) % n_threads;
        printf_sync("thread%zu trying to lock mutex%zu\n", i, next);
        std::unique_lock<std::mutex> lock2(mutexes[next]);
        printf_sync("thread%zu locked mutex%zu\n", i, next);
    };

    for (size_t i = 0; i < n_threads; ++i) {
        threads[i] = std::thread{thread, i};
    }

#ifdef DETERMINISTIC
    for (size_t i = 0; i < n_threads; ++i) {
        locked_own[i].wait();
    }

    for (size_t i = 0; i < n_threads; ++i) {
        try_to_lock_next[i].signal();
    }
#endif /* DETERMINISTIC */

    for (size_t i = 0; i < n_threads; ++i) {
        threads[i].join();
    }

    return 0;
}
dvyukov commented 4 years ago

Hi @vasild,

I agree that the condition variable is unrelated to the deadlock and should not affect if the deadlock is reported or not.

We had 2 different implementation of deadlock detector engine and if I switch from one to another with:

--- a/compiler-rt/lib/sanitizer_common/sanitizer_deadlock_detector_interface.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_deadlock_detector_interface.h
@@ -16,7 +16,7 @@
 #define SANITIZER_DEADLOCK_DETECTOR_INTERFACE_H

 #ifndef SANITIZER_DEADLOCK_DETECTOR_VERSION
-# define SANITIZER_DEADLOCK_DETECTOR_VERSION 1
+# define SANITIZER_DEADLOCK_DETECTOR_VERSION 2
 #endif

The deadlock is well detected without and with -DDETERMINISTIC=1 (just before it really deadlocks):

==================
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=214713)
  Cycle in lock order graph: M0 (0x000000000000) => M4 (0x000000000000) => M7 (0x000000000000) => M0

  Mutex M4 acquired here while holding mutex M0 in thread T1:
    #0 pthread_mutex_lock compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:4150:3 (a.out+0x4675f6)
    #1 __gthread_mutex_lock(pthread_mutex_t*) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749:12 (a.out+0x4bc156)
    #2 std::mutex::lock() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_mutex.h:100:17 (a.out+0x4bdcc8)
    #3 std::unique_lock<std::mutex>::lock() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_lock.h:141:17 (a.out+0x4bdc53)
    #4 std::unique_lock<std::mutex>::unique_lock(std::mutex&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_lock.h:71:2 (a.out+0x4bd000)
    #5 main::$_0::operator()(unsigned long) const /tmp/test.cc:71:38 (a.out+0x4bcb4a)
    #6 void std::__invoke_impl<void, main::$_0, unsigned long>(std::__invoke_other, main::$_0&&, unsigned long&&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:60:14 (a.out+0x4bcabd)
    #7 std::__invoke_result<main::$_0, unsigned long>::type std::__invoke<main::$_0, unsigned long>(main::$_0&&, unsigned long&&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:95:14 (a.out+0x4bc94d)
    #8 void std::thread::_Invoker<std::tuple<main::$_0, unsigned long> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:244:13 (a.out+0x4bc8d8)
    #9 std::thread::_Invoker<std::tuple<main::$_0, unsigned long> >::operator()() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:251:11 (a.out+0x4bc858)
    #10 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::$_0, unsigned long> > >::_M_run() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:195:13 (a.out+0x4bc59f)
    #11 <null> <null> (libstdc++.so.6+0xceb9f)

  Mutex M0 previously acquired by the same thread here:
    #0 pthread_mutex_lock compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:4150:3 (a.out+0x4675f6)
    #1 __gthread_mutex_lock(pthread_mutex_t*) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749:12 (a.out+0x4bc156)
    #2 std::mutex::lock() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_mutex.h:100:17 (a.out+0x4bdcc8)
    #3 std::unique_lock<std::mutex>::lock() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_lock.h:141:17 (a.out+0x4bdc53)
    #4 std::unique_lock<std::mutex>::unique_lock(std::mutex&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_lock.h:71:2 (a.out+0x4bd000)
    #5 main::$_0::operator()(unsigned long) const /tmp/test.cc:81:38 (a.out+0x4bcc40)
    #6 void std::__invoke_impl<void, main::$_0, unsigned long>(std::__invoke_other, main::$_0&&, unsigned long&&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:60:14 (a.out+0x4bcabd)
    #7 std::__invoke_result<main::$_0, unsigned long>::type std::__invoke<main::$_0, unsigned long>(main::$_0&&, unsigned long&&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:95:14 (a.out+0x4bc94d)
    #8 void std::thread::_Invoker<std::tuple<main::$_0, unsigned long> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:244:13 (a.out+0x4bc8d8)
    #9 std::thread::_Invoker<std::tuple<main::$_0, unsigned long> >::operator()() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:251:11 (a.out+0x4bc858)
    #10 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::$_0, unsigned long> > >::_M_run() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:195:13 (a.out+0x4bc59f)
    #11 <null> <null> (libstdc++.so.6+0xceb9f)

  Mutex M7 acquired here while holding mutex M4 in thread T2:
    #0 pthread_mutex_lock compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:4150:3 (a.out+0x4675f6)
    #1 __gthread_mutex_lock(pthread_mutex_t*) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749:12 (a.out+0x4bc156)
    #2 std::mutex::lock() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_mutex.h:100:17 (a.out+0x4bdcc8)
    #3 std::unique_lock<std::mutex>::lock() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_lock.h:141:17 (a.out+0x4bdc53)
    #4 std::unique_lock<std::mutex>::unique_lock(std::mutex&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_lock.h:71:2 (a.out+0x4bd000)
    #5 main::$_0::operator()(unsigned long) const /tmp/test.cc:71:38 (a.out+0x4bcb4a)
    #6 void std::__invoke_impl<void, main::$_0, unsigned long>(std::__invoke_other, main::$_0&&, unsigned long&&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:60:14 (a.out+0x4bcabd)
    #7 std::__invoke_result<main::$_0, unsigned long>::type std::__invoke<main::$_0, unsigned long>(main::$_0&&, unsigned long&&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:95:14 (a.out+0x4bc94d)
    #8 void std::thread::_Invoker<std::tuple<main::$_0, unsigned long> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:244:13 (a.out+0x4bc8d8)
    #9 std::thread::_Invoker<std::tuple<main::$_0, unsigned long> >::operator()() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:251:11 (a.out+0x4bc858)
    #10 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::$_0, unsigned long> > >::_M_run() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:195:13 (a.out+0x4bc59f)
    #11 <null> <null> (libstdc++.so.6+0xceb9f)

  Mutex M4 previously acquired by the same thread here:
    #0 pthread_mutex_lock compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:4150:3 (a.out+0x4675f6)
    #1 __gthread_mutex_lock(pthread_mutex_t*) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749:12 (a.out+0x4bc156)
    #2 std::mutex::lock() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_mutex.h:100:17 (a.out+0x4bdcc8)
    #3 std::unique_lock<std::mutex>::lock() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_lock.h:141:17 (a.out+0x4bdc53)
    #4 std::unique_lock<std::mutex>::unique_lock(std::mutex&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_lock.h:71:2 (a.out+0x4bd000)
    #5 main::$_0::operator()(unsigned long) const /tmp/test.cc:81:38 (a.out+0x4bcc40)
    #6 void std::__invoke_impl<void, main::$_0, unsigned long>(std::__invoke_other, main::$_0&&, unsigned long&&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:60:14 (a.out+0x4bcabd)
    #7 std::__invoke_result<main::$_0, unsigned long>::type std::__invoke<main::$_0, unsigned long>(main::$_0&&, unsigned long&&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:95:14 (a.out+0x4bc94d)
    #8 void std::thread::_Invoker<std::tuple<main::$_0, unsigned long> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:244:13 (a.out+0x4bc8d8)
    #9 std::thread::_Invoker<std::tuple<main::$_0, unsigned long> >::operator()() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:251:11 (a.out+0x4bc858)
    #10 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::$_0, unsigned long> > >::_M_run() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:195:13 (a.out+0x4bc59f)
    #11 <null> <null> (libstdc++.so.6+0xceb9f)

  Mutex M0 acquired here while holding mutex M7 in thread T3:
    #0 pthread_mutex_lock compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:4150:3 (a.out+0x4675f6)
    #1 __gthread_mutex_lock(pthread_mutex_t*) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749:12 (a.out+0x4bc156)
    #2 std::mutex::lock() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_mutex.h:100:17 (a.out+0x4bdcc8)
    #3 std::unique_lock<std::mutex>::lock() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_lock.h:141:17 (a.out+0x4bdc53)
    #4 std::unique_lock<std::mutex>::unique_lock(std::mutex&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_lock.h:71:2 (a.out+0x4bd000)
    #5 main::$_0::operator()(unsigned long) const /tmp/test.cc:71:38 (a.out+0x4bcb4a)
    #6 void std::__invoke_impl<void, main::$_0, unsigned long>(std::__invoke_other, main::$_0&&, unsigned long&&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:60:14 (a.out+0x4bcabd)
    #7 std::__invoke_result<main::$_0, unsigned long>::type std::__invoke<main::$_0, unsigned long>(main::$_0&&, unsigned long&&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:95:14 (a.out+0x4bc94d)
    #8 void std::thread::_Invoker<std::tuple<main::$_0, unsigned long> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:244:13 (a.out+0x4bc8d8)
    #9 std::thread::_Invoker<std::tuple<main::$_0, unsigned long> >::operator()() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:251:11 (a.out+0x4bc858)
    #10 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::$_0, unsigned long> > >::_M_run() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:195:13 (a.out+0x4bc59f)
    #11 <null> <null> (libstdc++.so.6+0xceb9f)

  Mutex M7 previously acquired by the same thread here:
    #0 pthread_mutex_lock compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:4150:3 (a.out+0x4675f6)
    #1 __gthread_mutex_lock(pthread_mutex_t*) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749:12 (a.out+0x4bc156)
    #2 std::mutex::lock() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_mutex.h:100:17 (a.out+0x4bdcc8)
    #3 std::unique_lock<std::mutex>::lock() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_lock.h:141:17 (a.out+0x4bdc53)
    #4 std::unique_lock<std::mutex>::unique_lock(std::mutex&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_lock.h:71:2 (a.out+0x4bd000)
    #5 main::$_0::operator()(unsigned long) const /tmp/test.cc:81:38 (a.out+0x4bcc40)
    #6 void std::__invoke_impl<void, main::$_0, unsigned long>(std::__invoke_other, main::$_0&&, unsigned long&&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:60:14 (a.out+0x4bcabd)
    #7 std::__invoke_result<main::$_0, unsigned long>::type std::__invoke<main::$_0, unsigned long>(main::$_0&&, unsigned long&&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:95:14 (a.out+0x4bc94d)
    #8 void std::thread::_Invoker<std::tuple<main::$_0, unsigned long> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:244:13 (a.out+0x4bc8d8)
    #9 std::thread::_Invoker<std::tuple<main::$_0, unsigned long> >::operator()() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:251:11 (a.out+0x4bc858)
    #10 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::$_0, unsigned long> > >::_M_run() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:195:13 (a.out+0x4bc59f)
    #11 <null> <null> (libstdc++.so.6+0xceb9f)

  Thread T1 (tid=214715, running) created by main thread at:
    #0 pthread_create compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:965:3 (a.out+0x4494fb)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xcee14)
    #2 main /tmp/test.cc:86:22 (a.out+0x4bbdd1)

  Thread T2 (tid=214716, running) created by main thread at:
    #0 pthread_create compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:965:3 (a.out+0x4494fb)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xcee14)
    #2 main /tmp/test.cc:86:22 (a.out+0x4bbdd1)

  Thread T3 (tid=214717, running) created by main thread at:
    #0 pthread_create compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:965:3 (a.out+0x4494fb)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xcee14)
    #2 main /tmp/test.cc:86:22 (a.out+0x4bbdd1)

SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749:12 in __gthread_mutex_lock(pthread_mutex_t*)
==================

The default version 1 was written by @kcc but I am not sure anybody still remembers any limitations of version 1 that may affect this...

vasild commented 4 years ago

@dvyukov, thanks for looking into this. I wasn't aware of SANITIZER_DEADLOCK_DETECTOR_VERSION 2, is it fully operational? Could it be made the default one? Btw the reported addresses of the mutexes are 0x000000000000.

dvyukov commented 4 years ago

I don't know answers to these questions.

kcc commented 4 years ago

we are not actively working on either version of the deadlock detector :(

pudge62 commented 1 month ago

@vasild I found that there is a design flaw in the detector 1. The primary issue is that the detection is made in DD::MutexBeforeLock, but the lock edge isn't added to the global lock graph until DD::MutexAfterLock.

M0 -> (BEFORE LOCK: check deadlock pass) M1 (AFTER LOCK: deadlocked, add edge)
M1 -> (BEFORE LOCK: check deadlock pass) M2 (AFTER LOCK: deadlocked, add edge)
M2 -> (BEFORE LOCK: check deadlock pass) M0 (AFTER LOCK: deadlocked, add edge)

In contrast, the detector 2 adds the lock edge during DD::MutexBeforeLock, so it can work well.