google / sanitizers

AddressSanitizer, ThreadSanitizer, MemorySanitizer
Other
11.34k stars 1.02k forks source link

TSAN false positives with chromium clang toolchain #953

Open dionescu opened 6 years ago

dionescu commented 6 years ago

I've been trying to extend the Bazel toolchain at https://github.com/vsco/bazel-toolchains (based on Chromium clang toolchain) to add support for sanitizers. I had success with eliminating MSAN false positives after instrumenting libcxx but the same approach failed for TSAN. In particular, TSAN (with instrumented libcxx confirmed via nm) reports races when I try to run Abseil (https://github.com/abseil/abseil-cpp) synchronization tests. I added one report below (sorry not sure how to get line numbers to work).

Any ideas what may be happening here?

WARNING: ThreadSanitizer: data race (pid=15)
  Atomic write of size 4 at 0x7f0f827de150 by main thread:
    #0 __tsan_atomic32_fetch_add /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:616 (call_once_test+0x92540)
    #1 absl::synchronization_internal::Waiter::Post() ??:? (call_once_test+0xbd6d3)
    #2 AbslInternalPerThreadSemPost ??:? (call_once_test+0xbd391)
    #3 absl::synchronization_internal::PerThreadSem::Post(absl::base_internal::ThreadIdentity*) ??:? (call_once_test+0xbc739)
    #4 absl::Mutex::IncrementSynchSem(absl::Mutex*, absl::base_internal::PerThreadSynch*) ??:? (call_once_test+0xbbf24)
    #5 absl::Mutex::Wakeup(absl::base_internal::PerThreadSynch*) ??:? (call_once_test+0xb6de5)
    #6 absl::Mutex::UnlockSlow(absl::SynchWaitParams*) ??:? (call_once_test+0xb86f2)
    #7 absl::Mutex::Unlock() ??:? (call_once_test+0xb94ca)
    #8 absl::(anonymous namespace)::CallOnceTest_ExecutionCount_Test::TestBody() call_once_test.cc:? (call_once_test+0xb2c0a)
    #9 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ??:? (call_once_test+0x108f17)
    #10 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ??:? (call_once_test+0xfc355)
    #11 testing::Test::Run() ??:? (call_once_test+0xeaca7)
    #12 testing::TestInfo::Run() ??:? (call_once_test+0xeb4b2)
    #13 testing::TestCase::Run() ??:? (call_once_test+0xebbd1)
    #14 testing::internal::UnitTestImpl::RunAllTests() ??:? (call_once_test+0xf55b1)
    #15 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ??:? (call_once_test+0x10a7c7)
    #16 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ??:? (call_once_test+0xfe165)
    #17 testing::UnitTest::Run() ??:? (call_once_test+0xf52a0)
    #18 RUN_ALL_TESTS() ??:? (call_once_test+0xdaffb)
    #19 main ??:? (call_once_test+0xdafa8)

  Previous write of size 8 at 0x7f0f827de150 by thread T1:
    #0 memset /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:711 (call_once_test+0x565f9)
    #1 absl::synchronization_internal::NewThreadIdentity() create_thread_identity.cc:? (call_once_test+0xbd0b7)
    #2 absl::synchronization_internal::CreateThreadIdentity() ??:? (call_once_test+0xbcfe3)
    #3 absl::synchronization_internal::GetOrCreateCurrentThreadIdentity() ??:? (call_once_test+0xbc4bf)
    #4 absl::Synch_GetPerThread() mutex.cc:? (call_once_test+0xbaf33)
    #5 absl::Synch_GetPerThreadAnnotated(absl::Mutex*) mutex.cc:? (call_once_test+0xb7d76)
    #6 absl::Mutex::LockSlowWithDeadline(absl::MuHowS const*, absl::Condition const*, absl::synchronization_internal::KernelTimeout, int) ??:? (call_once_test+0xb7726)
    #7 absl::Mutex::LockSlow(absl::MuHowS const*, absl::Condition const*, int) ??:? (call_once_test+0xb72e4)
    #8 absl::Mutex::LockWhen(absl::Condition const&) ??:? (call_once_test+0xb74a1)
    #9 absl::(anonymous namespace)::WaitAndIncrement() call_once_test.cc:? (call_once_test+0xb3044)
    #10 _ZN4absl13base_internal8Callable6InvokeIRFvvEJEEEDTclclsr3stdE7declvalIT_EEspclsr3stdE7declvalIT0_EEEEOS5_DpOS6_ ??:? (call_once_test+0xb43c3)
    #11 _ZN4absl13base_internal6InvokeIRFvvEJEEEDTclsr7InvokerIT_DpT0_E4typeE6Invokeclsr3stdE7declvalIS4_EEspclsr3stdE7declvalIS5_EEEEOS4_DpOS5_ ??:? (call_once_test+0xb42f9)
    #12 void absl::base_internal::CallOnceImpl<void (&)()>(std::__1::atomic<unsigned int>*, absl::base_internal::SchedulingMode, void (&)()) ??:? (call_once_test+0xb4218)
    #13 void absl::call_once<void (&)()>(absl::once_flag&, void (&)()) ??:? (call_once_test+0xb4130)
    #14 absl::(anonymous namespace)::ThreadBody() call_once_test.cc:? (call_once_test+0xb2f34)
    #15 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)()> >(void*) ??:? (call_once_test+0xb3938)

  Thread T1 (tid=18, running) created by main thread at:
    #0 pthread_create /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:921 (call_once_test+0x4ce43)
    #1 std::__1::thread::thread<void (&)(), , void>(void (&)()) crtstuff.c:? (call_once_test+0xb36cc)
    #2 void std::__1::vector<std::__1::thread, std::__1::allocator<std::__1::thread> >::__emplace_back_slow_path<void (&)()>(void (&)()) ??:? (call_once_test+0xb355f)
    #3 absl::(anonymous namespace)::CallOnceTest_ExecutionCount_Test::TestBody() call_once_test.cc:? (call_once_test+0xb29ee)
    #4 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ??:? (call_once_test+0x108f17)
    #5 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ??:? (call_once_test+0xfc355)
    #6 testing::Test::Run() ??:? (call_once_test+0xeaca7)
    #7 testing::TestInfo::Run() ??:? (call_once_test+0xeb4b2)
    #8 testing::TestCase::Run() ??:? (call_once_test+0xebbd1)
    #9 testing::internal::UnitTestImpl::RunAllTests() ??:? (call_once_test+0xf55b1)
    #10 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ??:? (call_once_test+0x10a7c7)
    #11 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ??:? (call_once_test+0xfe165)
    #12 testing::UnitTest::Run() ??:? (call_once_test+0xf52a0)
    #13 RUN_ALL_TESTS() ??:? (call_once_test+0xdaffb)
    #14 main ??:? (call_once_test+0xdafa8)

SUMMARY: ThreadSanitizer: data race ??:? in absl::synchronization_internal::Waiter::Post()
dvyukov commented 6 years ago

not sure how to get line numbers to work

You need to add llvm-symbolizer binary to path. Maybe there is also a runtime flag that can be used to specify binary path. And also of course compile with -g.

dvyukov commented 6 years ago

I see that absl/base/internal/tsan_mutex_interface.h contains #if defined(THREAD_SANITIZER). Have you defined it? Perhaps absl needs auto-detect tsan. THREAD_SANITIZER macro is Google-ism. Can you bring this up with absl people?

dvyukov commented 6 years ago

But! This can also be a real bug in absl mutex. Can you obtain a report with line numbers?

dionescu commented 6 years ago

I tried everything you recommended to get line numbers but was not successful. The llvm-symbolizer is in my path, I compile with -g, and I even set TSAN_SYMBOLIZER_PATH and TSAN_OPTIONS=symbolize=1. I also tried replacing llvm-symbolizer with a dummy script and it appears it does not get called at all. What is the mechanism that invokes the symbolizer out of process? Do I need to set some rpath?

For absl, it does indeed have some logic that depends on THREAD_SANITIZER being defined. I do pass -DTHREAD_SANITIZER when I compiled and I did check by adding a line to absl that it auto-detects TSAN support correctly.

dionescu commented 6 years ago

What the threads are doing:

T0(main) is unlocking a mutex here:
  // Allow WaitAndIncrement to finish executing.  Once it does, the other
  // call_once waiters will be unblocked.
  done_blocking = true;
  counters_mu.Unlock();

T1 (created by main) is doing a LockWhen in WaitAndIncrement:
void WaitAndIncrement() {
  counters_mu.Lock();
  ++call_once_invoke_count;
  counters_mu.Unlock();

  counters_mu.LockWhen(Condition(&done_blocking));
  ++call_once_finished_count;
  counters_mu.Unlock();
}

Nothing wrong with the test as far as I can tell. The race seems to trigger in absl' s mutex internals where we do 2 writes of different sizes (4 and 8 bytes).

Maybe I'm doing something wrong with how I use / Bazel configure the chromium clang toolchain. Would there be anything to look out for?

dionescu commented 6 years ago

Zooming into mutex internals:

T1 calls LockWhen which is where its associated thread identity gets created. The memset in the trace sets this identity object to 0.

T0(main) unlocks the mutex and by the time this happens some identities were created with associated waker objects. T0 notifies the waker objects of the Unlock which is a write to waker and thus to thread_identity.

I'm probably missing something but I don't see how the memset write is synchronized with the write to waker state which is part of the identity.

dvyukov commented 6 years ago

Does symbolization work for the simplest cases? It's not a good idea to throw in all possible unknowns into an equation from the beginning. E.g. does the following work for you?

$ cat /tmp/race.c

#include <pthread.h>

int Global;

void *Thread1(void *x) {
  Global = 42;
  return NULL;
}

void *Thread2(void *x) {
  Global = 43;
  return NULL;
}

int main() {
  pthread_t t[2];
  pthread_create(&t[0], NULL, Thread1, NULL);
  pthread_create(&t[1], NULL, Thread2, NULL);
  pthread_join(t[0], NULL);
  pthread_join(t[1], NULL);
  return 0;
}

$ clang /tmp/race.c -fsanitize=thread -g
$ ./a.out

==================
WARNING: ThreadSanitizer: data race (pid=33868)
  Write of size 4 at 0x000001119220 by thread T2:
    #0 Thread2 /tmp/race.c:11:10 (a.out+0x4bd271)

  Previous write of size 4 at 0x000001119220 by thread T1:
    #0 Thread1 /tmp/race.c:6:10 (a.out+0x4bd221)

  Location is global 'Global' of size 4 at 0x000001119220 (a.out+0x000001119220)

  Thread T2 (tid=33871, running) created by main thread at:
    #0 pthread_create tsan_interceptors.cc:965 (a.out+0x44f326)
    #1 main /tmp/race.c:18:3 (a.out+0x4bd2f3)

  Thread T1 (tid=33870, finished) created by main thread at:
    #0 pthread_create tsan_interceptors.cc:965 (a.out+0x44f326)
    #1 main /tmp/race.c:17:3 (a.out+0x4bd2cd)

SUMMARY: ThreadSanitizer: data race /tmp/race.c:11:10 in Thread2
==================
dvyukov commented 6 years ago

Does adding report_atomic_races=0 to TSAN_OPTIONS prevent these races? This may be a false positives related to intricate dances trying to ignore mutex guts but at the same time not producing false positives. Internally we run with report_atomic_races=0 and that can explain why we didn't hit this before.

dionescu commented 6 years ago

Thanks for the help with symbolization. Turns out I had a typo in my blazerc file (s/asan/tsan) which prevented the -g compile option to be added in tsan mode. I can get line numbers now:

WARNING: ThreadSanitizer: data race (pid=15)
  Atomic write of size 4 at 0x7f1042500350 by thread T10:
    #0 __tsan_atomic32_fetch_add /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:616:3 (call_once_test+0x936b0)
    #1 std::__1::__atomic_base<int, true>::fetch_add(int, std::__1::memory_order) /usr/xclang/include/c++/v1/atomic:1017:17 (call_once_test+0xbec53)
    #2 absl::synchronization_internal::Waiter::Post() /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/waiter.cc:155 (call_once_test+0xbec53)
    #3 AbslInternalPerThreadSemPost /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/per_thread_sem.cc:67:64 (call_once_test+0xbe8f1)
    #4 absl::synchronization_internal::PerThreadSem::Post(absl::base_internal::ThreadIdentity*) /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/per_thread_sem.h:100:3 (call_once_test+0xbdc29)
    #5 absl::Mutex::IncrementSynchSem(absl::Mutex*, absl::base_internal::PerThreadSynch*) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:531:3 (call_once_test+0xbd3f4)
    #6 absl::Mutex::Wakeup(absl::base_internal::PerThreadSynch*) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:1118:3 (call_once_test+0xb825d)
    #7 absl::Mutex::UnlockSlow(absl::SynchWaitParams*) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:2267:19 (call_once_test+0xb9b92)
    #8 absl::Mutex::Unlock() /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:1674:11 (call_once_test+0xba96a)
    #9 absl::(anonymous namespace)::ThreadBody() /proc/self/cwd/external/com_google_absl/absl/base/call_once_test.cc:50:15 (call_once_test+0xb4091)
    #10 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)()>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)()>&, std::__1::__tuple_indices<>) /usr/xclang/include/c++/v1/type_traits:4323:1 (call_once_test+0xb4af8)
    #11 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)()> >(void*) /usr/xclang/include/c++/v1/thread:352 (call_once_test+0xb4af8)

  Previous write of size 8 at 0x7f1042500350 by main thread:
    #0 memset /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:711:3 (call_once_test+0x57769)
    #1 absl::synchronization_internal::NewThreadIdentity() /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/create_thread_identity.cc:93:3 (call_once_test+0xbe5f7)
    #2 absl::synchronization_internal::CreateThreadIdentity() /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/create_thread_identity.cc:102:45 (call_once_test+0xbe523)
    #3 absl::synchronization_internal::GetOrCreateCurrentThreadIdentity() /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/create_thread_identity.h:46:12 (call_once_test+0xbd9af)
    #4 absl::Synch_GetPerThread() /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:503:30 (call_once_test+0xbc403)
    #5 absl::Synch_GetPerThreadAnnotated(absl::Mutex*) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:511:23 (call_once_test+0xb9216)
    #6 absl::Mutex::LockSlowWithDeadline(absl::MuHowS const*, absl::Condition const*, absl::synchronization_internal::KernelTimeout, int) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:1818:42 (call_once_test+0xb8ba6)
    #7 absl::Mutex::LockSlow(absl::MuHowS const*, absl::Condition const*, int) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:1729:3 (call_once_test+0xb8764)
    #8 absl::Mutex::LockWhen(absl::Condition const&) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:1468:9 (call_once_test+0xb8921)
    #9 absl::(anonymous namespace)::CallOnceTest_ExecutionCount_Test::TestBody() /proc/self/cwd/external/com_google_absl/absl/base/call_once_test.cc:76:15 (call_once_test+0xb3b8a)
    #10 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2417:10 (call_once_test+0x114a67)
    #11 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2453:14 (call_once_test+0x1072b5)
    #12 testing::Test::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2491:5 (call_once_test+0xf5a97)
    #13 testing::TestInfo::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2667:11 (call_once_test+0xf62a2)
    #14 testing::TestCase::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2785:28 (call_once_test+0xf69c1)
    #15 testing::internal::UnitTestImpl::RunAllTests() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:5047:43 (call_once_test+0x1003b1)
    #16 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2417:10 (call_once_test+0x116537)
    #17 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2453:14 (call_once_test+0x109165)
    #18 testing::UnitTest::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:4663:10 (call_once_test+0x1000a0)
    #19 RUN_ALL_TESTS() /proc/self/cwd/external/com_google_googletest/googletest/include/gtest/gtest.h:2329:46 (call_once_test+0xe5cfb)
    #20 main /proc/self/cwd/external/com_google_googletest/googlemock/src/gmock_main.cc:53:10 (call_once_test+0xe5ca8)
dionescu commented 6 years ago

Also, adding report_atomic_races=0 to TSAN_OPTIONS prevents the race reports.

dvyukov commented 6 years ago

Also, adding report_atomic_races=0 to TSAN_OPTIONS prevents the race reports.

Then I think hypothesis in https://github.com/google/sanitizers/issues/953#issuecomment-387652366 is correct. Unfortunately, I am not sure we will have time to fix this in near future. +@kcc Is it possible to provide default TSAN_OPTIONS in bazel, or compile in:

extern "C" const char* __tsan_default_options() {
  return " report_atomic_races=0 ";
}

?

dionescu commented 6 years ago

For others running into this issue, one way to silence atomic races reports is by adding the following to your .bazelrc (assuming you run TSAN with --config=tsan):

build:tsan --test_env=TSAN_OPTIONS=report_atomic_races=0

jlisee commented 5 years ago

It looks like on Mac there is a specific work around in tsan for std::call_once in libc++.dylib and a STDCXX_INTERCEPTOR. Is something like this needed for more platforms or is this just specific to how things are built on Mac?

devjgm commented 3 years ago

To be clear, adding report_atomic_races=0 to TSAN_OPTIONS should silence the report. But is it a legit data race that we're silencing? Or is it just a false positive?

dvyukov commented 3 years ago

I think this is a false positive related to the fact that we specifically ignore most synchronization in Mutex code: https://source.chromium.org/chromium/chromium/src/+/master:third_party/abseil-cpp/absl/synchronization/mutex.cc;l=170;drc=b6496a2a60ea1712e46c32eb1be8d2e03d99af8b;bpv=1;bpt=1 for the sake of better race reports for the rest of the code. Unfortunately this support was developed based on a code base that uses report_atomic_races=0, thus the false positive.

mpoeter commented 2 years ago

Actually I think that this is an actual race, and TSan is right to report it! I still need to dig a bit deeper, but from what I saw so far there simply is no proper happens-before relation between the initialization of the Waiter (which uses some non-atomic writes to initialize the memory) and the later fetch-add. The fact that the Waiter ctor uses an atomic store to explicitly initialize the futex is irrelevant. This reminds me very much of this issue where we concluded that it actually is a race: https://github.com/google/sanitizers/issues/1009

prigluchenie commented 1 year ago

I have a similar problem with my grpc-project. I have a suspicion that the problem is related to the abseil-cpp fixed bug https://github.com/abseil/abseil-cpp/commit/9444b11e0c4e1f079c87067b5bbab1c5ff718809

MBkkt commented 1 year ago

https://github.com/abseil/abseil-cpp/commit/bb7be494b385975ebdeaac3a8ee10894405641e4 After this commit in abseil was removed suppressions. https://github.com/abseil/abseil-cpp/commit/091842beeac564776c231e7d1787b527d1c2d0d9

I do pretty same for our project and now it works, so I think this issue at least for library users is resolved