Open dionescu opened 6 years ago
not sure how to get line numbers to work
You need to add llvm-symbolizer binary to path. Maybe there is also a runtime flag that can be used to specify binary path. And also of course compile with -g
.
I see that absl/base/internal/tsan_mutex_interface.h
contains #if defined(THREAD_SANITIZER)
. Have you defined it? Perhaps absl needs auto-detect tsan. THREAD_SANITIZER
macro is Google-ism. Can you bring this up with absl people?
But! This can also be a real bug in absl mutex. Can you obtain a report with line numbers?
I tried everything you recommended to get line numbers but was not successful. The llvm-symbolizer is in my path, I compile with -g, and I even set TSAN_SYMBOLIZER_PATH and TSAN_OPTIONS=symbolize=1. I also tried replacing llvm-symbolizer with a dummy script and it appears it does not get called at all. What is the mechanism that invokes the symbolizer out of process? Do I need to set some rpath?
For absl, it does indeed have some logic that depends on THREAD_SANITIZER being defined. I do pass -DTHREAD_SANITIZER when I compiled and I did check by adding a line to absl that it auto-detects TSAN support correctly.
What the threads are doing:
T0(main) is unlocking a mutex here:
// Allow WaitAndIncrement to finish executing. Once it does, the other
// call_once waiters will be unblocked.
done_blocking = true;
counters_mu.Unlock();
T1 (created by main) is doing a LockWhen in WaitAndIncrement:
void WaitAndIncrement() {
counters_mu.Lock();
++call_once_invoke_count;
counters_mu.Unlock();
counters_mu.LockWhen(Condition(&done_blocking));
++call_once_finished_count;
counters_mu.Unlock();
}
Nothing wrong with the test as far as I can tell. The race seems to trigger in absl' s mutex internals where we do 2 writes of different sizes (4 and 8 bytes).
Maybe I'm doing something wrong with how I use / Bazel configure the chromium clang toolchain. Would there be anything to look out for?
Zooming into mutex internals:
T1 calls LockWhen which is where its associated thread identity gets created. The memset in the trace sets this identity object to 0.
T0(main) unlocks the mutex and by the time this happens some identities were created with associated waker objects. T0 notifies the waker objects of the Unlock which is a write to waker and thus to thread_identity.
I'm probably missing something but I don't see how the memset write is synchronized with the write to waker state which is part of the identity.
Does symbolization work for the simplest cases? It's not a good idea to throw in all possible unknowns into an equation from the beginning. E.g. does the following work for you?
$ cat /tmp/race.c
#include <pthread.h>
int Global;
void *Thread1(void *x) {
Global = 42;
return NULL;
}
void *Thread2(void *x) {
Global = 43;
return NULL;
}
int main() {
pthread_t t[2];
pthread_create(&t[0], NULL, Thread1, NULL);
pthread_create(&t[1], NULL, Thread2, NULL);
pthread_join(t[0], NULL);
pthread_join(t[1], NULL);
return 0;
}
$ clang /tmp/race.c -fsanitize=thread -g
$ ./a.out
==================
WARNING: ThreadSanitizer: data race (pid=33868)
Write of size 4 at 0x000001119220 by thread T2:
#0 Thread2 /tmp/race.c:11:10 (a.out+0x4bd271)
Previous write of size 4 at 0x000001119220 by thread T1:
#0 Thread1 /tmp/race.c:6:10 (a.out+0x4bd221)
Location is global 'Global' of size 4 at 0x000001119220 (a.out+0x000001119220)
Thread T2 (tid=33871, running) created by main thread at:
#0 pthread_create tsan_interceptors.cc:965 (a.out+0x44f326)
#1 main /tmp/race.c:18:3 (a.out+0x4bd2f3)
Thread T1 (tid=33870, finished) created by main thread at:
#0 pthread_create tsan_interceptors.cc:965 (a.out+0x44f326)
#1 main /tmp/race.c:17:3 (a.out+0x4bd2cd)
SUMMARY: ThreadSanitizer: data race /tmp/race.c:11:10 in Thread2
==================
Does adding report_atomic_races=0
to TSAN_OPTIONS
prevent these races?
This may be a false positives related to intricate dances trying to ignore mutex guts but at the same time not producing false positives. Internally we run with report_atomic_races=0
and that can explain why we didn't hit this before.
Thanks for the help with symbolization. Turns out I had a typo in my blazerc file (s/asan/tsan) which prevented the -g compile option to be added in tsan mode. I can get line numbers now:
WARNING: ThreadSanitizer: data race (pid=15)
Atomic write of size 4 at 0x7f1042500350 by thread T10:
#0 __tsan_atomic32_fetch_add /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:616:3 (call_once_test+0x936b0)
#1 std::__1::__atomic_base<int, true>::fetch_add(int, std::__1::memory_order) /usr/xclang/include/c++/v1/atomic:1017:17 (call_once_test+0xbec53)
#2 absl::synchronization_internal::Waiter::Post() /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/waiter.cc:155 (call_once_test+0xbec53)
#3 AbslInternalPerThreadSemPost /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/per_thread_sem.cc:67:64 (call_once_test+0xbe8f1)
#4 absl::synchronization_internal::PerThreadSem::Post(absl::base_internal::ThreadIdentity*) /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/per_thread_sem.h:100:3 (call_once_test+0xbdc29)
#5 absl::Mutex::IncrementSynchSem(absl::Mutex*, absl::base_internal::PerThreadSynch*) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:531:3 (call_once_test+0xbd3f4)
#6 absl::Mutex::Wakeup(absl::base_internal::PerThreadSynch*) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:1118:3 (call_once_test+0xb825d)
#7 absl::Mutex::UnlockSlow(absl::SynchWaitParams*) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:2267:19 (call_once_test+0xb9b92)
#8 absl::Mutex::Unlock() /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:1674:11 (call_once_test+0xba96a)
#9 absl::(anonymous namespace)::ThreadBody() /proc/self/cwd/external/com_google_absl/absl/base/call_once_test.cc:50:15 (call_once_test+0xb4091)
#10 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)()>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)()>&, std::__1::__tuple_indices<>) /usr/xclang/include/c++/v1/type_traits:4323:1 (call_once_test+0xb4af8)
#11 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)()> >(void*) /usr/xclang/include/c++/v1/thread:352 (call_once_test+0xb4af8)
Previous write of size 8 at 0x7f1042500350 by main thread:
#0 memset /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:711:3 (call_once_test+0x57769)
#1 absl::synchronization_internal::NewThreadIdentity() /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/create_thread_identity.cc:93:3 (call_once_test+0xbe5f7)
#2 absl::synchronization_internal::CreateThreadIdentity() /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/create_thread_identity.cc:102:45 (call_once_test+0xbe523)
#3 absl::synchronization_internal::GetOrCreateCurrentThreadIdentity() /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/create_thread_identity.h:46:12 (call_once_test+0xbd9af)
#4 absl::Synch_GetPerThread() /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:503:30 (call_once_test+0xbc403)
#5 absl::Synch_GetPerThreadAnnotated(absl::Mutex*) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:511:23 (call_once_test+0xb9216)
#6 absl::Mutex::LockSlowWithDeadline(absl::MuHowS const*, absl::Condition const*, absl::synchronization_internal::KernelTimeout, int) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:1818:42 (call_once_test+0xb8ba6)
#7 absl::Mutex::LockSlow(absl::MuHowS const*, absl::Condition const*, int) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:1729:3 (call_once_test+0xb8764)
#8 absl::Mutex::LockWhen(absl::Condition const&) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:1468:9 (call_once_test+0xb8921)
#9 absl::(anonymous namespace)::CallOnceTest_ExecutionCount_Test::TestBody() /proc/self/cwd/external/com_google_absl/absl/base/call_once_test.cc:76:15 (call_once_test+0xb3b8a)
#10 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2417:10 (call_once_test+0x114a67)
#11 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2453:14 (call_once_test+0x1072b5)
#12 testing::Test::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2491:5 (call_once_test+0xf5a97)
#13 testing::TestInfo::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2667:11 (call_once_test+0xf62a2)
#14 testing::TestCase::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2785:28 (call_once_test+0xf69c1)
#15 testing::internal::UnitTestImpl::RunAllTests() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:5047:43 (call_once_test+0x1003b1)
#16 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2417:10 (call_once_test+0x116537)
#17 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2453:14 (call_once_test+0x109165)
#18 testing::UnitTest::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:4663:10 (call_once_test+0x1000a0)
#19 RUN_ALL_TESTS() /proc/self/cwd/external/com_google_googletest/googletest/include/gtest/gtest.h:2329:46 (call_once_test+0xe5cfb)
#20 main /proc/self/cwd/external/com_google_googletest/googlemock/src/gmock_main.cc:53:10 (call_once_test+0xe5ca8)
Also, adding report_atomic_races=0 to TSAN_OPTIONS prevents the race reports.
Also, adding report_atomic_races=0 to TSAN_OPTIONS prevents the race reports.
Then I think hypothesis in https://github.com/google/sanitizers/issues/953#issuecomment-387652366 is correct. Unfortunately, I am not sure we will have time to fix this in near future. +@kcc
Is it possible to provide default TSAN_OPTIONS
in bazel, or compile in:
extern "C" const char* __tsan_default_options() {
return " report_atomic_races=0 ";
}
?
For others running into this issue, one way to silence atomic races reports is by adding the following to your .bazelrc (assuming you run TSAN with --config=tsan):
build:tsan --test_env=TSAN_OPTIONS=report_atomic_races=0
It looks like on Mac there is a specific work around in tsan for std::call_once
in libc++.dylib
and a STDCXX_INTERCEPTOR
. Is something like this needed for more platforms or is this just specific to how things are built on Mac?
To be clear, adding report_atomic_races=0
to TSAN_OPTIONS should silence the report. But is it a legit data race that we're silencing? Or is it just a false positive?
I think this is a false positive related to the fact that we specifically ignore most synchronization in Mutex code: https://source.chromium.org/chromium/chromium/src/+/master:third_party/abseil-cpp/absl/synchronization/mutex.cc;l=170;drc=b6496a2a60ea1712e46c32eb1be8d2e03d99af8b;bpv=1;bpt=1 for the sake of better race reports for the rest of the code. Unfortunately this support was developed based on a code base that uses report_atomic_races=0, thus the false positive.
Actually I think that this is an actual race, and TSan is right to report it! I still need to dig a bit deeper, but from what I saw so far there simply is no proper happens-before relation between the initialization of the Waiter (which uses some non-atomic writes to initialize the memory) and the later fetch-add. The fact that the Waiter ctor uses an atomic store to explicitly initialize the futex is irrelevant. This reminds me very much of this issue where we concluded that it actually is a race: https://github.com/google/sanitizers/issues/1009
I have a similar problem with my grpc-project. I have a suspicion that the problem is related to the abseil-cpp fixed bug https://github.com/abseil/abseil-cpp/commit/9444b11e0c4e1f079c87067b5bbab1c5ff718809
https://github.com/abseil/abseil-cpp/commit/bb7be494b385975ebdeaac3a8ee10894405641e4 After this commit in abseil was removed suppressions. https://github.com/abseil/abseil-cpp/commit/091842beeac564776c231e7d1787b527d1c2d0d9
I do pretty same for our project and now it works, so I think this issue at least for library users is resolved
I've been trying to extend the Bazel toolchain at https://github.com/vsco/bazel-toolchains (based on Chromium clang toolchain) to add support for sanitizers. I had success with eliminating MSAN false positives after instrumenting libcxx but the same approach failed for TSAN. In particular, TSAN (with instrumented libcxx confirmed via nm) reports races when I try to run Abseil (https://github.com/abseil/abseil-cpp) synchronization tests. I added one report below (sorry not sure how to get line numbers to work).
Any ideas what may be happening here?