project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0
7.53k stars 2.02k forks source link

Copious TSAN warnings on Linux #14710

Open mrjerryjohns opened 2 years ago

mrjerryjohns commented 2 years ago

Problem

We have lots and lots of TSAN data race warnings on Linux that should be addressed before it results in other potentially bigger blow-ups that are more difficult to debug later.

Example: https://github.com/project-chip/connectedhomeip/runs/5039936950?check_suite_focus=true

bzbarsky-apple commented 2 years ago
Example output: ``` 2022-02-03 02:03:15.802 INFO WARNING: ThreadSanitizer: data race (pid=31995) 2022-02-03 02:03:15.802 INFO Write of size 8 at 0x7b5000010700 by thread T4: 2022-02-03 02:03:15.803 INFO #0 memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:795 (libtsan.so.0+0x3790f) 2022-02-03 02:03:15.803 INFO #1 memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:793 (libtsan.so.0+0x3790f) 2022-02-03 02:03:15.804 INFO #2 g_slice_alloc0 (libglib-2.0.so.0+0x70ac2) 2022-02-03 02:03:15.804 INFO 2022-02-03 02:03:15.805 INFO Previous write of size 8 at 0x7b5000010700 by thread T2: 2022-02-03 02:03:15.805 INFO #0 posix_memalign ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:835 (libtsan.so.0+0x3046d) 2022-02-03 02:03:15.805 INFO #1 (libglib-2.0.so.0+0x6f936) 2022-02-03 02:03:15.806 INFO #2 void std::__invoke_impl(std::__invoke_other, void (*&&)()) /usr/include/c++/9/bits/invoke.h:60 (chip-tv-app+0x1c9585) 2022-02-03 02:03:15.806 INFO #3 std::__invoke_result::type std::__invoke(void (*&&)()) /usr/include/c++/9/bits/invoke.h:95 (chip-tv-app+0x1c94e2) 2022-02-03 02:03:15.806 INFO #4 void std::thread::_Invoker >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/include/c++/9/thread:244 (chip-tv-app+0x1c9445) 2022-02-03 02:03:15.807 INFO #5 std::thread::_Invoker >::operator()() /usr/include/c++/9/thread:251 (chip-tv-app+0x1c93f2) 2022-02-03 02:03:15.807 INFO #6 std::thread::_State_impl > >::_M_run() /usr/include/c++/9/thread:195 (chip-tv-app+0x1c93b3) 2022-02-03 02:03:15.807 INFO #7 (libstdc++.so.6+0xd6de3) 2022-02-03 02:03:15.807 INFO 2022-02-03 02:03:15.808 INFO Location is heap block of size 496 at 0x7b5000010600 allocated by thread T2: 2022-02-03 02:03:15.808 INFO #0 posix_memalign ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:835 (libtsan.so.0+0x3046d) 2022-02-03 02:03:15.808 INFO #1 (libglib-2.0.so.0+0x6f936) 2022-02-03 02:03:15.808 INFO #2 void std::__invoke_impl(std::__invoke_other, void (*&&)()) /usr/include/c++/9/bits/invoke.h:60 (chip-tv-app+0x1c9585) 2022-02-03 02:03:15.808 INFO #3 std::__invoke_result::type std::__invoke(void (*&&)()) /usr/include/c++/9/bits/invoke.h:95 (chip-tv-app+0x1c94e2) 2022-02-03 02:03:15.808 INFO #4 void std::thread::_Invoker >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/include/c++/9/thread:244 (chip-tv-app+0x1c9445) 2022-02-03 02:03:15.809 INFO #5 std::thread::_Invoker >::operator()() /usr/include/c++/9/thread:251 (chip-tv-app+0x1c93f2) 2022-02-03 02:03:15.809 INFO #6 std::thread::_State_impl > >::_M_run() /usr/include/c++/9/thread:195 (chip-tv-app+0x1c93b3) 2022-02-03 02:03:15.809 INFO #7 (libstdc++.so.6+0xd6de3) 2022-02-03 02:03:15.809 INFO 2022-02-03 02:03:15.809 INFO Thread T4 'pool' (tid=32000, running) created by main thread at: 2022-02-03 02:03:15.810 INFO #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x5ea79) 2022-02-03 02:03:15.810 INFO #1 (libglib-2.0.so.0+0x9f2ba) 2022-02-03 02:03:15.810 INFO #2 chip::DeviceLayer::ConnectivityManagerImpl::StartWiFiManagement() ../../examples/tv-app/linux/third_party/connectedhomeip/src/platform/Linux/ConnectivityManagerImpl.cpp:675 (chip-tv-app+0x1b8e55) 2022-02-03 02:03:15.810 INFO #3 ChipLinuxAppInit(int, char**) ../../examples/tv-app/linux/third_party/connectedhomeip/examples/platform/linux/AppMain.cpp:182 (chip-tv-app+0x40fc2) 2022-02-03 02:03:15.810 INFO #4 main ../../examples/tv-app/linux/main.cpp:171 (chip-tv-app+0x3c3f5) 2022-02-03 02:03:15.810 INFO 2022-02-03 02:03:15.811 INFO Thread T2 (tid=31998, running) created by main thread at: 2022-02-03 02:03:15.811 INFO #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x5ea79) 2022-02-03 02:03:15.811 INFO #1 std::thread::_M_start_thread(std::unique_ptr >, void (*)()) (libstdc++.so.6+0xd70a8) 2022-02-03 02:03:15.811 INFO #2 chip::DeviceLayer::PlatformManagerImpl::_InitChipStack() (chip-tv-app+0x1c5b11) 2022-02-03 02:03:15.811 INFO #3 chip::DeviceLayer::PlatformManager::InitChipStack() ../../examples/tv-app/linux/third_party/connectedhomeip/src/include/platform/PlatformManager.h:329 (chip-tv-app+0x3d37f) 2022-02-03 02:03:15.812 INFO #4 ChipLinuxAppInit(int, char**) ../../examples/tv-app/linux/third_party/connectedhomeip/examples/platform/linux/AppMain.cpp:151 (chip-tv-app+0x40dee) 2022-02-03 02:03:15.812 INFO #5 main ../../examples/tv-app/linux/main.cpp:171 (chip-tv-app+0x3c3f5) 2022-02-03 02:03:15.812 INFO 2022-02-03 02:03:15.812 INFO SUMMARY: ThreadSanitizer: data race (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x70ac2) in g_slice_alloc0 2022-02-03 02:03:15.812 INFO ================== ```
andy31415 commented 2 years ago

V1 review: fixes in master for this are bug fixes and would be acceptable, however this is not a V1 blocker.

arkq commented 2 years ago

This might be related: https://gitlab.gnome.org/GNOME/glib/-/issues/1672

arkq commented 2 years ago

I've done some tests of this issue and it seems that on Ubuntu 22.04 TSAN has been fixed - it no longer reports such false positives.

One can use provided simple test code to check that with different TSAN versions. Compile the test code as follows: g++ -g -O0 test.cpp -fsanitize=thread -pthread $(pkg-config --cflags --libs glib-2.0)

test.cpp ```c++ #include #include #include struct data { int value; }; static struct data *sharedData1; static struct data *sharedData2; void glibMainLoopThread(GMainLoop *loop) { g_main_loop_run(loop); } gboolean glibCallback1(void * data) { auto *sharedData = reinterpret_cast(data); std::cout << "T: sharedData1 = " << sharedData->value << std::endl; sharedData->value++; return FALSE; } gboolean glibCallback2(void * data) { auto *sharedData = reinterpret_cast(data); std::cout << "T: sharedData2 = " << sharedData->value << std::endl; sharedData->value++; return FALSE; } int main() { sharedData1 = new struct data({10}); GMainLoop *loop = g_main_loop_new(NULL, FALSE); std::thread t(glibMainLoopThread, loop); sharedData2 = new struct data({20}); // That's not a data race // NOTE: TSAN on Ubuntu 20.04 reports it (false positive) std::cout << "sharedData1 = " << sharedData1->value << std::endl; std::cout << "sharedData2 = " << sharedData2->value << std::endl; g_idle_add(glibCallback1, sharedData1); g_idle_add(glibCallback2, sharedData2); // Here we've got a data race (callback is already added to main loop thread) // std::cout << "sharedData2 = " << sharedData2->value << std::endl; std::this_thread::sleep_for(std::chrono::seconds(1)); g_main_loop_quit(loop); t.join(); std::cout << "sharedData1 = " << sharedData1->value << std::endl; std::cout << "sharedData2 = " << sharedData2->value << std::endl; delete sharedData1; delete sharedData2; return 0; } ```

On Ubuntu 22.04 I've got no errors (unless commented std::cout line is included). However, on Ubuntu 20.04, I've got error mentioned in https://github.com/project-chip/connectedhomeip/issues/14710#issuecomment-1028551023.

Output from Ubuntu 22.04:

$ ./a.out
sharedData1 = 10
sharedData2 = 20
T: sharedData1 = 10
T: sharedData2 = 20
sharedData1 = 11
sharedData2 = 21

With our CI the problem occurs because as for now the ubuntu-latest == ubuntu-20.04. So, maybe we should change to use ubuntu-22.04 with workflows where TSAN is used?

woody-apple commented 2 years ago

Issue Scrub: @woody-apple to pin us to ubuntu-22.04 in CI. @andy31415 will help check to make sure warnings aren't being suppressed afterwards.

arkq commented 2 years ago

After more research I've found the TRUE cause of false positives. The proper answer is here: https://stackoverflow.com/a/74328089

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.