pingcap / tiflash

The analytical engine for TiDB and TiDB Cloud. Try free: https://tidbcloud.com/free-trial
https://docs.pingcap.com/tidb/stable/tiflash-overview
Apache License 2.0
944 stars 409 forks source link

Failed TSAN check in `RegionKVStoreOldTest.ReadIndex` lock-order-inversion (potential deadlock) #9091

Closed JaySon-Huang closed 4 months ago

JaySon-Huang commented 4 months ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

2. What did you expect to see? (Required)

3. What did you see instead (Required)

[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from RegionKVStoreOldTest
[ RUN      ] RegionKVStoreOldTest.ReadIndex
[2024/05/27 16:52:42.700 +08:00] [INFO] [TiFlashTestEnv.cpp:262] ["Clean path /DATA/disk1/jaysonhuang/tiflash-master/cmake-build-tsan/tmp/region_kvs_old_test for bootstrap"] [source=Test] [thread_id=1]
[2024/05/27 16:52:42.700 +08:00] [WARN] [ProxyFFI.cpp:917] ["Set mock rust ptr gc function"] [source=MockSetRustGcHelper] [thread_id=1]
[2024/05/27 16:52:42.700 +08:00] [INFO] [KVStore.cpp:70] ["KVStore inited, eager_raft_log_gc_enabled=false"] [thread_id=1]
[2024/05/27 16:52:42.700 +08:00] [INFO] [PageStorageImpl.cpp:54] ["PageStorageImpl start. Config{ PageStorageConfig {blob_file_limit_size: 268435456, blob_spacemap_type: 2, blob_heavy_gc_valid_rate: 0.500, blob_heavy_gc_valid_rate_raft_data: 0.050, blob_block_alignment_bytes: 0, wal_roll_size: 2097152, wal_max_persisted_log_files: 4} }"] [source=RegionPersister]
 [thread_id=1]
[2024/05/27 16:52:42.701 +08:00] [INFO] [PageDirectoryFactory.cpp:68] ["PageDirectory restored, max_page_id=0 max_applied_ver=0"] [source=RegionPersister] [thread_id=1]
[2024/05/27 16:52:42.701 +08:00] [INFO] [RegionPersister.cpp:381] ["RegionPersister running. Current Run Mode is ONLY_V3"] [thread_id=1]
[2024/05/27 16:52:42.701 +08:00] [INFO] [KVStore.cpp:89] ["Restored 0 regions"] [thread_id=1]
[2024/05/27 16:52:42.702 +08:00] [INFO] [KVStore.cpp:136] ["Parsed proxy config snap_handle_pool_size 4"] [thread_id=1]
[2024/05/27 16:52:42.703 +08:00] [INFO] [KVStore.cpp:384] ["Set store info id: 1234"] [thread_id=1]
[2024/05/27 16:52:42.703 +08:00] [INFO] [kvstore_helper.h:102] ["Finished setup"] [source=RegionKVStoreOldTest] [thread_id=1]
[2024/05/27 16:52:42.704 +08:00] [INFO] [RegionTable.cpp:57] ["get new table, keyspace=4294967295 table_id=1"] [thread_id=1]
[2024/05/27 16:52:42.704 +08:00] [INFO] [ReadIndex.cpp:339] ["Start to initialize read-index workers: worker count 64, runner count 1"] [thread_id=1]
[2024/05/27 16:52:42.705 +08:00] [INFO] [RegionTable.cpp:57] ["get new table, keyspace=4294967295 table_id=2"] [thread_id=1]
[2024/05/27 16:52:42.705 +08:00] [INFO] [ReadIndexWorkerManager.cpp:161] ["Start read-index runner 0"] [source=ReadIndexWorkers] [thread_id=146]
[2024/05/27 16:52:42.706 +08:00] [INFO] [ReadIndex.cpp:143] ["start to check regions ready, min-wait-tick 0.001s, max-wait-tick 20s, wait-region-ready-timeout 1200.000s"] [source=WaitCheckRegionReady] [thread_id=147]
[2024/05/27 16:52:42.706 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=0 region_id=1 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=146]
[2024/05/27 16:52:42.706 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=0 region_id=2 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=146]
[2024/05/27 16:52:42.706 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=0 region_id=3 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=146]
[2024/05/27 16:52:42.706 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=0 region_id=9 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=146]                                                                                                                            [2024/05/27 16:52:42.706 +08:00] [INFO] [ReadIndex.cpp:255] ["1 regions need to apply to latest index, sleep for 0.001s"] [source=WaitCheckRegionReady] [thread_id=147]
[2024/05/27 16:52:42.707 +08:00] [INFO] [ReadIndex.cpp:255] ["1 regions need to apply to latest index, sleep for 0.002s"] [source=WaitCheckRegionReady] [thread_id=147]                                                                                                                                                                                                     [2024/05/27 16:52:42.710 +08:00] [INFO] [ReadIndex.cpp:289] ["finish to check 4 regions, time cost 0.004s"] [source=WaitCheckRegionReady] [thread_id=147]                                                                                                                                                                                                                   [2024/05/27 16:52:42.710 +08:00] [INFO] [KVStore.cpp:355] ["Handle destroy [region_id=9 applied_term=6 applied_index=66]"] [thread_id=1]
[2024/05/27 16:52:42.710 +08:00] [INFO] [KVStore.cpp:258] ["Start to remove region_id=9"] [thread_id=1]                                                                                                                                                                                                                                                                     ==================
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=3111683) 
 Cycle in lock order graph: M0 (0x7b4c00010500) => M1 (0x7b5400059100) => M2 (0x7b6800001410) => M0 

Mutex M1 acquired here while holding mutex M0 in thread T155:
    #0 pthread_mutex_lock /root/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1339:3 (gtests_dbms+0x2f7375b)                                                                                                                                                                                                                                                #1 std::__1::mutex::lock() <null> (libc++.so.1+0x8f7c5)
    #2 DB::MockRaftStoreProxy::makeReadIndexTask(kvrpcpb::ReadIndexRequest) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Debug/MockKVStore/MockRaftStoreProxy.cpp:85:14 (gtests_dbms+0xe45643b)
    #3 DB::MockFFIImpls::fn_make_read_index_task(DB::RaftStoreProxyPtr, DB::BaseBuffView) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Debug/MockKVStore/MockReadIndex.cpp:46:21 (gtests_dbms+0xe46decc)
    #4 DB::TiFlashRaftProxyHelper::makeReadIndexTask(kvrpcpb::ReadIndexRequest const&) const /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/Proxy.cpp:106:26 (gtests_dbms+0xd02d6e6)
    #5 DB::makeReadIndexTask(DB::TiFlashRaftProxyHelper const&, kvrpcpb::ReadIndexRequest&) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorkerImpl.h:124:23 (gtests_dbms+0xd043781)
    #6 DB::ReadIndexDataNode::runOneRound(DB::TiFlashRaftProxyHelper const&, std::__1::shared_ptr<DB::ReadIndexNotifyCtrl> const&) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexDataNode.cpp:107:30 (gtests_dbms+0xd043781)
    #7 DB::ReadIndexWorker::consumeRegionNotifies(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorker.cpp:138:15 (gtests_dbms+0xd04fc92)
    #8 DB::ReadIndexWorker::runOneRound(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorker.cpp:162:9 (gtests_dbms+0xd050098)
    #9 DB::ReadIndexWorkerManager::ReadIndexRunner::runOneRound(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorkerManager.cpp:145:21 (gtests_dbms+0xd059390)
    #10 DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0::operator()() const /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorkerManager.cpp:166:13 (gtests_dbms+0xd059390)
    #11 decltype(std::declval<DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0>()()) std::__1::__invoke[abi:ue170006]<DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0>(DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0&&) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__type_traits/invoke.h:340:25 (gtests_dbm
s+0xd059390)
    #12 void std::__1::__thread_execute[abi:ue170006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, DB::ReadIndexWorkerManager::
ReadIndexRunner::asyncRun()::$_0>&, std::__1::__tuple_indices<>) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__thread/thread.h:221:5 (gtests_dbms+0xd059390)
    #13 void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0>>(void*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__thread/thread.h:232:5 (gtests_dbms+0xd059390)

    Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative warning message

  Mutex M2 acquired here while holding mutex M1 in main thread:
    #0 pthread_mutex_lock /root/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1339:3 (gtests_dbms+0x2f7375b)                                                                                                                                                                                                                                                #1 std::__1::mutex::lock() <null> (libc++.so.1+0x8f7c5)
    #2 DB::MockRaftStoreProxy::debugAddRegions(DB::KVStore&, DB::TMTContext&, std::__1::vector<unsigned long, std::__1::allocator<unsigned long>>, std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>>&&) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Debug/MockKVStore/MockRaftStoreProxy.cpp:171:26 (gtests_dbms+0xe457b4c)
    #3 DB::tests::RegionKVStoreOldTest_ReadIndex_Test::TestBody() /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/tests/gtest_kvstore.cpp:85:25 (gtests_dbms+0x61693d4)
    #4 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2443:10 (gtests_dbms+0xd0a1689)
    #5 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2479:14 (gtests_dbms+0xd0a1689)
    #6 testing::Test::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2517:5 (gtests_dbms+0xd076e23)
    #7 testing::TestInfo::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2693:11 (gtests_dbms+0xd07863e)
    #8 testing::TestCase::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2811:28 (gtests_dbms+0xd079623)
    #9 testing::internal::UnitTestImpl::RunAllTests() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:5177:43 (gtests_dbms+0xd08ba3a)
    #10 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2443:10 (gtests_dbms+0xd0a2b9d)
    #11 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2479:14 (gtests_dbms+0xd0a2b9d)
    #12 testing::UnitTest::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:4786:10 (gtests_dbms+0xd08b304)
    #13 RUN_ALL_TESTS() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/include/gtest/gtest.h:2341:46 (gtests_dbms+0x689d936)
    #14 main /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/TestUtils/gtests_dbms_main.cpp:116:16 (gtests_dbms+0x689d936)

  Mutex M0 acquired here while holding mutex M2 in main thread:
    #0 pthread_mutex_lock /root/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1339:3 (gtests_dbms+0x2f7375b)
    #1 std::__1::mutex::lock() <null> (libc++.so.1+0x8f7c5)
    #2 DB::ReadIndexDataNode::~ReadIndexDataNode() /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexDataNode.cpp:290:14 (gtests_dbms+0xd04520b)
    #3 void std::__1::__destroy_at[abi:ue170006]<DB::ReadIndexDataNode, 0>(DB::ReadIndexDataNode*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/construct_at.h:69:13 (gtests_dbms+0xd05121d)
    #4 void std::__1::destroy_at[abi:ue170006]<DB::ReadIndexDataNode, 0>(DB::ReadIndexDataNode*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/construct_at.h:104:5 (gtests_dbms+0xd05121d)
    #5 void std::__1::allocator_traits<std::__1::allocator<DB::ReadIndexDataNode>>::destroy[abi:ue170006]<DB::ReadIndexDataNode, void, void>(std::__1::allocator<DB::ReadIndexDataNode>&, DB::ReadIndexDataNode*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/allocator_traits.h:323:9 (gtests_dbms+0xd05121d)
    #6 std::__1::__shared_ptr_emplace<DB::ReadIndexDataNode, std::__1::allocator<DB::ReadIndexDataNode>>::__on_zero_shared() /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/shared_ptr.h:320:13 (gtests_dbms+0xd05121d)
    #7 std::__1::__shared_count::__release_shared[abi:ue170006]() /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/shared_ptr.h:172:9 (gtests_dbms+0xd05145d)
    #8 std::__1::__shared_weak_count::__release_shared[abi:ue170006]() /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/shared_ptr.h:213:27 (gtests_dbms+0xd05145d)
    #9 std::__1::shared_ptr<DB::ReadIndexDataNode>::~shared_ptr[abi:ue170006]() /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/shared_ptr.h:772:23 (gtests_dbms+0xd05145d)
    #10 std::__1::pair<unsigned long const, std::__1::shared_ptr<DB::ReadIndexDataNode>>::~pair() /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__utility/pair.h:81:29 (gtests_dbms+0xd05145d)
    #11 void std::__1::__destroy_at[abi:ue170006]<std::__1::pair<unsigned long const, std::__1::shared_ptr<DB::ReadIndexDataNode>>, 0>(std::__1::pair<unsigned long const, std::__1::shared_ptr<DB::ReadIndexDataNode>>*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/construct_at.h:69:13 (gtests_dbms+0xd05145d)
    #12 void std::__1::destroy_at[abi:ue170006]<std::__1::pair<unsigned long const, std::__1::shared_ptr<DB::ReadIndexDataNode>>, 0>(std::__1::pair<unsigned long const, std::__1::shared_ptr<DB::ReadIndexDataNode>>*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/construct_at.h:104:5 (gtests_dbms+0xd05145d)
    #13 void std::__1::allocator_traits<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>>>::destroy[abi:ue170006]<std::__1::pair<unsigned long const, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void, void>(std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_ty
pe<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>>&, std::__1::pair<unsigned long const, std::__1::shared_ptr<DB::ReadIndexDataNode>>*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/allocator_traits.h:323:9 (gtests_dbms+0xd05145d)
    #14 std::__1::__hash_node_destructor<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>>>::operator()[abi:ue170006](std::__1::__hash_node<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>*) /DATA/disk1/ra_common/tiflash-env-17/
sysroot/bin/../include/c++/v1/__hash_table:674:13 (gtests_dbms+0xd05145d)
    #15 std::__1::unique_ptr<std::__1::__hash_node<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>, std::__1::__hash_node_destructor<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>>>>::reset[abi:ue170006](std::__1::__hash_node
<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/unique_ptr.h:300:7 (gtests_dbms+0xd05145d)
    #16 std::__1::unique_ptr<std::__1::__hash_node<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>, std::__1::__hash_node_destructor<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>>>>::~unique_ptr[abi:ue170006]() /DATA/disk1/r
a_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/unique_ptr.h:266:75 (gtests_dbms+0xd05145d)
    #17 std::__1::__hash_table<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, std::__1::__unordered_map_hasher<unsigned long, std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, std::__1::hash<unsigned long>, std::__1::equal_to<unsigned long>, true>, std::__1::__unordered_map_equal<un
signed long, std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, std::__1::equal_to<unsigned long>, std::__1::hash<unsigned long>, true>, std::__1::allocator<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>>>::erase(std::__1::__hash_const_iterator<std::__1::__hash_node<std::__1::__hash_v
alue_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>*>) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__hash_table:2230:5 (gtests_dbms+0xd05145d)
    #18 unsigned long std::__1::__hash_table<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, std::__1::__unordered_map_hasher<unsigned long, std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, std::__1::hash<unsigned long>, std::__1::equal_to<unsigned long>, true>, std::__1::__unordere
d_map_equal<unsigned long, std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, std::__1::equal_to<unsigned long>, std::__1::hash<unsigned long>, true>, std::__1::allocator<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>>>::__erase_unique<unsigned long>(unsigned long const&) /DATA/disk1/
ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__hash_table:2256:5 (gtests_dbms+0xd05145d)
    #19 std::__1::unordered_map<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>, std::__1::hash<unsigned long>, std::__1::equal_to<unsigned long>, std::__1::allocator<std::__1::pair<unsigned long const, std::__1::shared_ptr<DB::ReadIndexDataNode>>>>::erase[abi:ue170006](unsigned long const&) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/
c++/v1/unordered_map:1449:59 (gtests_dbms+0xd0503b4)
    #20 DB::ReadIndexWorker::DataMap::removeRegion(unsigned long) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorker.cpp:114:16 (gtests_dbms+0xd0503b4)
    #21 DB::ReadIndexWorker::removeRegion(unsigned long) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorker.cpp:190:14 (gtests_dbms+0xd0503b4)
    #22 DB::KVStore::removeRegion(unsigned long, bool, DB::RegionTable&, DB::KVStoreTaskLock const&, DB::RegionTaskLock const&) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/KVStore.cpp:272:69 (gtests_dbms+0xce5a36f)
    #23 DB::KVStore::handleDestroy(unsigned long, DB::TMTContext&, DB::KVStoreTaskLock const&) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/KVStore.cpp:357:5 (gtests_dbms+0xce5b2b1)
    #24 DB::KVStore::handleDestroy(unsigned long, DB::TMTContext&) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/KVStore.cpp:336:5 (gtests_dbms+0xce5abde)
    #25 DB::tests::RegionKVStoreOldTest_ReadIndex_Test::TestBody()::$_4::operator()() const /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/tests/gtest_kvstore.cpp:138:13 (gtests_dbms+0x616b05f)
    #26 ext::ScopeGuard<DB::tests::RegionKVStoreOldTest_ReadIndex_Test::TestBody()::$_4>::~ScopeGuard() /DATA/disk1/jaysonhuang/tiflash-master/libs/libcommon/include/ext/scope_guard.h:33:21 (gtests_dbms+0x616b05f)
    #27 DB::tests::RegionKVStoreOldTest_ReadIndex_Test::TestBody() /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/tests/gtest_kvstore.cpp:146:9 (gtests_dbms+0x616b05f)
    #28 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2443:10 (gtests_dbms+0xd0a1689)
    #29 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2479:14 (gtests_dbms+0xd0a1689)
    #30 testing::Test::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2517:5 (gtests_dbms+0xd076e23)
    #31 testing::TestInfo::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2693:11 (gtests_dbms+0xd07863e)
    #32 testing::TestCase::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2811:28 (gtests_dbms+0xd079623)
    #33 testing::internal::UnitTestImpl::RunAllTests() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:5177:43 (gtests_dbms+0xd08ba3a)
    #34 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2443:10 (gtests_dbms+0xd0a2b9d)
    #35 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2479:14 (gtests_dbms+0xd0a2b9d)
    #36 testing::UnitTest::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:4786:10 (gtests_dbms+0xd08b304)
    #37 RUN_ALL_TESTS() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/include/gtest/gtest.h:2341:46 (gtests_dbms+0x689d936)
    #38 main /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/TestUtils/gtests_dbms_main.cpp:116:16 (gtests_dbms+0x689d936)

  Thread T155 'ReadIndexWkr-0' (tid=3111840, running) created by main thread at:
    #0 pthread_create /root/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1020:3 (gtests_dbms+0x2f71a5b)
    #1 std::__1::__libcpp_thread_create[abi:ue170006](unsigned long*, void* (*)(void*), void*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__threading_support:371:10 (gtests_dbms+0xd0576f4)
    #2 std::__1::thread::thread<DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0, void>(DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0&&) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__thread/thread.h:248:16 (gtests_dbms+0xd0576f4)
    #3 std::__1::__unique_if<std::__1::thread>::__unique_single std::__1::make_unique[abi:ue170006]<std::__1::thread, DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0>(DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0&&) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/unique_ptr.h:689:30 (gtests_dbms+0xd0576f4)
    #4 DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun() /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorkerManager.cpp:151:19 (gtests_dbms+0xd0576f4)
    #5 DB::ReadIndexWorkerManager::asyncRun() /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorkerManager.cpp:70:17 (gtests_dbms+0xd05763f)
    #6 DB::KVStore::asyncRunReadIndexWorkers() const /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndex.cpp:357:32 (gtests_dbms+0xd039648)
    #7 DB::tests::RegionKVStoreOldTest_ReadIndex_Test::TestBody() /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/tests/gtest_kvstore.cpp:118:17 (gtests_dbms+0x616a16f)
    #8 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2443:10 (gtests_dbms+0xd0a1689)
    #9 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2479:14 (gtests_dbms+0xd0a1689)
    #10 testing::Test::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2517:5 (gtests_dbms+0xd076e23)
    #11 testing::TestInfo::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2693:11 (gtests_dbms+0xd07863e)
    #12 testing::TestCase::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2811:28 (gtests_dbms+0xd079623)
    #13 testing::internal::UnitTestImpl::RunAllTests() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:5177:43 (gtests_dbms+0xd08ba3a)
    #14 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2443:10 (gtests_dbms+0xd0a2b9d)
    #15 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2479:14 (gtests_dbms+0xd0a2b9d)
    #16 testing::UnitTest::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:4786:10 (gtests_dbms+0xd08b304)
    #17 RUN_ALL_TESTS() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/include/gtest/gtest.h:2341:46 (gtests_dbms+0x689d936)
    #18 main /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/TestUtils/gtests_dbms_main.cpp:116:16 (gtests_dbms+0x689d936)

SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) (/DATA/disk1/ra_common/tiflash-env-17/sysroot/lib/x86_64-unknown-linux-gnu/libc++.so.1+0x8f7c5) in std::__1::mutex::lock()
==================
[2024/05/27 16:52:43.603 +08:00] [INFO] [WALStore.cpp:143] ["Creating log file for writing [fullname=/DATA/disk1/jaysonhuang/tiflash-master/cmake-build-tsan/tmp/region_kvs_old_test/page/kvstore/wal/log_1_0]"] [source=RegionPersister] [thread_id=1]
[2024/05/27 16:52:43.604 +08:00] [INFO] [KVStore.cpp:281] ["Persisted region_id=9 deleted"] [thread_id=1]
[2024/05/27 16:52:43.604 +08:00] [INFO] [RegionTable.cpp:237] ["remove region in RegionTable done, region_id=9"] [thread_id=1]
[2024/05/27 16:52:43.604 +08:00] [INFO] [RegionTable.cpp:249] ["remove region in storage done, region_id=9"] [thread_id=1]
[2024/05/27 16:52:43.604 +08:00] [INFO] [KVStore.cpp:285] ["Remove region_id=9 done"] [thread_id=1]
[2024/05/27 16:52:43.604 +08:00] [INFO] [ReadIndexWorkerManager.cpp:171] ["Start to stop read-index runner 0"] [source=ReadIndexWorkers] [thread_id=146]
[2024/05/27 16:52:43.605 +08:00] [INFO] [ReadIndexWorkerManager.cpp:132] ["Thread of read-index runner 0 has joined"] [source=ReadIndexWorkers] [thread_id=1]
[2024/05/27 16:52:43.605 +08:00] [INFO] [ReadIndexWorkerManager.cpp:161] ["Start read-index runner 0"] [source=ReadIndexWorkers] [thread_id=148]
[2024/05/27 16:52:43.605 +08:00] [INFO] [ReadIndex.cpp:143] ["start to check regions ready, min-wait-tick 0.001s, max-wait-tick 0.002s, wait-region-ready-timeout 0.005s"] [source=WaitCheckRegionReady] [thread_id=149]
[2024/05/27 16:52:43.605 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=0 region_id=1 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=148]
[2024/05/27 16:52:43.606 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=0 region_id=2 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=148]
[2024/05/27 16:52:43.606 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=0 region_id=3 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=148]
[2024/05/27 16:52:43.606 +08:00] [INFO] [ReadIndex.cpp:255] ["1 regions need to apply to latest index, sleep for 0.001s"] [source=WaitCheckRegionReady] [thread_id=149]
[2024/05/27 16:52:43.607 +08:00] [INFO] [ReadIndex.cpp:255] ["1 regions need to apply to latest index, sleep for 0.002s"] [source=WaitCheckRegionReady] [thread_id=149]
[2024/05/27 16:52:43.609 +08:00] [INFO] [ReadIndex.cpp:255] ["1 regions need to apply to latest index, sleep for 0.002s"] [source=WaitCheckRegionReady] [thread_id=149]
[2024/05/27 16:52:43.611 +08:00] [WARN] [ReadIndex.cpp:282] ["1 regions CANNOT catch up with latest index, (region-id,latest-index,apply-index): 2,6,5"] [source=WaitCheckRegionReady] [thread_id=149]
[2024/05/27 16:52:43.611 +08:00] [INFO] [ReadIndex.cpp:289] ["finish to check 3 regions, time cost 0.006s"] [source=WaitCheckRegionReady] [thread_id=149]
[2024/05/27 16:52:43.611 +08:00] [INFO] [ReadIndexWorkerManager.cpp:171] ["Start to stop read-index runner 0"] [source=ReadIndexWorkers] [thread_id=148]
[2024/05/27 16:52:43.611 +08:00] [INFO] [ReadIndexWorkerManager.cpp:132] ["Thread of read-index runner 0 has joined"] [source=ReadIndexWorkers] [thread_id=1]
[2024/05/27 16:52:43.612 +08:00] [INFO] [ReadIndexWorkerManager.cpp:161] ["Start read-index runner 0"] [source=ReadIndexWorkers] [thread_id=150]
[2024/05/27 16:52:43.622 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=8 region_id=1 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=150]
[2024/05/27 16:52:43.633 +08:00] [TRACE] [ReadIndexDataNode.cpp:93] ["[Learner Read] Read Index in Batch(use histroy), max_ts=8 region_id=1 waiting_tasks=1 running_tasks=0 histroy_ts=8"] [thread_id=150]
[2024/05/27 16:52:43.643 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=10 region_id=1 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=150]
[2024/05/27 16:52:43.653 +08:00] [TRACE] [ReadIndexDataNode.cpp:93] ["[Learner Read] Read Index in Batch(use histroy), max_ts=8 region_id=1 waiting_tasks=1 running_tasks=0 histroy_ts=10"] [thread_id=150]
[2024/05/27 16:52:43.655 +08:00] [WARN] [ReadIndex.cpp:121] ["[region_id=2] wait learner index timeout, prev_index=5 curr_index=5 to_wait=668 state=0 elapsed_s=0.002 timeout_s=0.002"] [thread_id=1]
[2024/05/27 16:52:43.658 +08:00] [DEBUG] [ReadIndex.cpp:101] ["[region_id=2] wait learner index done, prev_index=5 curr_index=668 to_wait=668 elapsed_s=0.002 timeout_s=100.000"] [thread_id=151]
[2024/05/27 16:52:43.658 +08:00] [INFO] [ReadIndexWorkerManager.cpp:171] ["Start to stop read-index runner 0"] [source=ReadIndexWorkers] [thread_id=150]
[2024/05/27 16:52:43.658 +08:00] [INFO] [ReadIndexWorkerManager.cpp:132] ["Thread of read-index runner 0 has joined"] [source=ReadIndexWorkers] [thread_id=1]
[2024/05/27 16:52:43.658 +08:00] [INFO] [ReadIndex.cpp:371] ["KVStore shutdown, deleting read index worker"] [thread_id=1]
[2024/05/27 16:52:43.659 +08:00] [INFO] [KVStore.cpp:432] ["Destroy KVStore"] [thread_id=1]
[2024/05/27 16:52:43.659 +08:00] [INFO] [ReadIndex.cpp:371] ["KVStore shutdown, deleting read index worker"] [thread_id=1]
[2024/05/27 16:52:43.659 +08:00] [INFO] [KVStore.cpp:434] ["Destroy KVStore Finished"] [thread_id=1]
[       OK ] RegionKVStoreOldTest.ReadIndex (960 ms)
[----------] 1 test from RegionKVStoreOldTest (960 ms total)

4. What is your TiFlash version? (Required)

CalvinNeo commented 4 months ago

M1: MockRaftStoreProxy's lock guard M2: KVStore::taskLock M0: ReadIndexDataNode's lock guard

M1: MockRaftStoreProxy's lock guard
M2:KVStore::taskLock
M0:ReadIndexDataNode's lock guard

Mutex M1 acquired here while holding mutex M0 in thread T155:
M0: From `ReadIndexDataNode::runOneRound`
M1: In case the regions mutates

Mutex M2 acquired here while holding mutex M1 in main thread:
M1: To mutate regions in mock proxy
M2: To mutate regions in kvstore

Mutex M0 acquired here while holding mutex M2 in main thread:
M2: From KVStore::handleDestroy
M0: To release