pingcap / tiflash

The analytical engine for TiDB and TiDB Cloud. Try free: https://tidbcloud.com/free-trial
https://docs.pingcap.com/tidb/stable/tiflash-overview
Apache License 2.0
941 stars 410 forks source link

The tiflash process keeps stuck when query with heavy write #2404

Open lilinghai opened 3 years ago

lilinghai commented 3 years ago

v5.0.3 thread apply all bt 218815.log

JaySon-Huang commented 3 years ago

There is a chance that acquiring read lock will be blocked by another read lock for current implementation on RWLock. I guess some threads holding a read lock on the table lock for a long time. And it blocks others threads from acquiring read lock under some situation.

Background:

'.' for waiting
'-' for holding the lock

            drop_lock
    q1: |--r1---------|
    q2:   |..w1..|
                 ^-(timeout and released)
    q3:       |..r2....---------|
    q4:            |...-r3--|
                      ^-(r2,r3 will be blocked until r1 released)

Or check out the test case here: https://github.com/JaySon-Huang/tics/commit/4b76143a0b2cb0f3c566bf7bae5fa47e106a631a#diff-821b263f17d8d71ab0412e967ddb61dc9480c20db316cc7714b2affe1d5c4c74R376-R453

JaySon-Huang commented 3 years ago

TL;DR, the thread of removeObsoleteDataInStorage and shouldOptimizeTable just like "q3" and "q4" on the previous comment. Both of them are waiting on another read group("q1") to release the read lock. But I can not print out which thread is holding that read lock.

# Try to acquire read lock on "drop_lock"(0x7f832e69a080), without timeout
# not sure which thread hold a read lock on "drop_lock"
Thread 231 (Thread 0x7f833d7f5700 (LWP 218899)):
...
#3  0x00000000079c2e33 in wait<DB::RWLock::getLock(DB::RWLock::Type, const String&, const milliseconds&)::<lambda()> > (__p=..., __lock=..., this=0x7f8352e749c0) at /usr/local/include/c++/7.4.0/condition_variable:99
#4  DB::RWLock::getLock (this=0x7f832e69a080, type=type@entry=DB::RWLock::Read, query_id=..., lock_timeout_ms=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Common/RWLock.cpp:170
#5  0x00000000070a59f4 in DB::IStorage::tryLockTimed (this=this@entry=0x7f832e631010, rwlock=..., type=type@entry=DB::RWLock::Read, query_id=..., acquire_timeout=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/IStorage.cpp:17
#6  0x00000000070a5fcb in DB::IStorage::lockStructureForShare (this=0x7f832e631010, query_id=..., acquire_timeout=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/IStorage.cpp:58
#7  0x000000000732c0b7 in DB::shouldOptimizeTable (table_id=table_id@entry=137, tmt=..., log=0x7f83815734e0, threshold=threshold@entry=0.59999999999999998) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/RegionTableCheckOptimize.cpp:106
#8  0x000000000732d169 in DB::RegionTable::checkTableOptimize (this=this@entry=0x7f83816e8088, table_id=137, threshold=threshold@entry=0.59999999999999998) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/RegionTableCheckOptimize.cpp:173
#9  0x000000000732dd30 in DB::RegionTable::checkTableOptimize (this=0x7f83816e8088) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/RegionTableCheckOptimize.cpp:158
#10 0x00000000072c987f in operator() (__closure=0x7f8352f40818) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/BackgroundService.cpp:21
#11 std::_Function_handler<bool(), DB::BackgroundService::BackgroundService(DB::TMTContext&)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/local/include/c++/7.4.0/bits/std_function.h:302
#12 0x0000000007200fa7 in operator() (this=<optimized out>) at /usr/local/include/c++/7.4.0/bits/std_function.h:706
#13 DB::BackgroundProcessingPool::threadFunction (this=0x7f8343c8fa50) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/MergeTree/BackgroundProcessingPool.cpp:187
...

# Applying a raft admin cmd to remove region from TiFlash node
# Try to acquire read lock on "drop_lock"(0x7f832e69a080), without timeout
# not sure which thread hold a read lock on "drop_lock"
Thread 61 (Thread 0x7f829fb7f700 (LWP 219161)):
...
#3  0x00000000079c2e33 in wait<DB::RWLock::getLock(DB::RWLock::Type, const String&, const milliseconds&)::<lambda()> > (__p=..., __lock=..., this=0x7f8352e749c0) at /usr/local/include/c++/7.4.0/condition_variable:99
#4  DB::RWLock::getLock (this=0x7f832e69a080, type=type@entry=DB::RWLock::Read, query_id=..., lock_timeout_ms=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Common/RWLock.cpp:170
#5  0x00000000070a59f4 in DB::IStorage::tryLockTimed (this=this@entry=0x7f832e631010, rwlock=..., type=type@entry=DB::RWLock::Read, query_id=..., acquire_timeout=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/IStorage.cpp:17
#6  0x00000000070a5d59 in DB::IStorage::lockForShare (this=this@entry=0x7f832e631010, query_id=..., acquire_timeout=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/IStorage.cpp:32
#7  0x000000000732a3a9 in removeObsoleteDataInStorage (handle_range=..., table_id=137, context=0x7f8381483000) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/RegionTable.cpp:242
#8  DB::RegionTable::removeRegion (this=this@entry=0x7f83816e8088, region_id=region_id@entry=2464367, remove_data=remove_data@entry=true) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/RegionTable.cpp:305
#9  0x00000000072e2e99 in DB::KVStore::removeRegion (this=this@entry=0x7f83819c5a10, region_id=region_id@entry=2464367, remove_data=remove_data@entry=true, region_table=..., task_lock=..., region_lock=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/KVStore.cpp:177
#10 0x00000000072e55c7 in operator() (__closure=<synthetic pointer>) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/KVStore.cpp:451
#11 DB::KVStore::handleAdminRaftCmd(raft_cmdpb::AdminRequest&&, raft_cmdpb::AdminResponse&&, unsigned long, unsigned long, unsigned long, DB::TMTContext&) (this=0x7f83819c5a10, request=request@entry=<unknown type in /data1/tidb-deploy-upgrade0/tiflash-9900/bin/tiflash/tiflash, CU 0xe1a131d, DIE 0xe3482e4>, response=response@entry=<unknown type in /data1/tidb-deploy-upgrade0/tiflash-9900/bin/tiflash/tiflash, CU 0xe1a131d, DIE 0xe3482f5>, curr_region_id=curr_region_id@entry=2464367, index=index@entry=9, term=term@entry=6, tmt=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/KVStore.cpp:497
#12 0x00000000072f0988 in DB::HandleAdminRaftCmd (server=0x7ffe939ea620, req_buff=..., resp_buff=..., header=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/ProxyFFI.cpp:85
...

# Background physically drop table
# Try to acquire write lock on "alter_lock", with timeout, not sure which thread hold a read lock on "alter_lock"
Thread 236 (Thread 0x7f83409fa700 (LWP 218894)):
...
#5  DB::RWLock::getLock (this=0x7f832e64e000, type=type@entry=DB::RWLock::Write, query_id=..., lock_timeout_ms=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Common/RWLock.cpp:176
#6  0x00000000070a59f4 in DB::IStorage::tryLockTimed (this=this@entry=0x7f832e631010, rwlock=..., type=type@entry=DB::RWLock::Write, query_id=..., acquire_timeout=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/IStorage.cpp:17
#7  0x00000000070a6323 in DB::IStorage::lockExclusively (this=this@entry=0x7f832e631010, query_id=..., acquire_timeout=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/IStorage.cpp:66
#8  0x00000000075b4233 in DB::InterpreterDropQuery::execute (this=this@entry=0x7f83409d03a0) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Interpreters/InterpreterDropQuery.cpp:133
#9  0x00000000073423ec in DB::SchemaSyncService::gc (this=0x7f8354ef68f0, gc_safe_point=426312505650315264) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/SchemaSyncService.cpp:132
#10 0x000000000734465d in _ZZN2DB17SchemaSyncServiceC4ERNS_7ContextEENKUlvE_clEv () at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/SchemaSyncService.cpp:40
#11 0x0000000007200fa7 in operator() (this=<optimized out>) at /usr/local/include/c++/7.4.0/bits/std_function.h:706
#12 DB::BackgroundProcessingPool::threadFunction (this=0x7f8343c8fa50) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/MergeTree/BackgroundProcessingPool.cpp:187
...

# Thread 10 is stuck by wait index. The `DAGQueryBlock` being executing has children (there are a frame with InterpreterDAG.cpp:48).
# No sure whether it holds table lock or not.
Thread 10 (Thread 0x7f81cbdff700 (LWP 294957)):
...
#3  0x000000000731abf2 in wait<DB::RegionMeta::waitIndex(DB::UInt64, const atomic_bool&) const::<lambda()> > (__p=..., __lock=..., this=0x7f82a74a94c8) at /usr/local/include/c++/7.4.0/condition_variable:99
#4  DB::RegionMeta::waitIndex (this=this@entry=0x7f82a74a93f8, index=index@entry=13, terminated=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/RegionMeta.cpp:151
#5  0x00000000072f21fc in DB::Region::waitIndex (this=this@entry=0x7f82a74a9310, index=13, terminated=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/Region.cpp:493
#6  0x00000000076e76f3 in DB::<lambda(size_t)>::operator()(size_t) const (__closure=__closure@entry=0x7f81cbdcf8f0, region_begin_idx=region_begin_idx@entry=0) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/LearnerRead.cpp:199
#7  0x00000000076e9585 in DB::doLearnerRead (table_id=<optimized out>, mvcc_query_info=..., num_streams=<optimized out>, tmt=..., log=<optimized out>) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/LearnerRead.cpp:243
#8  0x00000000078cac50 in DB::DAGQueryBlockInterpreter::executeTS (this=this@entry=0x7f81cbdd22b0, ts=..., pipeline=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Coprocessor/DAGQueryBlockInterpreter.cpp:221
#9  0x00000000078cdbbd in DB::DAGQueryBlockInterpreter::executeImpl (this=this@entry=0x7f81cbdd22b0, pipeline=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Coprocessor/DAGQueryBlockInterpreter.cpp:1348
#10 0x00000000078cf07d in DB::DAGQueryBlockInterpreter::execute (this=this@entry=0x7f81cbdd22b0) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Coprocessor/DAGQueryBlockInterpreter.cpp:1436
#11 0x000000000788811d in DB::InterpreterDAG::executeQueryBlock (this=this@entry=0x7f8209a24380, query_block=..., subqueriesForSets=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Coprocessor/InterpreterDAG.cpp:53
#12 0x0000000007887f36 in DB::InterpreterDAG::executeQueryBlock (this=this@entry=0x7f8209a24380, query_block=..., subqueriesForSets=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Coprocessor/InterpreterDAG.cpp:48
#13 0x0000000007888ba5 in DB::InterpreterDAG::execute (this=0x7f8209a24380) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Coprocessor/InterpreterDAG.cpp:80
#14 0x0000000007017451 in DB::executeQueryImpl (query_src=..., context=..., internal=internal@entry=false, stage=stage@entry=DB::QueryProcessingStage::Complete) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Interpreters/executeQuery.cpp:205
#15 0x00000000070186e4 in DB::executeQuery (dag=..., context=..., internal=internal@entry=false, stage=stage@entry=DB::QueryProcessingStage::Complete) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Interpreters/executeQuery.cpp:398
#16 0x00000000078a0ffe in DB::MPPTask::prepare (this=0x7f822bd51c10, task_request=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Mpp/MPPHandler.cpp:184
#17 0x00000000078a4241 in DB::MPPHandler::execute (this=this@entry=0x7f81cbdd4270, context=..., response=0x7f81cbdd5450) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Mpp/MPPHandler.cpp:368
#18 0x00000000078570db in DB::FlashService::DispatchMPPTask (this=0x7f8381e207a0, grpc_context=0x7f8209a88020, request=0x7f8209a0d3b0, response=<optimized out>) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/FlashService.cpp:140
...

# Not sure whether it holds table lock or not
Thread 44 (Thread 0x7f7e5a2fe700 (LWP 222685)):
...
#3  0x0000000007857733 in wait<DB::MPPTunnel::waitForFinish()::<lambda()> > (__p=..., __lock=..., this=0x7f81ecec87c8) at /usr/local/include/c++/7.4.0/condition_variable:99
#4  waitForFinish (this=0x7f81ecec8770) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Mpp/MPPHandler.h:168
#5  DB::FlashService::EstablishMPPConnection (this=0x7f8381e207a0, grpc_context=<optimized out>, request=<optimized out>, writer=0x7f7e5a2d45c0) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/FlashService.cpp:200
#6  0x00000000080b5ec9 in __invoke_impl<grpc::Status, grpc::Status (tikvpb::Tikv::Service::* const&)(grpc_impl::ServerContext*, mpp::EstablishMPPConnectionRequest const*, grpc_impl::ServerWriter<mpp::MPPDataPacket>*), tikvpb::Tikv::Service*, grpc_impl::ServerContext*, mpp::EstablishMPPConnectionRequest const*, grpc_impl::ServerWriter<mpp::MPPDataPacket>*> (__t=<optimized out>, __f=<optimized out>) at /usr/local/include/c++/7.4.0/bits/invoke.h:73
#7  __invoke<grpc::Status (tikvpb::Tikv::Service::* const&)(grpc_impl::ServerContext*, mpp::EstablishMPPConnectionRequest const*, grpc_impl::ServerWriter<mpp::MPPDataPacket>*), tikvpb::Tikv::Service*, grpc_impl::ServerContext*, mpp::EstablishMPPConnectionRequest const*, grpc_impl::ServerWriter<mpp::MPPDataPacket>*> (__fn=<optimized out>) at /usr/local/include/c++/7.4.0/bits/invoke.h:96
#8  operator()<tikvpb::Tikv::Service*, grpc_impl::ServerContext*, const mpp::EstablishMPPConnectionRequest*, grpc_impl::ServerWriter<mpp::MPPDataPacket>*> (this=<optimized out>) at /usr/local/include/c++/7.4.0/functional:175
#9  std::_Function_handler<grpc::Status (tikvpb::Tikv::Service*, grpc_impl::ServerContext*, mpp::EstablishMPPConnectionRequest const*, grpc_impl::ServerWriter<mpp::MPPDataPacket>*), std::_Mem_fn<grpc::Status (tikvpb::Tikv::Service::*)(grpc_impl::ServerContext*, mpp::EstablishMPPConnectionRequest const*, grpc_impl::ServerWriter<mpp::MPPDataPacket>*)> >::_M_invoke(std::_Any_data const&, tikvpb::Tikv::Service*&&, grpc_impl::ServerContext*&&, mpp::EstablishMPPConnectionRequest const*&&, grpc_impl::ServerWriter<mpp::MPPDataPacket>*&&) (__functor=..., __args#0=<optimized out>, __args#1=<optimized out>, __args#2=<optimized out>, __args#3=<optimized out>) at /usr/local/include/c++/7.4.0/bits/std_function.h:302
...

# Thread 3,4,5,7,8,9 share a similar backtrace, they are stuck by wait index
# because all raft-applying thread are stuck
# Seems not holding any table lock
Thread 9 (Thread 0x7f81c97fc700 (LWP 294958)):
...
#3  0x000000000731abf2 in wait<DB::RegionMeta::waitIndex(DB::UInt64, const atomic_bool&) const::<lambda()> > (__p=..., __lock=..., this=0x7f829f2413c8) at /usr/local/include/c++/7.4.0/condition_variable:99
#4  DB::RegionMeta::waitIndex (this=this@entry=0x7f829f2412f8, index=index@entry=8626, terminated=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/RegionMeta.cpp:151
#5  0x00000000072f21fc in DB::Region::waitIndex (this=this@entry=0x7f829f241210, index=8626, terminated=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/Region.cpp:493
#6  0x00000000076e76f3 in DB::<lambda(size_t)>::operator()(size_t) const (__closure=__closure@entry=0x7f81c97cca80, region_begin_idx=region_begin_idx@entry=0) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/LearnerRead.cpp:199
#7  0x00000000076e9585 in DB::doLearnerRead (table_id=<optimized out>, mvcc_query_info=..., num_streams=<optimized out>, tmt=..., log=<optimized out>) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/LearnerRead.cpp:243
#8  0x00000000078cac50 in DB::DAGQueryBlockInterpreter::executeTS (this=this@entry=0x7f81c97cf440, ts=..., pipeline=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Coprocessor/DAGQueryBlockInterpreter.cpp:221
#9  0x00000000078cdbbd in DB::DAGQueryBlockInterpreter::executeImpl (this=this@entry=0x7f81c97cf440, pipeline=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Coprocessor/DAGQueryBlockInterpreter.cpp:1348
#10 0x00000000078cf07d in DB::DAGQueryBlockInterpreter::execute (this=this@entry=0x7f81c97cf440) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Coprocessor/DAGQueryBlockInterpreter.cpp:1436
#11 0x000000000788811d in DB::InterpreterDAG::executeQueryBlock (this=this@entry=0x7f81f0257150, query_block=..., subqueriesForSets=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Coprocessor/InterpreterDAG.cpp:53
#12 0x0000000007888ba5 in DB::InterpreterDAG::execute (this=0x7f81f0257150) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Coprocessor/InterpreterDAG.cpp:80
#13 0x0000000007017451 in DB::executeQueryImpl (query_src=..., context=..., internal=internal@entry=false, stage=stage@entry=DB::QueryProcessingStage::Complete) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Interpreters/executeQuery.cpp:205
#14 0x00000000070186e4 in DB::executeQuery (dag=..., context=..., internal=internal@entry=false, stage=stage@entry=DB::QueryProcessingStage::Complete) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Interpreters/executeQuery.cpp:398
#15 0x00000000078a0ffe in DB::MPPTask::prepare (this=0x7f81f02bd010, task_request=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Mpp/MPPHandler.cpp:184
#16 0x00000000078a4241 in DB::MPPHandler::execute (this=this@entry=0x7f81c97d1270, context=..., response=0x7f81c97d2450) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Mpp/MPPHandler.cpp:368
#17 0x00000000078570db in DB::FlashService::DispatchMPPTask (this=0x7f8381e207a0, grpc_context=0x7f81f0247020, request=0x7f81f02053b0, response=<optimized out>) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/FlashService.cpp:140
...

# Apply snapshot get stuck by `KVStoreTaskLock`, which is being held by thread 61
Thread 70 (Thread 0x7f82a7ffe700 (LWP 219152)):
...
#6  KVStoreTaskLock (mutex_=..., this=0x7f82a7fd2ed8) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/KVStore.h:156
#7  DB::KVStore::genTaskLock (this=this@entry=0x7f83819c5a10) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/KVStore.cpp:182
#8  0x00000000072decbd in DB::KVStore::handleRegionsByRangeOverlap(std::pair<DB::TiKVRangeKey, DB::TiKVRangeKey> const&, std::function<void (std::unordered_map<unsigned long, std::shared_ptr<DB::Region>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::shared_ptr<DB::Region> > > >, DB::KVStoreTaskLock const&)>&&) const (this=this@entry=0x7f83819c5a10, range=..., callback=callback@entry=<unknown type in /data1/tidb-deploy-upgrade0/tiflash-9900/bin/tiflash/tiflash, CU 0xe1a131d, DIE 0xe368b19>) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/KVStore.cpp:61
#9  0x00000000076dc382 in DB::KVStore::checkAndApplySnapshot (this=this@entry=0x7f83819c5a10, new_region=..., tmt=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/ApplySnapshot.cpp:75
#10 0x00000000076dd523 in DB::KVStore::handlePreApplySnapshot (this=0x7f83819c5a10, new_region=..., tmt=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/ApplySnapshot.cpp:299
#11 0x00000000072eea30 in DB::ApplyPreHandledSnapshot (server=0x7ffe939ea620, snap=<optimized out>) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/ProxyFFI.cpp:254
#12 0x00000000072eeaee in DB::ApplyPreHandledSnapshot (server=<optimized out>, res=<optimized out>, type=<optimized out>) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/ProxyFFI.cpp:270
...

# Apply snapshot get stuck by `KVStoreTaskLock`, which is being held by thread 61
Thread 60 (Thread 0x7f829f97e700 (LWP 219162)):
...
#6  KVStoreTaskLock (mutex_=..., this=0x7f829f951878) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/KVStore.h:156
#7  DB::KVStore::genTaskLock (this=this@entry=0x7f83819c5a10) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/KVStore.cpp:182
#8  0x00000000072e4d83 in DB::KVStore::handleAdminRaftCmd(raft_cmdpb::AdminRequest&&, raft_cmdpb::AdminResponse&&, unsigned long, unsigned long, unsigned long, DB::TMTContext&) (this=0x7f83819c5a10, request=request@entry=<unknown type in /data1/tidb-deploy-upgrade0/tiflash-9900/bin/tiflash/tiflash, CU 0xe1a131d, DIE 0xe3482e4>, response=response@entry=<unknown type in /data1/tidb-deploy-upgrade0/tiflash-9900/bin/tiflash/tiflash, CU 0xe1a131d, DIE 0xe3482f5>, curr_region_id=curr_region_id@entry=2456452, index=index@entry=10, term=term@entry=7, tmt=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/KVStore.cpp:353
#9  0x00000000072f0988 in DB::HandleAdminRaftCmd (server=0x7ffe939ea620, req_buff=..., resp_buff=..., header=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/ProxyFFI.cpp:85
...

# Thread 59, 58, Apply admin raft cmd get stuck by `KVStoreTaskLock`, which is being held by thread 61
Thread 59 (Thread 0x7f829f1ff700 (LWP 219163)):
...
#6  KVStoreTaskLock (mutex_=..., this=0x7f829f1d2878) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/KVStore.h:156
#7  DB::KVStore::genTaskLock (this=this@entry=0x7f83819c5a10) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/KVStore.cpp:182
#8  0x00000000072e4d83 in DB::KVStore::handleAdminRaftCmd(raft_cmdpb::AdminRequest&&, raft_cmdpb::AdminResponse&&, unsigned long, unsigned long, unsigned long, DB::TMTContext&) (this=0x7f83819c5a10, request=request@entry=<unknown type in /data1/tidb-deploy-upgrade0/tiflash-9900/bin/tiflash/tiflash, CU 0xe1a131d, DIE 0xe3482e4>, response=response@entry=<unknown type in /data1/tidb-deploy-upgrade0/tiflash-9900/bin/tiflash/tiflash, CU 0xe1a131d, DIE 0xe3482f5>, curr_region_id=curr_region_id@entry=218960, index=index@entry=24, term=term@entry=8, tmt=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/KVStore.cpp:353
#9  0x00000000072f0988 in DB::HandleAdminRaftCmd (server=0x7ffe939ea620, req_buff=..., resp_buff=..., header=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/ProxyFFI.cpp:85
...

Details on why the thread of removeObsoleteDataInStorage and shouldOptimizeTable are in the same read group:

# checkout the thread of `removeObsoleteDataInStorage`, it is waiting on a read lock
# on "drop_lock"
# The reader group (it_group => 0x7f8352e749a0)
# It is waiting on current reader group (rdlock_owner => 0x7f7e5520fe20) to release the read lock
(gdb) thread xx
[Switching to thread xx (Thread 0x7f829fb7f700 (LWP 219161))]
...
#3  0x00000000079c2e33 in wait<DB::RWLock::getLock(DB::RWLock::Type, const String&, const milliseconds&)::<lambda()> > (__p=..., __lock=..., this=0x7f8352e749c0) at /usr/local/include/c++/7.4.0/condition_variable:99
#4  DB::RWLock::getLock (this=0x7f832e69a080, type=type@entry=DB::RWLock::Read, query_id=..., lock_timeout_ms=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Common/RWLock.cpp:170
#5  0x00000000070a59f4 in DB::IStorage::tryLockTimed (this=this@entry=0x7f832e631010, rwlock=..., type=type@entry=DB::RWLock::Read, query_id=..., acquire_timeout=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/IStorage.cpp:17
#6  0x00000000070a5d59 in DB::IStorage::lockForShare (this=this@entry=0x7f832e631010, query_id=..., acquire_timeout=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/IStorage.cpp:32
#7  0x000000000732a3a9 in removeObsoleteDataInStorage (handle_range=..., table_id=137, context=0x7f8381483000) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/RegionTable.cpp:242
#8  DB::RegionTable::removeRegion (this=this@entry=0x7f83816e8088, region_id=region_id@entry=2464367, remove_data=remove_data@entry=true) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/RegionTable.cpp:305
#9  0x00000000072e2e99 in DB::KVStore::removeRegion (this=this@entry=0x7f83819c5a10, region_id=region_id@entry=2464367, remove_data=remove_data@entry=true, region_table=..., task_lock=..., region_lock=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/KVStore.cpp:177
#10 0x00000000072e55c7 in operator() (__closure=<synthetic pointer>) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/KVStore.cpp:451
#11 DB::KVStore::handleAdminRaftCmd(raft_cmdpb::AdminRequest&&, raft_cmdpb::AdminResponse&&, unsigned long, unsigned long, unsigned long, DB::TMTContext&) (this=0x7f83819c5a10, request=request@entry=<unknown type in /data1/tidb-deploy-upgrade0/tiflash-9900/bin/tiflash/tiflash, CU 0xe1a131d, DIE 0xe3482e4>,
    response=response@entry=<unknown type in /data1/tidb-deploy-upgrade0/tiflash-9900/bin/tiflash/tiflash, CU 0xe1a131d, DIE 0xe3482f5>, curr_region_id=curr_region_id@entry=2464367, index=index@entry=9, term=term@entry=6, tmt=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/KVStore.cpp:497
#12 0x00000000072f0988 in DB::HandleAdminRaftCmd (server=0x7ffe939ea620, req_buff=..., resp_buff=..., header=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/ProxyFFI.cpp:85
...
(gdb) up
#4  DB::RWLock::getLock (this=0x7f832e69a080, type=type@entry=DB::RWLock::Read, query_id=..., lock_timeout_ms=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Common/RWLock.cpp:170
170 /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Common/RWLock.cpp: No such file or directory.
# A readers list with 2 groups
(gdb) p it_group
$1 = {_M_node = 0x7f8352e749a0}
(gdb) p it_group._M_node
$2 = (std::__detail::_List_node_base *) 0x7f8352e749a0
(gdb) p it_group._M_node._M_next
$3 = (std::__detail::_List_node_base *) 0x7f832e69a0b8
(gdb) p it_group._M_node._M_next._M_next
$4 = (std::__detail::_List_node_base *) 0x7f7e5520fe20
(gdb) p it_group._M_node._M_next._M_next._M_next
$5 = (std::__detail::_List_node_base *) 0x7f8352e749a0
(gdb) p rdlock_owner
$6 = {_M_node = 0x7f7e5520fe20}
# Empty writers list
(gdb) p wrlock_owner
$7 = {_M_node = 0x7f832e69a0d0}
(gdb) p wrlock_owner._M_node
$8 = (std::__detail::_List_node_base *) 0x7f832e69a0d0
(gdb) p *wrlock_owner._M_node
$9 = {_M_next = 0x7f832e69a0d0, _M_prev = 0x7f832e69a0d0}

# checkout the thread of `shouldOptimizeTable`, it is waiting on a read lock on "drop_lock",
# waiting on the same reader group with the thread of `removeObsoleteDataInStorage` (it_group => 0x7f8352e749a0)
(gdb) bt
[Switching to thread (Thread 0x7f833d7f5700 (LWP 218899))]
...
#3  0x00000000079c2e33 in wait<DB::RWLock::getLock(DB::RWLock::Type, const String&, const milliseconds&)::<lambda()> > (__p=..., __lock=..., this=0x7f8352e749c0) at /usr/local/include/c++/7.4.0/condition_variable:99
#4  DB::RWLock::getLock (this=0x7f832e69a080, type=type@entry=DB::RWLock::Read, query_id=..., lock_timeout_ms=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Common/RWLock.cpp:170
#5  0x00000000070a59f4 in DB::IStorage::tryLockTimed (this=this@entry=0x7f832e631010, rwlock=..., type=type@entry=DB::RWLock::Read, query_id=..., acquire_timeout=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/IStorage.cpp:17
#6  0x00000000070a5fcb in DB::IStorage::lockStructureForShare (this=0x7f832e631010, query_id=..., acquire_timeout=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/IStorage.cpp:58
#7  0x000000000732c0b7 in DB::shouldOptimizeTable (table_id=table_id@entry=137, tmt=..., log=0x7f83815734e0, threshold=threshold@entry=0.59999999999999998) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/RegionTableCheckOptimize.cpp:106
#8  0x000000000732d169 in DB::RegionTable::checkTableOptimize (this=this@entry=0x7f83816e8088, table_id=137, threshold=threshold@entry=0.59999999999999998) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/RegionTableCheckOptimize.cpp:173
#9  0x000000000732dd30 in DB::RegionTable::checkTableOptimize (this=0x7f83816e8088) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/RegionTableCheckOptimize.cpp:158
#10 0x00000000072c987f in operator() (__closure=0x7f8352f40818) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/BackgroundService.cpp:21
#11 std::_Function_handler<bool(), DB::BackgroundService::BackgroundService(DB::TMTContext&)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/local/include/c++/7.4.0/bits/std_function.h:302
#12 0x0000000007200fa7 in operator() (this=<optimized out>) at /usr/local/include/c++/7.4.0/bits/std_function.h:706
#13 DB::BackgroundProcessingPool::threadFunction (this=0x7f8343c8fa50) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/MergeTree/BackgroundProcessingPool.cpp:187
...
(gdb) up
#4  DB::RWLock::getLock (this=0x7f832e69a080, type=type@entry=DB::RWLock::Read, query_id=..., lock_timeout_ms=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Common/RWLock.cpp:170
170 /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Common/RWLock.cpp: No such file or directory.
(gdb) p it_group
$10 = {_M_node = 0x7f8352e749a0}

# checkout the thread of `SchemaSyncService::gc`
# it is waiting a write lock on the "alter_lock"
(gdb) bt
...
#5  DB::RWLock::getLock (this=0x7f832e64e000, type=type@entry=DB::RWLock::Write, query_id=..., lock_timeout_ms=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Common/RWLock.cpp:176
#6  0x00000000070a59f4 in DB::IStorage::tryLockTimed (this=this@entry=0x7f832e631010, rwlock=..., type=type@entry=DB::RWLock::Write, query_id=..., acquire_timeout=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/IStorage.cpp:17
#7  0x00000000070a6323 in DB::IStorage::lockExclusively (this=this@entry=0x7f832e631010, query_id=..., acquire_timeout=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/IStorage.cpp:66
#8  0x00000000075b4233 in DB::InterpreterDropQuery::execute (this=this@entry=0x7f83413d13a0) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Interpreters/InterpreterDropQuery.cpp:133
#9  0x00000000073423ec in DB::SchemaSyncService::gc (this=0x7f8354ef68f0, gc_safe_point=426317224242315264) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/SchemaSyncService.cpp:132
#10 0x000000000734465d in _ZZN2DB17SchemaSyncServiceC4ERNS_7ContextEENKUlvE_clEv () at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/SchemaSyncService.cpp:40
...
(gdb) up
#5  DB::RWLock::getLock (this=0x7f832e64e000, type=type@entry=DB::RWLock::Write, query_id=..., lock_timeout_ms=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Common/RWLock.cpp:176
176 /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Common/RWLock.cpp: No such file or directory.
(gdb) p it_group
$11 = {_M_node = 0x7f8294275260}
(gdb) p this
$12 = (DB::RWLock * const) 0x7f832e64e000
(gdb) p it_group
$13 = {_M_node = 0x7f8294275260}
(gdb) p rdlock_owner
$14 = {_M_node = 0x7f8296263740}

# A reader list with one reader
(gdb) p *rdlock_owner._M_node
$17 = {_M_next = 0x7f832e64e038, _M_prev = 0x7f832e64e038}
(gdb) p *rdlock_owner._M_node._M_next
$18 = {_M_next = 0x7f8296263740, _M_prev = 0x7f8296263740}
(gdb) p *rdlock_owner._M_node._M_next._M_next
$19 = {_M_next = 0x7f832e64e038, _M_prev = 0x7f832e64e038}

# Empty writers list
(gdb) p wrlock_owner
$20 = {_M_node = 0x7f832e64e050}
(gdb) p wrlock_owner._M_node
$21 = (std::__detail::_List_node_base *) 0x7f832e64e050
(gdb) p wrlock_owner._M_node._M_next
$22 = (std::__detail::_List_node_base *) 0x7f8294275260
(gdb) p wrlock_owner._M_node._M_next._M_next
$23 = (std::__detail::_List_node_base *) 0x7f832e64e050
(gdb) p wrlock_owner._M_node._M_next._M_next._M_next
JaySon-Huang commented 3 years ago

@jiangyuzhao please help me reproduce this situation with branch [1], it will print out the query ID if happen to a deadlock.

[1] https://github.com/JaySon-Huang/tics/commit/4b76143a0b2cb0f3c566bf7bae5fa47e106a631a#diff-63e5be0dc7f8488585986ecd3568b52335ff340499f21e998e193533ce2bb22eR23-R36

JaySon-Huang commented 3 years ago

Find another issue: https://github.com/pingcap/tics/issues/2529. Maybe the changes in the previous commit make it hard to reproduce this deadlock. Will try it with fewer changes.

JaySon-Huang commented 3 years ago

From the stack in previous comment, we can see that thread 10[1] is a DAGQueryBlock with children (maybe an MPPTask with "join"). Assume that it is a join query that table "t1" join table "t2", it will first do learner read and acquire a read lock on "drop" for "t1", save the input streams from "t1" into input_streams_vec. Then it will do learner read and acquire a read lock on "drop" for "t2".

windtalker says that one DAGQueryBlock only read one table. "t1" join "t2" will be split into different DAGRequest

[1]

# Thread 10 is stuck by wait index. The `DAGQueryBlock` being executing has children (there are a frame with InterpreterDAG.cpp:48).
# No sure whether it holds table lock or not.
Thread 10 (Thread 0x7f81cbdff700 (LWP 294957)):
...
#3  0x000000000731abf2 in wait<DB::RegionMeta::waitIndex(DB::UInt64, const atomic_bool&) const::<lambda()> > (__p=..., __lock=..., this=0x7f82a74a94c8) at /usr/local/include/c++/7.4.0/condition_variable:99
#4  DB::RegionMeta::waitIndex (this=this@entry=0x7f82a74a93f8, index=index@entry=13, terminated=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/RegionMeta.cpp:151
#5  0x00000000072f21fc in DB::Region::waitIndex (this=this@entry=0x7f82a74a9310, index=13, terminated=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/Region.cpp:493
#6  0x00000000076e76f3 in DB::<lambda(size_t)>::operator()(size_t) const (__closure=__closure@entry=0x7f81cbdcf8f0, region_begin_idx=region_begin_idx@entry=0) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/LearnerRead.cpp:199
#7  0x00000000076e9585 in DB::doLearnerRead (table_id=<optimized out>, mvcc_query_info=..., num_streams=<optimized out>, tmt=..., log=<optimized out>) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Storages/Transaction/LearnerRead.cpp:243
#8  0x00000000078cac50 in DB::DAGQueryBlockInterpreter::executeTS (this=this@entry=0x7f81cbdd22b0, ts=..., pipeline=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Coprocessor/DAGQueryBlockInterpreter.cpp:221
#9  0x00000000078cdbbd in DB::DAGQueryBlockInterpreter::executeImpl (this=this@entry=0x7f81cbdd22b0, pipeline=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Coprocessor/DAGQueryBlockInterpreter.cpp:1348
#10 0x00000000078cf07d in DB::DAGQueryBlockInterpreter::execute (this=this@entry=0x7f81cbdd22b0) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Coprocessor/DAGQueryBlockInterpreter.cpp:1436
#11 0x000000000788811d in DB::InterpreterDAG::executeQueryBlock (this=this@entry=0x7f8209a24380, query_block=..., subqueriesForSets=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Coprocessor/InterpreterDAG.cpp:53
#12 0x0000000007887f36 in DB::InterpreterDAG::executeQueryBlock (this=this@entry=0x7f8209a24380, query_block=..., subqueriesForSets=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Coprocessor/InterpreterDAG.cpp:48
#13 0x0000000007888ba5 in DB::InterpreterDAG::execute (this=0x7f8209a24380) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Coprocessor/InterpreterDAG.cpp:80
#14 0x0000000007017451 in DB::executeQueryImpl (query_src=..., context=..., internal=internal@entry=false, stage=stage@entry=DB::QueryProcessingStage::Complete) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Interpreters/executeQuery.cpp:205
#15 0x00000000070186e4 in DB::executeQuery (dag=..., context=..., internal=internal@entry=false, stage=stage@entry=DB::QueryProcessingStage::Complete) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Interpreters/executeQuery.cpp:398
#16 0x00000000078a0ffe in DB::MPPTask::prepare (this=0x7f822bd51c10, task_request=...) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Mpp/MPPHandler.cpp:184
#17 0x00000000078a4241 in DB::MPPHandler::execute (this=this@entry=0x7f81cbdd4270, context=..., response=0x7f81cbdd5450) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/Mpp/MPPHandler.cpp:368
#18 0x00000000078570db in DB::FlashService::DispatchMPPTask (this=0x7f8381e207a0, grpc_context=0x7f8209a88020, request=0x7f8209a0d3b0, response=<optimized out>) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Flash/FlashService.cpp:140
...

[2] https://github.com/pingcap/tics/blob/5572103586752feb543063a8a408e973fd8c1d4d/dbms/src/Flash/Coprocessor/InterpreterDAG.cpp#L43-L54

JaySon-Huang commented 2 years ago

lower the severity to moderate cause it is hard to reproduce