Open Lily2025 opened 5 months ago
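Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
(1) Run ch.
(2) All write nodes (wn) crash because MinIO is full.
2. What did you expect to see? (Required)
All compute nodes (cn) remain normal.
3. What did you see instead (Required)
One of the cn became abnormal when all wn crashed.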
{"container":"data0","namespace":"ha-test-serverless-htap-tps-7567184-1-220","log":"[BaseDaemon.cpp:563] [\"\\n 0x77783e1\\tfaultSignalHandler(int, siginfo_t*, void*) [tiflash+125273057]\\n \\tlibs/libdaemon/src/BaseDaemon.cpp:214\\n 0x7fa29e265630\\t<unknown symbol> [libpthread.so.0+63024]\\n 0x7fa29daa4387\\tgsignal [libc.so.6+222087]\\n 0x7fa29daa5a78\\t__GI_abort [libc.so.6+227960]\\n 0x993c731\\tabsl::lts_20211102::raw_logging_internal::RawLog(absl::lts_20211102::LogSeverity, char const*, int, char const*, ...) [tiflash+160679729]\\n \\tcontrib/abseil-cpp/absl/base/internal/raw_logging.cc:216\\n 0x9a9ed87\\tabsl::lts_20211102::base_internal::LowLevelAlloc::Alloc(unsigned long) [tiflash+162131335]\\n \\tcontrib/abseil-cpp/absl/base/internal/low_level_alloc.cc:606\\n 0x9933bef\\tabsl::lts_20211102::synchronization_internal::CreateThreadIdentity() [tiflash+160644079]\\n \\tcontrib/abseil-cpp/absl/synchronization/internal/create_thread_identity.cc:129\\n 0x9930eea\\tabsl::lts_20211102::Mutex::LockSlow(absl::lts_20211102::MuHowS const*, absl::lts_20211102::Condition const*, int) [tiflash+160632554]\\n \\tcontrib/abseil-cpp/absl/synchronization/mutex.cc:1768\\n 0x9341027\\tpollset_work(grpc_pollset*, grpc_pollset_worker**, long) [tiflash+154406951]\\n \\tcontrib/grpc/src/core/lib/iomgr/ev_epollex_linux.cc:1127\\n 0x93c033b\\tcq_pluck(grpc_completion_queue*, void*, gpr_timespec, void*) [tiflash+154927931]\\n \\tcontrib/grpc/src/core/lib/surface/completion_queue.cc:1294\\n 0x8ea19da\\tgrpc::internal::BlockingUnaryCallImpl<google::protobuf::MessageLite, google::protobuf::MessageLite>::BlockingUnaryCallImpl(grpc::ChannelInterface*, grpc::internal::RpcMethod const&, grpc::ClientContext*, google::protobuf::MessageLite const&, google::protobuf::MessageLite*) [tiflash+149559770]\\n \\tcontrib/grpc/include/grpcpp/impl/codegen/client_unary_call.h:83\\n 0x97d5c19\\tgrpc::Status grpc::internal::BlockingUnaryCall<pdpb::GetRegionRequest, pdpb::GetRegionResponse, 
google::protobuf::MessageLite, google::protobuf::MessageLite>(grpc::ChannelInterface*, grpc::internal::RpcMethod const&, grpc::ClientContext*, pdpb::GetRegionRequest const&, pdpb::GetRegionResponse*) [tiflash+159210521]\\n \\tcontrib/grpc/include/grpcpp/impl/codegen/client_unary_call.h:52\\n 0x91db48c\\tpingcap::pd::Client::getRegionByKey(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) [tiflash+152941708]\\n \\tcontrib/client-c/src/pd/Client.cc:364\\n 0x8a701b0\\tpingcap::pd::CodecClient::getRegionByKey(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) [tiflash+145162672]\\n \\tcontrib/client-c/include/pingcap/pd/CodecClient.h:22\\n 0x91c6452\\tpingcap::kv::RegionCache::locateKey(pingcap::kv::Backoffer&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) [tiflash+152855634]\\n \\tcontrib/client-c/src/kv/RegionCache.cc:103\\n 0x91ec601\\tpingcap::coprocessor::buildBatchCopTasks(pingcap::kv::Backoffer&, pingcap::kv::Cluster*, bool, bool, std::__1::vector<long, std::__1::allocator<long> > const&, std::__1::vector<std::__1::vector<pingcap::coprocessor::KeyRange, std::__1::allocator<pingcap::coprocessor::KeyRange> >, std::__1::allocator<std::__1::vector<pingcap::coprocessor::KeyRange, std::__1::allocator<pingcap::coprocessor::KeyRange> > > > const&, pingcap::kv::StoreType, bool (* const&)(std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&), Poco::Logger*) [tiflash+153011713]\\n 
\\tcontrib/client-c/src/coprocessor/Client.cc:367\\n 0x882708d\\tDB::StorageDisaggregated::buildBatchCopTasks(std::__1::vector<std::__1::pair<long, std::__1::vector<pingcap::coprocessor::KeyRange, std::__1::allocator<pingcap::coprocessor::KeyRange> > >, std::__1::allocator<std::__1::pair<long, std::__1::vector<pingcap::coprocessor::KeyRange, std::__1::allocator<pingcap::coprocessor::KeyRange> > > > > const&, bool (* const&)(std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&)) [tiflash+142766221]\\n \\tdbms/src/Storages/StorageDisaggregated.cpp:172\\n 0x882d144\\tDB::StorageDisaggregated::buildReadTaskWithBackoff(DB::Context const&) [tiflash+142790980]\\n \\tdbms/src/Storages/StorageDisaggregatedRemote.cpp:147\\n 0x882df76\\tDB::StorageDisaggregated::readThroughS3(DB::PipelineExecutorContext&, DB::PipelineExecGroupBuilder&, DB::Context const&, unsigned int) [tiflash+142794614]\\n \\tdbms/src/Storages/StorageDisaggregatedRemote.cpp:105\\n 0x89f2a30\\tDB::PhysicalTableScan::buildPipeline(DB::PipelineBuilder&, DB::Context&, DB::PipelineExecutorContext&) [tiflash+144648752]\\n \\tdbms/src/Flash/Planner/Plans/PhysicalTableScan.cpp:132\\n 0x896fd62\\tDB::PhysicalPlanNode::buildPipeline(DB::PipelineBuilder&, DB::Context&, DB::PipelineExecutorContext&) [tiflash+144112994]\\n \\tdbms/src/Flash/Planner/PhysicalPlanNode.cpp:116\\n 0x896fd62\\tDB::PhysicalPlanNode::buildPipeline(DB::PipelineBuilder&, DB::Context&, DB::PipelineExecutorContext&) [tiflash+144112994]\\n \\tdbms/src/Flash/Planner/PhysicalPlanNode.cpp:116\\n 
0x896b349\\tDB::PhysicalPlan::toPipeline(DB::PipelineExecutorContext&, DB::Context&) [tiflash+144094025]\\n \\tdbms/src/Flash/Planner/PhysicalPlan.cpp:325\\n 0x8924e97\\tDB::PipelineExecutor::PipelineExecutor(std::__1::shared_ptr<MemoryTracker> const&, DB::AutoSpillTrigger*, std::__1::function<void (std::__1::shared_ptr<DB::OperatorSpillContext> const&)> const&, DB::Context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) [tiflash+143806103]\\n \\tdbms/src/Flash/Executor/PipelineExecutor.cpp:45\\n 0x8782f38\\tDB::(anonymous namespace)::executeAsPipeline(DB::Context&, bool) [tiflash+142094136]\\n \\tdbms/src/Flash/executeQuery.cpp:199\\n 0x8782614\\tDB::queryExecute(DB::Context&, bool) [tiflash+142091796]\\n \\tdbms/src/Flash/executeQuery.cpp:239\\n 0x88cabcb\\tDB::MPPTask::runImpl() [tiflash+143436747]\\n \\tdbms/src/Flash/Mpp/MPPTask.cpp:527\\n 0x1fef248\\tauto DB::wrapInvocable<std::__1::function<void ()> >(bool, std::__1::function<void ()>&&)::'lambda'()::operator()() [tiflash+33485384]\\n \\tdbms/src/Common/wrapInvocable.h:36\\n 0x1eda2b3\\tDB::DynamicThreadPool::executeTask(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> >&) [tiflash+32350899]\\n \\tdbms/src/Common/DynamicThreadPool.cpp:124\\n 0x1eda6f6\\tDB::DynamicThreadPool::dynamicWork(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> >) [tiflash+32351990]\\n \\tdbms/src/Common/DynamicThreadPool.cpp:148\\n 0x1edad12\\tvoid* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::thread DB::ThreadFactory::newThread<void (DB::DynamicThreadPool::*)(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> >), DB::DynamicThreadPool*, std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> > >(bool, std::__1::basic_string<char, 
std::__1::char_traits<char>, std::__1::allocator<char> >, void (DB::DynamicThreadPool::*&&)(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> >), DB::DynamicThreadPool*&&, std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> >&&)::'lambda'(auto&&...), DB::DynamicThreadPool*, std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> > > >(void*) [tiflash+32353554]\\n \\t/usr/local/bin/../include/c++/v1/thread:291\\n 0x7fa29e25dea5\\tstart_thread [libpthread.so.0+32421]\"] [source=BaseDaemon] [thread_id=32480]","pod":"secondary-tc-tiflash-0","level":"ERROR"}
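The stack trace in the log line above is hard to read because it is escaped twice: once by the JSON wrapper and once more inside the `log` field, where newlines and tabs appear as the literal two-character sequences `\n` and `\t`. A minimal sketch for making such a line readable follows. It assumes the same field layout as the log line above; the helper name `decode_stack_trace` and the shortened sample line are illustrative only and are not part of TiFlash or its tooling.

```python
import json


def decode_stack_trace(raw_line: str) -> str:
    """Return the "log" field of one JSON log line with its stack trace unescaped."""
    record = json.loads(raw_line)  # first layer: the JSON wrapper
    message = record["log"]
    # Second layer: newlines and tabs survive JSON decoding as the literal
    # two-character sequences backslash-n and backslash-t.
    return message.replace("\\n", "\n").replace("\\t", "\t")


# Shortened sample in the same shape as the log line above (first frame only).
sample = (
    r'{"pod":"secondary-tc-tiflash-0","level":"ERROR",'
    r'"log":"[BaseDaemon.cpp:563] [\"\\n 0x77783e1\\tfaultSignalHandler(int, siginfo_t*, void*) '
    r'[tiflash+125273057]\\n \\tlibs/libdaemon/src/BaseDaemon.cpp:214\"] '
    r'[source=BaseDaemon] [thread_id=32480]"}'
)

if __name__ == "__main__":
    print(decode_stack_trace(sample))
```

Running it prints each frame on its own line, with the source location on the line after the frame it belongs to.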
4. What is your TiFlash version? (Required)
/tiflash/tiflash version
TiFlash
Release Version: v7.1.0-alpha-553-ge86ad8e
Edition: Community
Git Commit Hash: e86ad8e000690337b205ed8aa1e2afeca38c4f55
Git Branch: cloud-engine-on-release-7.5
UTC Build Time: 2024-04-12 03:00:05
Enable Features: jemalloc sm4(GmSSL) avx2 avx512 unwind thinlto
Profile: RELWITHDEBINFO
Raft Proxy
Git Commit Hash: 200fa6be0189a635a56470c0cd2f5dd700e2228f
Git Commit Branch: HEAD
UTC Build Time: 2024-04-12 03:03:50
Rust Version: rustc 1.67.0-nightly (96ddd32c4 2022-11-14)
Storage Engine: tiflash
Prometheus Prefix: tiflash_proxy_
Profile: release
Enable Features: "raftstore-proxy/external-jemalloc" portable sse test-engine-kv-rocksdb test-engine-raft-raft-engine cloud-aws cloud-gcp cloud-azure portable sse test-engine-kv-rocksdb test-engine-raft-raft-engine cloud-aws cloud-gcp cloud-azure
2024-04-15T05:57:51.515+0800
/assign JaySon-Huang
It seems that after all wn crash, the running "mpp_tasks" on the cn are not canceled correctly.