Closed Lily2025 closed 4 weeks ago
/assign CalvinNeo
/severity critical
The reason is that too many threads were created in StorageDisaggregated, resulting in thread creation failure.
std::__1::system_error, e.what() = thread constructor failed: Resource temporarily unavailable
Changed this to an enhancement, because the failure is caused by a large number of requests creating too many threads. We will try to reduce the number of threads created for handling disaggregated requests.
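One general way to cap thread creation is to queue work onto a fixed set of workers instead of starting a thread per task. The sketch below only illustrates that idea; it is not TiFlash's `DynamicThreadPool` and not the fix that was eventually applied. `BoundedPool` and `runBounded` are hypothetical names introduced for this example.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool (an assumption for illustration, not TiFlash code).
// Threads are created once in the constructor, so a burst of tasks cannot
// trigger further std::thread construction.
class BoundedPool
{
public:
    explicit BoundedPool(std::size_t worker_count)
    {
        assert(worker_count > 0);
        workers.reserve(worker_count);
        for (std::size_t i = 0; i < worker_count; ++i)
            workers.emplace_back([this] { workerLoop(); });
    }

    // Drains the remaining queue, then joins every worker.
    ~BoundedPool()
    {
        {
            std::lock_guard<std::mutex> lock(mu);
            stopping = true;
        }
        cv.notify_all();
        for (auto & w : workers)
            w.join();
    }

    // Enqueues a task; never creates a thread.
    void submit(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(mu);
            tasks.push(std::move(task));
        }
        cv.notify_one();
    }

private:
    void workerLoop()
    {
        for (;;)
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mu);
                cv.wait(lock, [this] { return stopping || !tasks.empty(); });
                if (tasks.empty())
                    return; // only reached when stopping and the queue is drained
                task = std::move(tasks.front());
                tasks.pop();
            }
            task(); // run outside the lock
        }
    }

    std::mutex mu;
    std::condition_variable cv;
    std::queue<std::function<void()>> tasks;
    std::vector<std::thread> workers;
    bool stopping = false;
};

// Runs task_count trivial tasks on worker_count threads and returns how many completed.
std::size_t runBounded(std::size_t task_count, std::size_t worker_count)
{
    std::atomic<std::size_t> done{0};
    {
        BoundedPool pool(worker_count);
        for (std::size_t i = 0; i < task_count; ++i)
            pool.submit([&done] { done.fetch_add(1, std::memory_order_relaxed); });
    } // pool destructor waits for all queued tasks here
    return done.load();
}
```

With this shape, `runBounded(1000, 4)` runs 1000 tasks while only ever creating 4 threads. The trade-off is that tasks wait in the queue under load rather than failing at thread construction.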
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
1. Run ch. 2. Inject a network partition on one of the CNs.
2. What did you expect to see? (Required)
no crash
3. What did you see instead (Required)
The TiFlash CN crashes after the network isolation recovers.
{"stream":"stdout","container":"errorlog","pod":"secondary-tc-tiflash-0","namespace":"ha-test-disagg-tiflash-tps-7552417-1-58","time":"2024-08-19T17:34:59.20448412Z","log":"[2024/08/20 01:34:58.361 +08:00] [ERROR] [BaseDaemon.cpp:560] [\"\n 0x55a4c9778b9e\tfaultSignalHandler(int, siginfo_t, void) [tiflash+124169118]\n \tlibs/libdaemon/src/BaseDaemon.cpp:211\n 0x7fb214a5e6f0\t [libc.so.6+255728]\n 0x55a4c95a7d9a\tDB::DM::SegmentReadTask::SegmentReadTask(std::__1::shared_ptr const&, DB::Context const&, std::__1::shared_ptr const&, DB::DM::RemotePb::RemoteSegment const&, DB::DM::DisaggTaskId const&, unsigned long, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator> const&, unsigned int, long) [tiflash+122264986]\n \t/usr/local/bin/../include/c++/v1/__memory/shared_ptr.h:884\n 0x55a4cad9eb63\tstd::__1::function::func<DB::StorageDisaggregated::buildReadTaskForWriteNodeTable(DB::Context const&, std::__1::shared_ptr const&, DB::DM::DisaggTaskId const&, unsigned long, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator> const&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator> const&, std::__1::mutex&, std::__1::list<std::__1::shared_ptr, std::__1::allocator<std::__1::shared_ptr>>&)::$_0, std::__1::allocator<DB::StorageDisaggregated::buildReadTaskForWriteNodeTable(DB::Context const&, std::__1::shared_ptr const&, DB::DM::DisaggTaskId const&, unsigned long, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator> const&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator> const&, std::__1::mutex&, std::__1::list<std::__1::shared_ptr, std::__1::allocator<std::__1::shared_ptr>>&)::$_0>, void ()>::operator()() (.139ff689715caee4ff84ce0b2eee41ae) [tiflash+147393379]\n \t/usr/local/bin/../include/c++/v1/__memory/construct_at.h:41\n 0x55a4c9a903b5\tauto DB::wrapInvocable<std::__1::function<void ()>>(bool, std::__1::function<void ()>&&)::'lambda'()::operator()() [tiflash+127411125]\n 
\t/usr/local/bin/../include/c++/v1/functional/function.h:517\n 0x55a4c41e60c5\tstd::__1::packaged_task<void ()>::operator()() [tiflash+34439365]\n \t/usr/local/bin/../include/c++/v1/future:1891\n 0x55a4c419e4d6\tDB::DynamicThreadPool::executeTask(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete>&) [tiflash+34145494]\n \tdbms/src/Common/DynamicThreadPool.cpp:124\n 0x55a4c419e973\tDB::DynamicThreadPool::dynamicWork(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete>) [tiflash+34146675]\n \tdbms/src/Common/DynamicThreadPool.cpp:148\n 0x55a4c419f3df\tvoid* std::__1::thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::thread_struct, std::__1::default_delete>, std::__1::thread DB::ThreadFactory::newThread<void (DB::DynamicThreadPool::*)(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete>), DB::DynamicThreadPool*, std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete>>(bool, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator>, void (DB::DynamicThreadPool::&&)(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete>), DB::DynamicThreadPool &&, std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete>&&)::'lambda'(auto&&...), DB::DynamicThreadPool*, std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete>>>(void*) [tiflash+34149343]\n \t/usr/local/bin/../include/c++/v1/__type_traits/invoke.h:308\n 0x7fb214aa9c02\tstart_thread [libc.so.6+564226]\"] [source=BaseDaemon] [thread_id=30184]\n"}
4. What is your TiFlash version? (Required)
/tiflash/tiflash version
TiFlash
Release Version: v8.3.0-alpha
Edition: Community
Git Commit Hash: 14ed7c021b23fffda165ad6a23ea4358001bd54e
Git Branch: heads/refs/tags/v8.3.0-alpha
UTC Build Time: 2024-08-15 11:39:16
Enable Features: jemalloc sm4(GmSSL) mem-profiling avx2 avx512 unwind thinlto
Profile: RELWITHDEBINFO
Compiler: clang++ 17.0.6

Raft Proxy
Git Commit Hash: 4ebe44d321d4c738d89bc145d418b1d6f3464862
Git Commit Branch: HEAD
UTC Build Time: ""
Rust Version: rustc 1.77.0-nightly (89e2160c4 2023-12-27)
Storage Engine: tiflash
Prometheus Prefix: tiflashproxy
Profile: release
Enable Features: external-je