vesoft-inc / nebula

A distributed, fast open-source graph database featuring horizontal scalability and high availability
https://nebula-graph.io
Apache License 2.0

We should take care of the resource release sequence when the processes exit #716

Closed wadeliuyi closed 5 years ago

wadeliuyi commented 5 years ago

Fix a crash when stopping the process: the meta client and the raft part both depend on the IO thread pool, but gServer stops that pool first, so their background threads end up calling into an executor that is already torn down.

1, raft bt:

(gdb) bt

#0  0x000000000208f587 in folly::IOThreadPoolExecutor::getEventBase (this=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1018
#1  0x0000000001b48970 in nebula::raftex::RaftPart::appendLogsInternal (this=0x7fbfdc168c10, iter=..., termId=8) at /home/wade.liu/rd/nebula/src/kvstore/raftex/RaftPart.cpp:512
#2  0x0000000001b47dd0 in nebula::raftex::RaftPart::appendLogAsync (this=0x7fbfdc168c10, source=0 '\000', logType=nebula::raftex::LogType::NORMAL, log="")
    at /home/wade.liu/rd/nebula/src/kvstore/raftex/RaftPart.cpp:452
#3  0x0000000001b501ff in nebula::raftex::RaftPart::sendHeartbeat (this=0x7fbfdc168c10) at /home/wade.liu/rd/nebula/src/kvstore/raftex/RaftPart.cpp:1270
#4  0x0000000001b4ce4f in nebula::raftex::RaftPart::statusPolling (this=0x7fbfdc168c10) at /home/wade.liu/rd/nebula/src/kvstore/raftex/RaftPart.cpp:940
#5  0x0000000001b4c9d8 in nebula::raftex::RaftPart::<lambda()>::operator()(void) const (__closure=0x7fbfd7ce7400) at /home/wade.liu/rd/nebula/src/kvstore/raftex/RaftPart.cpp:949
#6  0x0000000001b5d36c in std::__invoke_impl<void, nebula::raftex::RaftPart::statusPolling()::<lambda()>&>(std::__invoke_other, nebula::raftex::RaftPart::<lambda()> &) (f=...)
    at /usr/include/c++/8/bits/invoke.h:60
#7  0x0000000001b5d2a1 in std::__invoke<nebula::raftex::RaftPart::statusPolling()::<lambda()>&>(nebula::raftex::RaftPart::<lambda()> &) (fn=...) at /usr/include/c++/8/bits/invoke.h:95
#8  0x0000000001b5d19a in std::_Bind<nebula::raftex::RaftPart::statusPolling()::<lambda()>()>::__call(std::tuple<> &&, std::_Index_tuple<>) (this=0x7fbfd7ce7400, __args=...)
    at /usr/include/c++/8/functional:400
#9  0x0000000001b5ccaa in std::_Bind<nebula::raftex::RaftPart::statusPolling()::<lambda()>()>::operator()<>(void) (this=0x7fbfd7ce7400) at /usr/include/c++/8/functional:484
#10 0x0000000001b5c647 in std::_Function_handler<void(), std::_Bind<nebula::raftex::RaftPart::statusPolling()::<lambda()>()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /usr/include/c++/8/bits/std_function.h:297

2, meta bt:

#0  0x000000000208dc37 in folly::IOThreadPoolExecutor::getEventBase (this=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1018
#1  0x00000000018169eb in nebula::meta::MetaClient::getResponse<nebula::meta::cpp2::HBReq, nebula::meta::MetaClient::heartbeat()::<lambda(auto:110, auto:111)>, nebula::meta::MetaClient::heartbeat()::<lambda(nebula::meta::cpp2::HBResp&&)> >(nebula::meta::cpp2::HBReq, nebula::meta::MetaClient::<lambda(auto:110, auto:111)>, nebula::meta::MetaClient::<lambda(nebula::meta::cpp2::HBResp&&)>, bool) (
    this=0x7f7642d60600, req=..., remoteFunc=..., respGen=..., toLeader=true) at /home/wade.liu/rd/nebula/src/meta/client/MetaClient.cpp:254
#2  0x000000000180d6f6 in nebula::meta::MetaClient::heartbeat (this=0x7f7642d60600) at /home/wade.liu/rd/nebula/src/meta/client/MetaClient.cpp:987
#3  0x0000000001806204 in nebula::meta::MetaClient::heartBeatThreadFunc (this=0x7f7642d60600) at /home/wade.liu/rd/nebula/src/meta/client/MetaClient.cpp:85
#4  0x00000000018876da in std::__invoke_impl<void, void (nebula::meta::MetaClient::*&)(), nebula::meta::MetaClient*&> (
    f=@0x7f7642dc1ea0: (void (nebula::meta::MetaClient::*)(nebula::meta::MetaClient * const)) 0x18061e0 <nebula::meta::MetaClient::heartBeatThreadFunc()>, t=@0x7f7642dc1eb0: 0x7f7642d60600) at /usr/include/c++/8/bits/invoke.h:73
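To make the hazard in these backtraces concrete, here is a minimal standalone sketch, not taken from the nebula sources (the names `ioPool`, `running`, and `heartbeatLoop` are hypothetical). A background poller keeps fetching an EventBase from a shared folly::IOThreadPoolExecutor, much like RaftPart::statusPolling() and MetaClient::heartBeatThreadFunc() do; if the pool is stopped before the poller, the next getEventBase() call touches a torn-down executor.

```cpp
#include <folly/executors/IOThreadPoolExecutor.h>
#include <folly/io/async/EventBase.h>

#include <atomic>
#include <chrono>
#include <memory>
#include <thread>

int main() {
  // Shared IO pool, analogous to the one owned by gServer.
  auto ioPool = std::make_shared<folly::IOThreadPoolExecutor>(4);

  std::atomic<bool> running{true};

  // Background poller, analogous to RaftPart::statusPolling() or
  // MetaClient::heartBeatThreadFunc(): it asks the shared pool for an
  // EventBase on every tick.
  std::thread heartbeatLoop([&] {
    while (running.load()) {
      folly::EventBase* evb = ioPool->getEventBase();
      (void)evb;  // real code would schedule an RPC on evb here
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
  });

  // Buggy order (what the backtraces show): the shared pool is stopped
  // first, so the poller's next getEventBase() hits a dead executor.
  //   ioPool->stop();
  //   running.store(false);
  //   heartbeatLoop.join();

  // Safe order: stop the consumers first, then the pool they depend on.
  running.store(false);
  heartbeatLoop.join();
  ioPool->stop();
  return 0;
}
```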

wadeliuyi commented 5 years ago

I can think of three ways to solve it, though maybe none is the best. First, run only one RPC server per process: we could combine the raft service and the main service, but we would still need to take care of the resource release sequence when the process exits. Second, stop sharing the IO thread pool between the raft service and the main service and give raft its own pool; but the raft service carries a high volume of IO, so it would need many threads, leaving the whole process with too many threads. Third, define a stop interface for every module, and stop them in the right order when the process receives a stop signal.
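A minimal sketch of the third option follows, under assumptions not in the issue: the `Stoppable` interface, the module classes, and the registration order are all hypothetical. The signal handler only sets a flag, and the main thread drains the modules in dependency order, stopping the IO thread pool only after everything that uses it.

```cpp
#include <csignal>
#include <chrono>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical interface: each module releases only its own resources.
class Stoppable {
 public:
  virtual ~Stoppable() = default;
  virtual void stop() = 0;
};

// Stand-ins for the real modules; bodies elided.
class RaftService : public Stoppable {
 public:
  void stop() override { /* stop statusPolling, quiesce raft parts */ }
};

class MetaClientModule : public Stoppable {
 public:
  void stop() override { /* stop the heartbeat thread */ }
};

class IoThreadPoolModule : public Stoppable {
 public:
  void stop() override { /* join the IO threads; must run last */ }
};

// Registered in the order they must be stopped: every consumer of the
// IO thread pool comes before the pool itself.
std::vector<std::unique_ptr<Stoppable>> gModules;
volatile std::sig_atomic_t gStopRequested = 0;

// Signal handlers may only do async-signal-safe work, so just set a flag.
void onStopSignal(int) { gStopRequested = 1; }

int main() {
  gModules.push_back(std::make_unique<RaftService>());
  gModules.push_back(std::make_unique<MetaClientModule>());
  gModules.push_back(std::make_unique<IoThreadPoolModule>());  // stopped last

  std::signal(SIGINT, onStopSignal);
  std::signal(SIGTERM, onStopSignal);

  while (!gStopRequested) {
    // Serve requests; here we just idle.
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
  }

  // Drain modules in registration order on the main thread.
  for (auto& module : gModules) {
    module->stop();
  }
  return 0;
}
```

The key design point is that the shared IO thread pool is registered last, so the raft and meta consumers are quiesced before the pool they depend on goes away.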

dangleptr commented 5 years ago

#732

dangleptr commented 5 years ago

Close it now