skalenetwork / skaled

Running more than 20 production blockchains, SKALED is an Ethereum-compatible, high-performance C++ Proof-of-Stake client with accompanying tools and libraries. It uses SKALE consensus as its blockchain consensus core, includes a dynamic Oracle, and implements file storage and retrieval as an EVM extension.
https://skale.network
GNU General Public License v3.0

Skaled does not exit gracefully when it is unable to connect to 2/3 of the nodes #1679

Open badrogger opened 1 year ago

badrogger commented 1 year ago

Describe the bug
Skaled does not exit gracefully when it cannot connect to 2/3 of its peers.

To Reproduce
Steps to reproduce the behavior:

  1. Run a SKALE chain (3.17.0-develop.62).
  2. Turn off the skale admin container.
  3. Disconnect the node from the network.
  4. Observe how skaled exits.

Expected behavior
Skaled should exit gracefully.

Actual behavior
Skaled crashes during the exit procedure with the following error (more logs attached):

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::log::v2s_mt_posix::system_error> >'
pure virtual method called
terminate called recursively

Caught (first) signal. Signal SIGABRT(6). This is the abort signal. Typically, a process will initiate this kill signal on itself.

more-skaled-exiting.log

dimalit commented 1 year ago
Dispatch: All threads stopped
Dispatch: All dispatch queues removed

  /home/dimalit/skaled/build/skaled/skaled : dev::ExitHandler::exitHandler(int, dev::ExitHandler::exit_code_t)+0x44d [0x5647749d187d]
  /home/dimalit/skaled/build/skaled/skaled : dev::ExitHandler::exitHandler(int)+0x1e [0x5647749d07ae]
  /lib/x86_64-linux-gnu/libc.so.6 : ()+0x42520 [0x7f38864bd520]
  /lib/x86_64-linux-gnu/libc.so.6 : pthread_kill()+0x12c [0x7f3886511a7c]
  /lib/x86_64-linux-gnu/libc.so.6 : raise()+0x16 [0x7f38864bd476]
  /lib/x86_64-linux-gnu/libc.so.6 : abort()+0xd3 [0x7f38864a37f3]
  /lib/x86_64-linux-gnu/libstdc++.so.6 : ()+0xb042a [0x7f388685c42a]
  /lib/x86_64-linux-gnu/libstdc++.so.6 : ()+0xae20c [0x7f388685a20c]
  /lib/x86_64-linux-gnu/libstdc++.so.6 : ()+0xae277 [0x7f388685a277]
  /lib/x86_64-linux-gnu/libstdc++.so.6 : ()+0xaefa5 [0x7f388685afa5]
  /home/dimalit/skaled/build/skaled/skaled : boost::system::error_code::message[abi:cxx11]() const+0x5b [0x564774575863]
  /home/dimalit/skaled/build/skaled/skaled : boost::system::system_error::what() const+0x9f [0x564774575e9d]
  /lib/x86_64-linux-gnu/libstdc++.so.6 : ()+0xa2b5d [0x7f388684eb5d]
  /lib/x86_64-linux-gnu/libstdc++.so.6 : ()+0xae20c [0x7f388685a20c]
  /lib/x86_64-linux-gnu/libstdc++.so.6 : ()+0xad1e9 [0x7f38868591e9]
  /lib/x86_64-linux-gnu/libstdc++.so.6 : __gxx_personality_v0()+0x99 [0x7f3886859959]
  /lib/x86_64-linux-gnu/libgcc_s.so.1 : ()+0x16884 [0x7f38866b9884]
  /lib/x86_64-linux-gnu/libgcc_s.so.1 : _Unwind_Resume()+0x12d [0x7f38866ba2dd]
  /home/dimalit/skaled/build/skaled/skaled : boost::log::v2s_mt_posix::sources::basic_composite_logger<char, boost::log::v2s_mt_posix::sources::severity_channel_logger_mt<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::log::v2s_mt_posix::sources::multi_thread_model<boost::log::v2s_mt_posix::aux::light_rw_mutex>, boost::log::v2s_mt_posix::sources::features<boost::log::v2s_mt_posix::sources::severity<int>, boost::log::v2s_mt_posix::sources::channel<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::open_record()+0xbf [0x5647745d1bbf]
  /home/dimalit/skaled/build/skaled/skaled : SkaleHost::stopWorking()+0x8c [0x564774825fc0]
  /home/dimalit/skaled/build/skaled/skaled : dev::eth::Client::stopWorking()+0x71 [0x56477477596b]
  /home/dimalit/skaled/build/skaled/skaled : dev::eth::Client::~Client()+0x47 [0x56477477565f]
  /home/dimalit/skaled/build/skaled/skaled : dev::eth::EthashClient::~EthashClient()+0x8f [0x5647748f249f]
  /home/dimalit/skaled/build/skaled/skaled : dev::eth::EthashClient::~EthashClient()+0x1c [0x5647748f24ce]
  /home/dimalit/skaled/build/skaled/skaled : std::default_delete<dev::eth::Client>::operator()(dev::eth::Client*) const+0x2c [0x5647745e5920]
  /home/dimalit/skaled/build/skaled/skaled : std::unique_ptr<dev::eth::Client, std::default_delete<dev::eth::Client> >::~unique_ptr()+0x56 [0x5647745d4fba]
  /lib/x86_64-linux-gnu/libc.so.6 : ()+0x45495 [0x7f38864c0495]
  /lib/x86_64-linux-gnu/libc.so.6 : on_exit()+0 [0x7f38864c0610]
  /home/dimalit/skaled/build/skaled/skaled : Schain::healthCheck()+0x363 [0x564774cd1d55]
  /home/dimalit/skaled/build/skaled/skaled : Node::startClients()+0x3a [0x564774bead0a]
  /home/dimalit/skaled/build/skaled/skaled : ConsensusEngine::startAll()+0x802 [0x564774ba4738]
  /home/dimalit/skaled/build/skaled/skaled : ()+0x1f24551 [0x564774825551]
  /home/dimalit/skaled/build/skaled/skaled : ()+0x1f28890 [0x564774829890]
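Read bottom-up, the trace appears to show Schain::healthCheck() calling exit() (the frame symbolized as on_exit()+0), the exit handlers destroying a std::unique_ptr<dev::eth::Client>, and ~Client() calling SkaleHost::stopWorking(), where a Boost.Log call throws. Since C++11, a destructor without an explicit exception specification is implicitly noexcept (unless a member or base destructor can throw), so an exception escaping ~Client() goes straight to std::terminate(). The verbose terminate handler then calls what() on the exception, and the "pure virtual method called" line is consistent with that call reaching an object that was already destroyed during exit. The sketch below uses hypothetical Worker/Owner types (not skaled classes) to show the implicit noexcept and one way to keep a throwing shutdown step from escaping a destructor. It only contains the symptom; it does not address tearing the client down from inside exit().

```cpp
#include <iostream>
#include <stdexcept>
#include <type_traits>

// Hypothetical stand-in for a component whose shutdown path can throw,
// e.g. because a logging backend it writes to has already been torn down.
struct Worker {
    bool throwOnStop = false;
    void stopWorking() {
        if (throwOnStop)
            throw std::runtime_error("logging backend unavailable");
    }
};

// No exception specification is written, yet this destructor is implicitly
// noexcept: if stopWorking() throws here, std::terminate() is called.
struct UnguardedOwner {
    Worker worker;
    ~UnguardedOwner() { worker.stopWorking(); }
};
static_assert(std::is_nothrow_destructible<UnguardedOwner>::value,
              "destructors are implicitly noexcept");

// Same owner, but the destructor contains the exception instead of letting it
// escape and terminate the process.
struct GuardedOwner {
    Worker worker;
    ~GuardedOwner() {
        try {
            worker.stopWorking();
        } catch (const std::exception& e) {
            // Write straight to stderr: the regular logger may be what failed.
            std::cerr << "stopWorking() failed during destruction: " << e.what() << '\n';
        }
    }
};

// Destroys a GuardedOwner whose worker throws on stop; returns true once
// control comes back normally instead of the process terminating.
bool destroyGuardedOwnerWithThrowingWorker() {
    {
        GuardedOwner owner;
        owner.worker.throwOnStop = true;
    }  // ~GuardedOwner() runs here and swallows the exception
    return true;
}
```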
dimalit commented 1 year ago

As a solution, I'd propose that consensus not call exit() directly, but use ConsensusExtFace::terminateApplication() instead.
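A minimal sketch of the proposed direction, assuming a callback interface in the spirit of ConsensusExtFace. The class names, the terminateApplication() signature, and the condition-variable handoff are assumptions for illustration, not the actual skaled or skale-consensus API. The underlying issue is that std::exit() called on a consensus thread runs static-storage destructors on that thread while other threads keep running. Here the consensus side only requests termination, and the main thread joins worker threads and stops the client in a defined order before returning.

```cpp
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the interface consensus uses to call back into
// its host. Only the terminateApplication() name comes from the discussion.
class ConsensusExtFaceSketch {
public:
    virtual ~ConsensusExtFaceSketch() = default;
    virtual void terminateApplication() = 0;
};

// Host side: records the request and wakes the main thread. It tears nothing
// down itself, because it is invoked on a consensus thread.
class HostSketch : public ConsensusExtFaceSketch {
public:
    void terminateApplication() override {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            terminateRequested_ = true;
        }
        cv_.notify_one();
    }

    // Called on the main thread; blocks until termination is requested.
    void waitForTerminationRequest() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return terminateRequested_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool terminateRequested_ = false;
};

// Hypothetical consensus-side health check: on failure it asks the host to
// terminate instead of calling std::exit() from inside a consensus thread.
void healthCheckSketch(ConsensusExtFaceSketch& host, bool enoughPeersConnected) {
    if (!enoughPeersConnected)
        host.terminateApplication();
}

// Simulates the proposed flow; returns the shutdown steps in the order they ran.
std::vector<std::string> runShutdownScenario() {
    std::vector<std::string> steps;  // touched only by the calling (main) thread
    HostSketch host;

    // Consensus thread fails the 2/3-peers check.
    std::thread consensusThread([&host] {
        healthCheckSketch(host, /*enoughPeersConnected=*/false);
    });

    host.waitForTerminationRequest();
    steps.push_back("termination requested");

    // Ordered shutdown on the main thread: join consensus threads first, then
    // stop the client while logging and other globals are still alive.
    consensusThread.join();
    steps.push_back("consensus threads joined");
    steps.push_back("client stopped");  // placeholder for stopping/destroying the client
    return steps;
}
```

With this shape the health check never destroys objects itself, so components such as logging are still alive when the client's stopWorking() runs.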

kladkogex commented 7 months ago

Moving to 2.5 since we do not have time for this in 2.4