Open ScottSallinen opened 4 years ago
The stack trace takes place while the node is shutting down. We can add a timeout on the thread join, but this seems like a fairly minimal issue. The deadlock is not occurring while the node is operational. If you don't believe the node should have been shutting down, there may be a different issue, but this is not enough information to diagnose the problem.
Hm, yes, unfortunately I tried to shut it down before taking the stack trace, I should have done it the other way around.
If it occurs again I will take the trace while it's halted.
Attached is another log from another API node (different one) taken while stuck. gdb.txt
I am not seeing any deadlock here...
The write thread:
Thread 524 (Thread 0x7f9963dff700 (LWP 19986)):
#0 0x000055bed7c17c60 in boost::multi_index::detail::ordered_index_node_impl<boost::multi_index::detail::null_augment_policy, boost::interprocess::allocator<char, boost::interprocess::segment_manager<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index> > >::rebalance_for_erase(boost::interprocess::offset_ptr<boost::multi_index::detail::ordered_index_node_impl<boost::multi_index::detail::null_augment_policy, boost::interprocess::allocator<char, boost::interprocess::segment_manager<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index> > >, long, unsigned long, 0ul>, boost::interprocess::offset_ptr<boost::multi_index::detail::ordered_index_node_impl<boost::multi_index::detail::null_augment_policy, boost::interprocess::allocator<char, boost::interprocess::segment_manager<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index> > >, long, unsigned long, 0ul>&, boost::interprocess::offset_ptr<boost::multi_index::detail::ordered_index_node_impl<boost::multi_index::detail::null_augment_policy, boost::interprocess::allocator<char, boost::interprocess::segment_manager<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index> > >, long, unsigned long, 0ul>&, boost::interprocess::offset_ptr<boost::multi_index::detail::ordered_index_node_impl<boost::multi_index::detail::null_augment_policy, boost::interprocess::allocator<char, boost::interprocess::segment_manager<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index> > >, long, unsigned long, 0ul>&) ()
#1 0x000055bed8191080 in _ZN5boost11multi_index6detail18ordered_index_implINS0_13composite_keyIN5steem5chain14comment_objectENS0_6memberIS6_N2fc14time_point_secEXadL_ZNS6_12cashout_timeEEEEENS7_IS6_N9chainbase3oidIS6_EEXadL_ZNS6_2idEEEEENS_6tuples9null_typeESG_SG_SG_SG_SG_SG_SG_EESt4lessINS0_20composite_key_resultISH_EEENS1_9nth_layerILi2ES6_NS0_10indexed_byINS0_14ordered_uniqueINS0_3tagINS5_5by_idEN4mpl_2naESS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_EESE_SS_EENSO_INSP_INS5_15by_cashout_timeESS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_EESH_SS_EENSO_INSP_INS5_11by_permlinkESS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_EENS3_IS6_NS7_IS6_NS4_8protocol17fixed_string_implINS8_7uint128EEEXadL_ZNS6_6authorEEEEENS7_IS6_NS_9container12basic_stringIcSt11char_traitsIcENS_12interprocess9allocatorIcNS19_15segment_managerIcNS19_15rbtree_best_fitINS19_12mutex_familyENS19_10offset_ptrIvlmLm0EEELm0EEENS19_10iset_indexEEEEEEEXadL_ZNS6_8permlinkEEEEESG_SG_SG_SG_SG_SG_SG_SG_EENS0_21composite_key_compareISI_IS13_ENS5_11strcmp_lessESG_SG_SG_SG_SG_SG_SG_SG_EEEENSO_INSP_INS5_7by_rootESS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_EENS3_IS6_NS7_IS6_SD_XadL_ZNS6_12root_commentEEEEESE_SG_SG_SG_SG_SG_SG_SG_SG_EESS_EENSO_INSP_INS5_9by_parentESS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_EENS3_IS6_NS7_IS6_S13_XadL_ZNS6_13parent_authorEEEEENS7_IS6_S1K_XadL_ZNS6_15parent_permlinkEEEEESE_SG_SG_SG_SG_SG_SG_SG_EENS1N_IS1O_S1P_SI_ISD_ESG_SG_SG_SG_SG_SG_SG_EEEESS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_SS_EENS1A_IS6_S1I_EEEENS_3mpl6v_itemISV_NS28_7vector0ISS_EELi0EEENS1_18ordered_unique_tagENS1_19null_augment_policyEE6erase_EPNS1_18ordered_index_nodeIS2E_NS2G_IS2E_NS2G_IS2E_NS2G_IS2E_NS1_15index_node_baseIS6_S26_EEEEEEEEEE ()
#2 0x000055bed83f68a7 in _ZN9chainbase13generic_indexIN5boost11multi_index21multi_index_containerIN5steem5chain14comment_objectENS2_10indexed_byINS2_14ordered_uniqueINS2_3tagINS5_5by_idEN4mpl_2naESC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_EENS2_6memberIS6_NS_3oidIS6_EEXadL_ZNS6_2idEEEEESC_EENS8_INS9_INS5_15by_cashout_timeESC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_EENS2_13composite_keyIS6_NSE_IS6_N2fc14time_point_secEXadL_ZNS6_12cashout_timeEEEEESH_NS1_6tuples9null_typeESQ_SQ_SQ_SQ_SQ_SQ_SQ_EESC_EENS8_INS9_INS5_11by_permlinkESC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_EENSL_IS6_NSE_IS6_NS4_8protocol17fixed_string_implINSM_7uint128EEEXadL_ZNS6_6authorEEEEENSE_IS6_NS1_9container12basic_stringIcSt11char_traitsIcENS1_12interprocess9allocatorIcNS14_15segment_managerIcNS14_15rbtree_best_fitINS14_12mutex_familyENS14_10offset_ptrIvlmLm0EEELm0EEENS14_10iset_indexEEEEEEEXadL_ZNS6_8permlinkEEEEESQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_EENS2_21composite_key_compareISt4lessISY_ENS5_11strcmp_lessESQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_EEEENS8_INS9_INS5_7by_rootESC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_EENSL_IS6_NSE_IS6_SG_XadL_ZNS6_12root_commentEEEEESH_SQ_SQ_SQ_SQ_SQ_SQ_SQ_SQ_EESC_EENS8_INS9_INS5_9by_parentESC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_EENSL_IS6_NSE_IS6_SY_XadL_ZNS6_13parent_authorEEEEENSE_IS6_S1F_XadL_ZNS6_15parent_permlinkEEEEESH_SQ_SQ_SQ_SQ_SQ_SQ_SQ_EENS1I_IS1K_S1L_S1J_ISG_ESQ_SQ_SQ_SQ_SQ_SQ_SQ_EEEESC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_EENS15_IS6_S1D_EEEEE4undoEv ()
#3 0x000055bed83f6bf9 in _ZN9chainbase12session_implINS_13generic_indexIN5boost11multi_index21multi_index_containerIN5steem5chain14comment_objectENS3_10indexed_byINS3_14ordered_uniqueINS3_3tagINS6_5by_idEN4mpl_2naESD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_EENS3_6memberIS7_NS_3oidIS7_EEXadL_ZNS7_2idEEEEESD_EENS9_INSA_INS6_15by_cashout_timeESD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_EENS3_13composite_keyIS7_NSF_IS7_N2fc14time_point_secEXadL_ZNS7_12cashout_timeEEEEESI_NS2_6tuples9null_typeESR_SR_SR_SR_SR_SR_SR_EESD_EENS9_INSA_INS6_11by_permlinkESD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_EENSM_IS7_NSF_IS7_NS5_8protocol17fixed_string_implINSN_7uint128EEEXadL_ZNS7_6authorEEEEENSF_IS7_NS2_9container12basic_stringIcSt11char_traitsIcENS2_12interprocess9allocatorIcNS15_15segment_managerIcNS15_15rbtree_best_fitINS15_12mutex_familyENS15_10offset_ptrIvlmLm0EEELm0EEENS15_10iset_indexEEEEEEEXadL_ZNS7_8permlinkEEEEESR_SR_SR_SR_SR_SR_SR_SR_EENS3_21composite_key_compareISt4lessISZ_ENS6_11strcmp_lessESR_SR_SR_SR_SR_SR_SR_SR_EEEENS9_INSA_INS6_7by_rootESD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_EENSM_IS7_NSF_IS7_SH_XadL_ZNS7_12root_commentEEEEESI_SR_SR_SR_SR_SR_SR_SR_SR_EESD_EENS9_INSA_INS6_9by_parentESD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_EENSM_IS7_NSF_IS7_SZ_XadL_ZNS7_13parent_authorEEEEENSF_IS7_S1G_XadL_ZNS7_15parent_permlinkEEEEESI_SR_SR_SR_SR_SR_SR_SR_EENS1J_IS1L_S1M_S1K_ISH_ESR_SR_SR_SR_SR_SR_SR_EEEESD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_SD_EENS16_IS7_S1E_EEEEE7sessionEE4undoEv ()
#4 0x000055bed80eac25 in steem::chain::database::clear_pending() ()
#5 0x000055bed812f314 in steem::chain::database::push_block(steem::protocol::signed_block const&, unsigned int) ()
#6 0x000055bed80d1c19 in steem::plugins::chain::detail::write_request_visitor::operator()(steem::protocol::signed_block const*) ()
#7 0x000055bed80d0971 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<steem::plugins::chain::detail::chain_plugin_impl::start_write_processing()::{lambda()#1}> > >::_M_run() ()
#8 0x000055bed8be6e10 in execute_native_thread_routine ()
#9 0x00007fbfdaccf182 in start_thread (arg=<optimized out>) at pthread_create.c:486
#10 0x00007fbfdaa99b1f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
It is not waiting on a lock.
There are 54 instances of API threads waiting on futures in broadcast_transaction
and 21 broadcast_transaction_synchronous
. We recently added a timeout on broadcast_transaction_synchronous
waiting on the futures (which might help the problem). We could add timeouts on the futures in accept_transaction
as well. The broadcast_transaction
calls should progress as soon as the write process is done pushing the current block.
The node might just be reaching capacity and beginning to slow down. If it continues happening, more stack traces would be useful.
debug for all threads.txt
A highly used, non-mira "full node" API deadlocked with the included backtrace.
The main thread is stuck as follows:
Locks should have timeouts.