skalenetwork / skaled

Running more than 20 production blockchains, SKALED is Ethereum-compatible, high performance C++ Proof-of-Stake client, tools and libraries. Uses SKALE consensus as a blockchain consensus core. Includes dynamic Oracle. Implements file storage and retrieval as an EVM extension.
https://skale.network
GNU General Public License v3.0
84 stars 39 forks source link

Fix pending transaction analysis part of skaled #2018

Open sergiy-skalelabs opened 2 years ago

sergiy-skalelabs commented 2 years ago

The pending transaction analysis subsystem of skaled requires serious review and improvements because it's reported as problematic by hellgrid and drd tools of valgrind. Everything is around SkaleHost class method invoked through consensus ext face code as callbacks from skaled consensus code. Everything is around SkaleHost::m_consensusWorkingMutex and SkaleHost::m_pending_createMutex mutexes. This is preferable solution. Possible solution is making the SkaleHost::m_consensusWorkingMutex and SkaleHost::m_pending_createMutex mutexes used with std::lock_guard/std::unique_lock only, and remove manual mutex::lock() invocations for these mutexes. Another possible solution is opposite. We can remove std::lock_guard/std::unique_lock usage for these mutexes and always invoke mutex::lock() and mutex::unlock() manually. In this case we must manually ensure mutex::unlock() is invoked correctly and only when mutex is locked. This is not preferable solution. In any case we need to remove unlocker classes used with these mutexes. This task appear because valgrind reporting mutex unlocking in non-locked state what leads to using data objects from different threads as data races and may crash memory. Example of call stack 1:

==42443==  Lock at 0x17C3EEB8 was first observed
==42443==    at 0x4D9ACCF: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==42443==    by 0x1FB9546: __gthread_mutex_lock(pthread_mutex_t*) (gthr-default.h:749)
==42443==    by 0x1FC70D9: std::timed_mutex::lock() (mutex:191)
==42443==    by 0x1FBDA3F: SkaleHost::pendingTransactions(unsigned long, boost::multiprecision::number<boost::multiprecision::backends::cpp_int_backend<256u, 256u, (boost::multiprecision::cpp_integer_type)0, (boost::multiprecision::cpp_int_check_type)0, void>, (boost::multiprecision::expression_template_option)0>&) (SkaleHost.cpp:365)
==42443==    by 0x1FBBB06: ConsensusExtImpl::pendingTransactions(unsigned long, boost::multiprecision::number<boost::multiprecision::backends::cpp_int_backend<256u, 256u, (boost::multiprecision::cpp_integer_type)0, (boost::multiprecision::cpp_int_check_type)0, void>, (boost::multiprecision::expression_template_option)0>&) (SkaleHost.cpp:229)
==42443==    by 0x2671863: PendingTransactionsAgent::createTransactionsListForProposal() (PendingTransactionsAgent.cpp:120)
==42443==    by 0x2670974: PendingTransactionsAgent::buildBlockProposal(block_id, TimeStamp&) (PendingTransactionsAgent.cpp:65)
==42443==    by 0x24B1708: Schain::proposeNextBlock() (Schain.cpp:484)
==42443==    by 0x24BE286: Schain::bootstrap(block_id, unsigned long, unsigned long) (Schain.cpp:951)
==42443==    by 0x23A6D8D: ConsensusEngine::bootStrapAll() (ConsensusEngine.cpp:559)
==42443==    by 0x1FC2D8E: SkaleHost::startWorking()::{lambda()#1}::operator()() const (SkaleHost.cpp:755)

Example of call stack 2:

==42443==  Lock at 0x17C3EEB8 was first observed
==42443==    at 0x4D9ACCF: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==42443==    by 0x1FB9546: __gthread_mutex_lock(pthread_mutex_t*) (gthr-default.h:749)
==42443==    by 0x1FC70D9: std::timed_mutex::lock() (mutex:191)
==42443==    by 0x1FBDA3F: SkaleHost::pendingTransactions(unsigned long, boost::multiprecision::number<boost::multiprecision::backends::cpp_int_backend<256u, 256u, (boost::multiprecision::cpp_integer_type)0, (boost::multiprecision::cpp_int_check_type)0, void>, (boost::multiprecision::expression_template_option)0>&) (SkaleHost.cpp:365)
==42443==    by 0x1FBBB06: ConsensusExtImpl::pendingTransactions(unsigned long, boost::multiprecision::number<boost::multiprecision::backends::cpp_int_backend<256u, 256u, (boost::multiprecision::cpp_integer_type)0, (boost::multiprecision::cpp_int_check_type)0, void>, (boost::multiprecision::expression_template_option)0>&) (SkaleHost.cpp:229)
==42443==    by 0x2671863: PendingTransactionsAgent::createTransactionsListForProposal() (PendingTransactionsAgent.cpp:120)
==42443==    by 0x2670974: PendingTransactionsAgent::buildBlockProposal(block_id, TimeStamp&) (PendingTransactionsAgent.cpp:65)
==42443==    by 0x24B1708: Schain::proposeNextBlock() (Schain.cpp:484)
==42443==    by 0x24BE286: Schain::bootstrap(block_id, unsigned long, unsigned long) (Schain.cpp:951)
==42443==    by 0x23A6D8D: ConsensusEngine::bootStrapAll() (ConsensusEngine.cpp:559)
==42443==    by 0x1FC2D8E: SkaleHost::startWorking()::{lambda()#1}::operator()() const (SkaleHost.cpp:755)
dimalit commented 2 years ago

Well-known issue