skalenetwork / skaled

Running more than 20 production blockchains, SKALED is Ethereum-compatible, high performance C++ Proof-of-Stake client, tools and libraries. Uses SKALE consensus as a blockchain consensus core. Includes dynamic Oracle. Implements file storage and retrieval as an EVM extension.
https://skale.network
GNU General Public License v3.0
85 stars 40 forks source link

Establish threading rules in skaled-consensus interaction #1425

Closed dimalit closed 1 year ago

dimalit commented 1 year ago
  1. Current implementation is based on the assumption that consensus always calls pendingTransactions and createBlock from the same thread.This is not the case for very long time already.
  2. Threading rules for startAll/bootstrapAll should be defined.
  3. Threading rules for consensus and skaled termination should be defined. When these functions can be called? From which threads they can be called?
  4. In ConsensusExtFace::terminateApplication() should consensus' exitGracefully() be also called?
  5. Evaluate this statement: Consensus guarantees that it will not do anything for a particular block ID, until pendingTransactions(...) for this block id returns. (ConsensusInterface.h)
  6. Consensus should have 2 states: 'created' and 'started'. After 1st we can allow RPC. See https://github.com/skalenetwork/internal-support/issues/753

Need to decide these rules, document and follow them.

dimalit commented 1 year ago

Also see https://github.com/skalenetwork/internal-support/issues/554

dimalit commented 1 year ago

Doc https://github.com/skalenetwork/internal-support/blob/main/docs/specifications/skaled/threading.md

dimalit commented 1 year ago

Key proposed changes to consensus:

  1. Do not hang, but return from ConsensusInterface::bootstrapAll() (lower priority issue).
  2. As proposed, exitGracefully() would possibly be called at any time moment (not just at time when block is under processing by skaled - as it was before), so maybe it requires some changes in consensus code.
  3. When need to exit, consensus just need to call ConsensusExtFace::terminateApplication() and RELY ON SKALED to terminate everything nicely (check if this is implemented in this way in consensus).
  4. If exitGracefully() call arrived while block N is under processing in skaled, consensus should prevent itself from proposing block N+1.
dimalit commented 1 year ago

Intermediate testing results: consensus doesn't wait for block creation to exit. Details:

  1. 4-node chain, node 4 already exited, node 3 has consensus "on pause".
  2. Exit was requested on node 3 after creation of block 12.
  3. Soon after exit request, skaled got transaction through JSON-RPC.
  4. Consensus started working on block 13, didn't wait for its completion, and exited in 6 sec
    
    2023-04-28 20:46:41.669364   Block sealed #12 (#bbbcbc35…)
    2023-04-28 20:46:41.669468   Block stats:BN:11:BTS:1682714777:TXS:0:HDRS:11:LOGS:40:SENGS:1:TXRS:1:BLCKS:3:ACCS:69:BQS:1:BDS:44:TSS:0:UTX:0:VTX:0:CMM:17846
    2023-04-28 20:46:41.669823   Block 12 DB usage is 1104. Piece DB usage is 13957 bytes
    2023-04-28 20:46:41.669857      Imported and best 0 (#12). Has 0 siblings.
    2023-04-28 20:46:41.669989   Insterted block with 0 transactions
    2023-04-28 20:46:41.670327   Post state changed.
    2023-04-28 20:46:41.670369   noteChanged: {chain}
    2023-04-28 20:46:41.670397   Total unsafe time so far = 0 seconds
    2023-04-28 20:46:41.670424   Successfully imported 0 of 0 transactions

2023-04-28 20:46:41.670450 sent_to_consensus = 1 got_from_consensus = 1 m_transaction_cache = 0 m_tq = 0 m_bcast_counter = 1 2023-04-28 20:46:41.672053 TRACEPOINT fetch_transactions 148 2023-04-28 20:46:41.727946 http://127.0.0.1:36710 >>> {"id":34,"jsonrpc":"2.0","method":"eth_blockNumber","params":[]} 2023-04-28 20:46:41.728309 http://127.0.0.1:36710 <<< {"id":34,"jsonrpc":"2.0","result":"0xc"} 2023-04-28 20:46:41.772355 TRACEPOINT drop_bad_transactions 148 [2023-04-28 20:46:41.773] [3:main] [trace] 12:Read JSON header{"blockID":12,"nodeID":2,"schainID":1,"status":5,"substatus":3,"type":"BlckCatchupReq"} [2023-04-28 20:46:41.773] [3:main] [debug] 12:Catchups: sChain->getCommittedBlockID() <= block_id(blockID) [2023-04-28 20:46:41.773] [3:catchup] [debug] 12:Server step 2: sent catchup response header [2023-04-28 20:46:41.773] [3:catchup] [debug] 12:Server step 3: response completed: no blocks sent 2023-04-28 20:46:41.780230 http://127.0.0.1:36710 >>> {"id":35,"jsonrpc":"2.0","method":"eth_blockNumber","params":[]} 2023-04-28 20:46:41.781051 http://127.0.0.1:36710 <<< {"id":35,"jsonrpc":"2.0","result":"0xc"} 2023-04-28 20:46:41.783572 Skaled status: setExitState: StartAgain to false 2023-04-28 20:46:41.783735 THREADS timer started 2023-04-28 20:46:41.784120 Skaled status: setExitState: StartFromSnapshot to false 2023-04-28 20:46:41.784214 SKALE NETWORK BROWSER refreshing(stop) did cleared/forgot dispatch/job 2023-04-28 20:46:41.823834 TRACEPOINT receive_transaction 2 2023-04-28 20:46:41.823877 m_received = 1

2023-04-28 20:46:41.824084 Mined gas: 1

2023-04-28 20:46:41.824305 Queued vaguely legit-looking transaction #6d67ce0a… 2023-04-28 20:46:41.824378 TRACEPOINT receive_transaction_success 2 2023-04-28 20:46:41.824404 Successfully received through broadcast 6d67ce0a70b37357e3af0dcfa2529f01d3996f0178dae1cbc89861252d4e1f16 2023-04-28 20:46:41.824391 sent_to_consensus = 1 got_from_consensus = 1 m_transaction_cache = 0 m_tq = 1 m_bcast_counter = 1 2023-04-28 20:46:41.824455 TRACEPOINT sent_txn_new 2 2023-04-28 20:46:41.824473 TRACEPOINT broadcast_already_have 2 2023-04-28 20:46:41.824500 sent_to_consensus = 1 got_from_consensus = 1 m_transaction_cache = 1 m_tq = 1 m_bcast_counter = 2 2023-04-28 20:46:41.824559 TRACEPOINT sent_txn 2 2023-04-28 20:46:41.824584 Sent txn: 6d67ce0a70b37357e3af0dcfa2529f01d3996f0178dae1cbc89861252d4e1f16

2023-04-28 20:46:41.824597 1 before exitGracefully() 2023-04-28 20:46:41.824607 sent_to_consensus = 2 got_from_consensus = 1 m_transaction_cache = 1 m_tq = 1 m_bcast_counter = 2 2023-04-28 20:46:41.824637 Exit requested with signal 15 and exit code 0 2023-04-28 20:46:41.824653 TRACEPOINT send_toconsensus 2 [2023-04-28 20:46:41.824] [config] [info] Consensus engine exiting: blocking exit exitGracefully called by skaled 2023-04-28 20:46:41.840783 Skaled status: setExitState: ClearDataDir to false [2023-04-28 20:46:41.844] [3:pending] [trace] 12:Created proposal, transactions:1 [2023-04-28 20:46:41.844] [3:proposal] [trace] 12:addBlockProposal blockID=13 proposerIndex=3 [2023-04-28 20:46:41.848] [3:main] [debug] 12:PROPOSING BLOCK NUMBER:13 [2023-04-28 20:46:41.850] [3:main] [trace] 12:Adding sigshare [2023-04-28 20:46:41.850] [3:main] [trace] 12:Proposal step 0: Starting block proposal [2023-04-28 20:46:41.850] [3:main] [trace] 12:Proposal step 0: Starting block proposal [2023-04-28 20:46:41.850] [3:main] [trace] 12:Proposal step 1: wrote proposal header [2023-04-28 20:46:41.850] [3:main] [trace] 12:Proposal step 1: wrote proposal header [2023-04-28 20:46:41.850] [3:main] [trace] 12:Read JSON header{"status":3,"substatus":10,"type":"BlckPrpslRsp"} [2023-04-28 20:46:41.850] [3:main] [trace] 12:Proposal step 2: read proposal response [2023-04-28 20:46:41.850] [3:main] [trace] 12:Read JSON header{"status":3,"substatus":10,"type":"BlckPrpslRsp"} [2023-04-28 20:46:41.850] [3:main] [trace] 12:Proposal Server terminated proposal push:3:10 [2023-04-28 20:46:41.850] [3:main] [trace] 12:Proposal step 2: read proposal response [2023-04-28 20:46:41.850] [3:main] [trace] 12:Proposal Server terminated proposal push:3:10 [2023-04-28 20:46:41.886] [3:main] [trace] 12:Read JSON header{"blockID":13,"hash":"c6b29e06ebc1281109bcb6b852a6d48ba746c7e4e6fabc9dc20440a660a2a9f7","proposerIndex":1,"proposerNodeID":1,"schainID":1,"sig":"c6b29e06ebc1281109bcb6b852a6d48ba746c7e4e6fabc9dc20440a660a2a9f7","sr":"46320509353513273106582423493727320152202237096314791991810382902766530930981","status":5,"substatus":3,"timeStamp":1682714801,"timeStampMs":883,"txCount":1,"type":"BlckPrpslReq"} [2023-04-28 20:46:41.889] [3:main] [debug] 12:Calculating missing hashes [2023-04-28 20:46:41.890] [3:main] [debug] 12:Server: No missing partial hashes [2023-04-28 20:46:41.890] [3:main] [debug] 12:Storing block proposal [2023-04-28 20:46:41.890] [3:proposal] [trace] 12:addBlockProposal blockID_=13 proposerIndex=1 [2023-04-28 20:46:42.028] [3:main] [trace] 12:Read JSON header{"blockID":12,"nodeID":1,"schainID":1,"status":5,"substatus":3,"type":"BlckCatchupReq"} [2023-04-28 20:46:42.028] [3:main] [debug] 12:Catchups: sChain->getCommittedBlockID() <= block_id(blockID) [2023-04-28 20:46:42.028] [3:catchup] [debug] 12:Server step 2: sent catchup response header [2023-04-28 20:46:42.028] [3:catchup] [debug] 12:Server step 3: response completed: no blocks sent [2023-04-28 20:46:42.351] [3:main] [trace] 12:Proposal step 0: Starting block proposal [2023-04-28 20:46:42.351] [3:main] [trace] 12:Proposal step 0: Starting block proposal [2023-04-28 20:46:42.351] [3:main] [trace] 12:Proposal step 1: wrote proposal header [2023-04-28 20:46:42.351] [3:main] [trace] 12:Proposal step 1: wrote proposal header [2023-04-28 20:46:42.351] [3:main] [trace] 12:Read JSON header{"status":3,"substatus":10,"type":"BlckPrpslRsp"} [2023-04-28 20:46:42.351] [3:main] [trace] 12:Proposal step 2: read proposal response [2023-04-28 20:46:42.351] [3:main] [trace] 12:Proposal Server terminated proposal push:3:10 [2023-04-28 20:46:42.352] [3:main] [trace] 12:Read JSON header{"status":1,"substatus":0,"type":"BlckPrpslRsp"} [2023-04-28 20:46:42.352] [3:main] [trace] 12:Proposal step 2: read proposal response [2023-04-28 20:46:42.352] [3:main] [trace] 12:Proposal step 3: sent partial hashes [2023-04-28 20:46:42.353] [3:main] [trace] 12:Read JSON header{"count":0,"status":5,"substatus":3,"type":"MsngTxsReq"} [2023-04-28 20:46:42.353] [3:proposal] [trace] 12:Push agent processed missing transactions header [2023-04-28 20:46:42.353] [3:main] [trace] 12:Proposal complete::no missing transactions [2023-04-28 20:46:42.353] [3:main] [trace] 12:Read JSON header{"pk":"","pks":"","sig":"8cc4ad67f57c2cf7ce383926fc0691edccf75cff723d276e5e797b20e85ebefe","sss":"a740d2fa53d554c6929aefe75357c54aba4cef8232f9ede2a18e35b12debb54e","status":0,"substatus":0,"type":"SigShareRsp"} [2023-04-28 20:46:42.353] [3:main] [trace] 12:Adding sigshare [2023-04-28 20:46:42.719] [3:consensus] [debug] 12:2ROUND_COMPLETE:BLOCK:11:ROUND:6 [2023-04-28 20:46:42.719] [3:consensus] [debug] 12::PROPOSER:2:DECIDED VALUE1:ROUND:6 [2023-04-28 20:46:42.719] [3:consensus] [debug] 12:Decided value: 1 for blockid:11 proposer:2 [2023-04-28 20:46:42.719] [3:consensus] [debug] 12:Got child message 11:2 [2023-04-28 20:46:42.851] [3:main] [trace] 12:Proposal step 0: Starting block proposal [2023-04-28 20:46:42.852] [3:main] [trace] 12:Proposal step 1: wrote proposal header [2023-04-28 20:46:42.852] [3:main] [trace] 12:Read JSON header{"status":3,"substatus":10,"type":"BlckPrpslRsp"} [2023-04-28 20:46:42.852] [3:main] [trace] 12:Proposal step 2: read proposal response [2023-04-28 20:46:42.852] [3:main] [trace] 12:Proposal Server terminated proposal push:3:10 2023-04-28 20:46:42.902553 2 after exitGracefully() [2023-04-28 20:46:42.902] [config] [info] Consensus engine exiting: exitGracefullyAsync called by skaled [2023-04-28 20:46:42.902] [config] [info] exitGracefullyAsync running [2023-04-28 20:46:42.902] [config] [info] Node exit called [2023-04-28 20:46:42.902] [config] [info] Node::exit() requested [2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: close all sockets called [2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: server conditional vars notified [2023-04-28 20:46:42.902] [config] [trace] Notifying TCP cond93969017793424 [2023-04-28 20:46:42.902] [config] [trace] Notifying TCP cond93969017584192 [2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: agent conditional vars notified [2023-04-28 20:46:42.902] [3:main] [info] 12:Consensus exiting: Node:Exit requested [2023-04-28 20:46:42.902] [3:main] [info] 12:Consensus exiting: Node:Exit requested [2023-04-28 20:46:42.902] [3:main] [info] 12:Consensus exiting: Node:Exit requested [2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: block proposal socket touched [2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: catchup socket touched [2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: SGXZMQClient exiting [2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: SGXZMQClient exited [2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: closeAndCleanupAll consensus zmq sockets [2023-04-28 20:46:42.902] [config] [info] Cleaning up ZMQ sockets [2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: closing ZMQ send sockets [2023-04-28 20:46:42.902] [config] [debug] main zmq debug in closeSend(): closing 140397045099856 [2023-04-28 20:46:42.902] [config] [debug] main zmq debug in closeSend(): closing 140397045122736 [2023-04-28 20:46:42.902] [config] [debug] main zmq debug in closeSend(): closing 140397045129312 [2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: closing receive sockets [2023-04-28 20:46:42.902] [config] [info] Closing ZMQ context [2023-04-28 20:46:42.903] [3:net] [debug] 12:NtwkRdLoop zmq debug: closing = 140397179327456 [2023-04-28 20:46:42.903] [config] [info] Closed ZMQ [2023-04-28 20:46:42.903] [config] [info] Node exit completed [2023-04-28 20:46:43.353] [3:main] [info] 12:Consensus exiting: IO:Exit requested [2023-04-28 20:46:43.610] [3:main] [info] 12:Exited Oracle worker thread 1 2023-04-28 20:46:43.904617 SKALE NETWORK BROWSER refreshing(stop) did cleared/forgot dispatch/job 2023-04-28 20:46:43.909974 Will stop WS server on address 127.0.0.3 and port 7003=0x00001b5b/nfo... 2023-04-28 20:46:43.910106 Will stop on port 7003=0x00001b5b... 2023-04-28 20:46:44.620251 OK, server stopped on port 7003=0x00001b5b 2023-04-28 20:46:44.620458 OK, stopped WS server on address 127.0.0.3 and port 7003=0x00001b5b/nfo 2023-04-28 20:46:44.620639 Will stop proxygen/HTTP server on address 127.0.0.3 and port 1234=0x000004d2/nfo... 2023-04-28 20:46:44.620843 OK, stopped proxygen/HTTP server on address 127.0.0.3 and port 1234=0x000004d2/nfo 2023-04-28 20:46:44.620953 Will stop proxygen/HTTP server on address 127.0.0.3 and port 1240=0x000004d8/std... 2023-04-28 20:46:44.621069 OK, stopped proxygen/HTTP server on address 127.0.0.3 and port 1240=0x000004d8/std 2023-04-28 20:46:44.622179 Skaled status: setSubsystemRunning: Rpc to false

[2023-04-28 20:46:46.373] [3:catchup] [debug] 12:Catchupc step 0: requesting blocks after 12 [2023-04-28 20:46:46.474] [3:main] [error] 12:!Exception: CatchupClientAgent:Connecting to dead node 4:Input/output error 2023-04-28 20:46:46.513963 3 after wait loop 2023-04-28 20:46:46.514116 4 before dtor 2023-04-28 20:46:46.514276 1 before exitGracefully() 2023-04-28 20:46:46.514358 Exit requested with signal 15 and exit code 0 [2023-04-28 20:46:46.514] [3:consensus] [info] 12:Consensus engine exiting: blocking exit exitGracefully called by skaled 2023-04-28 20:46:46.514436 Closing blockchain DB 2023-04-28 20:46:46.514898 Blockchain is closed 2023-04-28 20:46:46.515034 Worker stopping 4690 ms 2023-04-28 20:46:47.602574 2 after exitGracefully() 2023-04-28 20:46:47.602609 3 after wait loop [2023-04-28 20:46:47.602] [config] [info] Consensus engine exiting: exitGracefullyAsync called by skaled 2023-04-28 20:46:47.602622 4 before dtor 2023-04-28 20:46:47.602656 Closing blockchain DB 2023-04-28 20:46:47.602687 Blockchain is closed 2023-04-28 20:46:47.602702 Skaled status: setSubsystemRunning: Blockchain to false

2023-04-28 20:46:47.602889 Closing blockchain DB 2023-04-28 20:46:47.602914 Blockchain is closed 2023-04-28 20:46:47.603046 Closing blockchain DB

dimalit commented 1 year ago

related https://github.com/skalenetwork/skaled/issues/1525

dimalit commented 1 year ago

related https://github.com/skalenetwork/skaled/issues/1532

DmytroNazarenko commented 1 year ago

skaled: 3.17.0-beta.1

oleksandrSydorenkoJ commented 1 year ago

Part of https://github.com/skalenetwork/skaled/issues/1348