Open kkeaton-rcp opened 8 months ago
@matthew1001 any insight here?
Can you confirm a couple of details please?
23.10.2 for all nodes would be a useful check if it's possible
qbft_getValidatorsByBlockNumber("latest") to some of the currently active nodes?
qbft_proposeValidatorVote calls around the time of stopping/starting the nodes?
@matthew1001 Correct - 7 are validators and the rest are regular JSON/RPC nodes
I confirmed this, all 7 are showing up
Correct
I upgraded all nodes to 23.10.2. The behavior is a bit different now: I noticed a new exception in the logs (this shows up when the testing runs successfully):
java.util.concurrent.TimeoutException
at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1960)
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2095)
at org.hyperledger.besu.ethereum.blockcreation.txselection.BlockTransactionSelector.timeLimitedSelection(BlockTransactionSelector.java:181)
at org.hyperledger.besu.ethereum.blockcreation.txselection.BlockTransactionSelector.buildTransactionListForBlock(BlockTransactionSelector.java:162)
at org.hyperledger.besu.ethereum.blockcreation.AbstractBlockCreator.selectTransactions(AbstractBlockCreator.java:372)
at org.hyperledger.besu.ethereum.blockcreation.AbstractBlockCreator.createBlock(AbstractBlockCreator.java:206)
at org.hyperledger.besu.ethereum.blockcreation.AbstractBlockCreator.createBlock(AbstractBlockCreator.java:154)
at org.hyperledger.besu.ethereum.blockcreation.AbstractBlockCreator.createBlock(AbstractBlockCreator.java:140)
at org.hyperledger.besu.consensus.qbft.statemachine.QbftRound.createAndSendProposalMessage(QbftRound.java:130)
at org.hyperledger.besu.consensus.qbft.statemachine.QbftBlockHeightManager.handleBlockTimerExpiry(QbftBlockHeightManager.java:136)
at org.hyperledger.besu.consensus.common.bft.statemachine.BaseBftController.handleBlockTimerExpiry(BaseBftController.java:167)
at org.hyperledger.besu.consensus.common.bft.EventMultiplexer.handleBftEvent(EventMultiplexer.java:65)
at java.base/java.util.Optional.ifPresent(Optional.java:178)
at org.hyperledger.besu.consensus.common.bft.BftProcessor.run(BftProcessor.java:65)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
I still had it fail, but I am no longer getting this message:
2024-02-23 20:04:17.465+00:00 | BftProcessorExecutor-QBFT-0 | INFO | ProposalValidator | Invalid Proposal Payload: Latest Prepared Metadata blockhash does not align with proposed block
Instead, the log is just full of:
2024-02-23 20:05:07.956+00:00 | EthScheduler-Timer-0 | DEBUG | WaitForPeerTask | Waiting for new peer connection. 9 peers currently connected.
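The validator-set check mentioned earlier in this thread (calling qbft_getValidatorsByBlockNumber("latest") against the nodes) can be scripted so that every node is compared in one pass. This is only a sketch and not something posted by the participants: the node URLs are hypothetical placeholders, and it assumes each node exposes HTTP JSON-RPC with the QBFT API enabled.

```python
import json
import urllib.request

# Hypothetical node endpoints; replace with your own JSON-RPC URLs.
NODE_URLS = ["http://127.0.0.1:8545", "http://127.0.0.1:8546"]


def build_request(block="latest", request_id=1):
    """Build the JSON-RPC payload for qbft_getValidatorsByBlockNumber."""
    return {
        "jsonrpc": "2.0",
        "method": "qbft_getValidatorsByBlockNumber",
        "params": [block],
        "id": request_id,
    }


def fetch_validators(url, timeout=5.0):
    """POST the request to one node and return the validator addresses it reports."""
    data = json.dumps(build_request()).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["result"]


def disagreeing_nodes(validators_by_node):
    """Return the nodes whose validator set differs from the most common one."""
    if not validators_by_node:
        return []
    # Compare addresses case-insensitively and ignore ordering.
    normalized = {
        node: frozenset(addr.lower() for addr in addrs)
        for node, addrs in validators_by_node.items()
    }
    sets = list(normalized.values())
    majority = max(set(sets), key=sets.count)
    return sorted(node for node, s in normalized.items() if s != majority)


def main():
    """Query every node in NODE_URLS and report any that disagree."""
    results = {url: fetch_validators(url) for url in NODE_URLS}
    for url, addrs in results.items():
        print(url, len(addrs), "validators")
    print("nodes that disagree:", disagreeing_nodes(results))
```

Calling main() after editing NODE_URLS prints the validator count per node and lists any node whose view of the validator set differs from the majority.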
Sorry for the slow reply @kkeaton-rcp. I haven't been able to look into it in detail yet, but another contributor, @Brindrajsinh-Chauhan, might have some time to do so. I'll assign it over to him for now.
GitHub does not seem to let me assign it to myself. @non-fungible-nelson, I would be grateful if you could take that action. I will start investigating this.
Looks like I can @Brindrajsinh-Chauhan - just assigned it to you
Just wondering if there was any update on this?
Hey, sorry, I have not been able to recreate this based on the scenarios you provided.
Did you have a smart contract that took in some data and triggered that while the nodes were down?
Yes, I sent a bunch of transactions to a smart contract with a few nodes down and then brought them back up.
Could a large amount of data going onto a smart contract cause this error? This error has now happened without any validator nodes being down. @Brindrajsinh-Chauhan @matthew1001
Or are there any known issues that deploying a smart contract with Truffle (which is now deprecated) could cause? Would love to hear others' opinions on this.
@kkeaton-rcp, I suggested a solution on a similar issue, #6732.
In your case, I suspect the surviving validators are unable to produce blocks for your transactions within the value set for requesttimeoutseconds. The default recommendation for requesttimeoutseconds is 2x your blockperiodseconds; however, the hardware resources of your remaining validators, especially disk IO, play a role in ensuring prompt block creation. I suggest you review your disk IO performance. Increasing requesttimeoutseconds to something higher can help reduce the occurrence of this issue.
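To make the rule of thumb in this comment concrete, here is a minimal sketch that derives requesttimeoutseconds from blockperiodseconds and prints the resulting qbft fragment as it would sit under config in a genesis file. The helper name and the multiplier parameter are illustrative assumptions, not part of Besu; only the two key names come from the discussion above.

```python
import json


def qbft_timeouts(block_period_seconds, multiplier=2):
    """Return a qbft config fragment with requesttimeoutseconds = multiplier * block period."""
    if block_period_seconds <= 0 or multiplier < 1:
        raise ValueError("block period must be positive and multiplier at least 1")
    return {
        "qbft": {
            "blockperiodseconds": block_period_seconds,
            "requesttimeoutseconds": block_period_seconds * multiplier,
        }
    }


# The 2x recommendation for a 2-second block period.
print(json.dumps(qbft_timeouts(2), indent=2))

# A more generous timeout, as suggested for slower disks (the factor 5 is an arbitrary example).
print(json.dumps(qbft_timeouts(2, multiplier=5), indent=2))
```

How much headroom is appropriate depends on the validators' hardware, so the larger multiplier is something to tune by experiment rather than a recommended value.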
Description
I have a private blockchain with 7 validators. When I take down 2 validators, the chain seems to freeze and is unable to produce blocks.
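For context on why losing 2 of 7 validators should be survivable: a BFT protocol such as QBFT tolerates f faulty validators when n >= 3f + 1. The sketch below works through that arithmetic. The quorum formula ceil(2n / 3) reflects my understanding of how Besu sizes a quorum and should be treated as an assumption; the function names are illustrative.

```python
def max_faulty(n):
    """Largest f such that n >= 3f + 1."""
    return (n - 1) // 3


def quorum(n):
    """ceil(2n / 3) using integer arithmetic (assumed quorum size)."""
    return (2 * n + 2) // 3


n = 7
down = 2
remaining = n - down
print("tolerated faults:", max_faulty(n))
print("quorum:", quorum(n))
print("remaining validators:", remaining)
print("quorum still reachable:", remaining >= quorum(n))
```

Under these assumptions, 7 validators with 2 down leaves exactly the quorum, so every remaining validator has to respond in time; a single slow validator would be enough to delay a round.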
Acceptance Criteria
Steps to Reproduce (Bug)
I have tested this a few times and it is inconsistent. Sometimes the chain does not stall, other times the chain does stall.
Shut down two validators and then trigger events to happen on the blockchain.
Expected behavior: The blockchain keeps producing blocks with 2 validators down.
Actual behavior: The blockchain stalls and logs the following error:
BftProcessorExecutor-QBFT-0 | INFO | ProposalValidator | Invalid Proposal Payload: Latest Prepared Metadata blockhash does not align with proposed block
along with:
EthScheduler-Timer-0 | DEBUG | WaitForPeerTask | Waiting for new peer connection. 9 peers currently connected.
Frequency: This happens about half the times I have run the test.
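To put numbers on how often each of these messages appears in a run, the pipe-delimited log lines can be tallied. A minimal sketch, assuming the layout seen in this issue (an optional timestamp, then thread | level | logger | message); the function names are illustrative and the sample lines are the ones quoted in this issue.

```python
from collections import Counter


def parse_line(line):
    """Split one log line into its fields, or return None for non-matching lines."""
    parts = [p.strip() for p in line.strip().split(" | ")]
    # Drop the leading timestamp when present (it starts with a digit).
    if parts and parts[0][:1].isdigit():
        parts = parts[1:]
    if len(parts) < 4:
        return None  # e.g. a stack-trace line
    thread, level, logger = parts[:3]
    message = " | ".join(parts[3:])
    return {"thread": thread, "level": level, "logger": logger, "message": message}


def count_by_logger(lines):
    """Count parsed lines per (level, logger) pair."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[(entry["level"], entry["logger"])] += 1
    return counts


sample = [
    "2024-02-23 20:04:17.465+00:00 | BftProcessorExecutor-QBFT-0 | INFO | ProposalValidator | Invalid Proposal Payload: Latest Prepared Metadata blockhash does not align with proposed block",
    "EthScheduler-Timer-0 | DEBUG | WaitForPeerTask | Waiting for new peer connection. 9 peers currently connected.",
    "at java.base/java.lang.Thread.run(Thread.java:840)",
]
for key, value in count_by_logger(sample).items():
    print(key, value)
```

Feeding it the lines of a real log file (for example via open(path)) gives a quick view of whether ProposalValidator rejections or WaitForPeerTask messages dominate a stalled run.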
Logs (if a bug)
Versions (Add all that apply)
besu --version
java -version
cat /etc/*release
uname -a