hyperledger / besu

An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client https://wiki.hyperledger.org/display/besu
https://www.hyperledger.org/projects/besu
Apache License 2.0

Forest mode failing to sync with snapsync #5807

Open matkt opened 1 year ago

matkt commented 1 year ago

Description

Snap sync with Forest storage sometimes aborts with a RocksDB lock timeout, TimedOut(LockTimeout), while persisting account storage trie nodes:

Unexpected exception in pipeline. Aborting.
org.hyperledger.besu.plugin.services.exception.StorageException: org.rocksdb.RocksDBException: TimedOut(LockTimeout)
    at org.hyperledger.besu.plugin.services.storage.rocksdb.RocksDBTransaction.put(RocksDBTransaction.java:69)
    at org.hyperledger.besu.services.kvstore.SegmentedKeyValueStorageTransactionValidatorDecorator.put(SegmentedKeyValueStorageTransactionValidatorDecorator.java:50)
    at org.hyperledger.besu.services.kvstore.SegmentedKeyValueStorageAdapter$KeyValueStorageTransactionAdapter.put(SegmentedKeyValueStorageAdapter.java:148)
    at org.hyperledger.besu.ethereum.storage.keyvalue.WorldStateKeyValueStorage$Updater.putAccountStorageTrieNode(WorldStateKeyValueStorage.java:219)
    at org.hyperledger.besu.ethereum.storage.keyvalue.WorldStateKeyValueStorage$Updater.putAccountStorageTrieNode(WorldStateKeyValueStorage.java:157)
    at org.hyperledger.besu.ethereum.eth.sync.snapsync.request.StorageRangeDataRequest.lambda$doPersist$0(StorageRangeDataRequest.java:99)
    at org.hyperledger.besu.ethereum.trie.CommitVisitor.maybeStoreNode(CommitVisitor.java:77)
    at org.hyperledger.besu.ethereum.eth.sync.snapsync.StackTrie$1.maybeStoreNode(StackTrie.java:138)
    at org.hyperledger.besu.ethereum.trie.CommitVisitor.visit(CommitVisitor.java:68)
    at org.hyperledger.besu.ethereum.trie.patricia.LeafNode.accept(LeafNode.java:85)
    at org.hyperledger.besu.ethereum.trie.CommitVisitor.visit(CommitVisitor.java:55)
    at org.hyperledger.besu.ethereum.trie.patricia.BranchNode.accept(BranchNode.java:95)
    at org.hyperledger.besu.ethereum.trie.CommitVisitor.visit(CommitVisitor.java:39)
    at org.hyperledger.besu.ethereum.trie.patricia.ExtensionNode.accept(ExtensionNode.java:84)
    at org.hyperledger.besu.ethereum.trie.CommitVisitor.visit(CommitVisitor.java:55)
    at org.hyperledger.besu.ethereum.trie.patricia.BranchNode.accept(BranchNode.java:95)
    at org.hyperledger.besu.ethereum.trie.CommitVisitor.visit(CommitVisitor.java:55)
    at org.hyperledger.besu.ethereum.trie.patricia.BranchNode.accept(BranchNode.java:95)
    at org.hyperledger.besu.ethereum.trie.StoredMerkleTrie.commit(StoredMerkleTrie.java:149)
    at org.hyperledger.besu.ethereum.eth.sync.snapsync.StackTrie.commit(StackTrie.java:132)
    at org.hyperledger.besu.ethereum.eth.sync.snapsync.request.StorageRangeDataRequest.doPersist(StorageRangeDataRequest.java:112)
    at org.hyperledger.besu.ethereum.eth.sync.snapsync.request.SnapDataRequest.persist(SnapDataRequest.java:123)
    at org.hyperledger.besu.ethereum.eth.sync.snapsync.PersistDataStep.persist(PersistDataStep.java:65)
    at org.hyperledger.besu.ethereum.eth.sync.snapsync.SnapWorldStateDownloadProcess$Builder.lambda$build$8(SnapWorldStateDownloadProcess.java:285)
    at org.hyperledger.besu.services.pipeline.MapProcessor.processNextInput(MapProcessor.java:31)
    at org.hyperledger.besu.services.pipeline.ProcessingStage.run(ProcessingStage.java:38)
    at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:169)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: org.rocksdb.RocksDBException: TimedOut(LockTimeout)
    at org.rocksdb.Transaction.put(Native Method)
    at org.rocksdb.Transaction.put(Transaction.java:857)
    at org.hyperledger.besu.plugin.services.storage.rocksdb.RocksDBTransaction.put(RocksDBTransaction.java:63)
    ... 31 more

Acceptance Criteria

Steps to Reproduce (Bug)

Start a new Besu node with Forest storage and checkpoint sync. After some time, the sync fails with the stack trace shown in the description.

Versions (Add all that apply)

Configuration:
Network: Mainnet
Network Id: 1
Data storage: Forest
Sync mode: Snap
RPC HTTP APIs: TRACE,ADMIN,DEBUG,NET,ETH,WEB3,TXPOOL
RPC HTTP port: 8545
Engine APIs: ENGINE,ETH
Engine port: 8551
Engine JWT: /etc/jwt-secret.hex

Host:
Java: openjdk-java-19
Maximum heap size: 5.00 GB
OS: linux-x86_64
glibc: 2.35
jemalloc: 5.2.1-0-gea6b3e973b477b8061e0076bb257dbd7f3faa756
Total memory: 15.62 GB
CPU cores: 4

(besu-prysm-mainnet-nightly-forest-snap)

non-fungible-nelson commented 1 year ago

@garyschulte let's tag this and tie it to a retry mechanism PR for the RocksDB busy exceptions during sync. Thanks!
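
For illustration only, here is a minimal sketch of what a retry with exponential backoff around a transient storage failure could look like. It is not Besu's implementation and does not reflect the referenced PR: RetryOnBusy, StorageBusyException, and withRetry are hypothetical names, and StorageBusyException stands in for a StorageException caused by a RocksDB busy or lock-timeout status.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of Besu): retries an action that fails transiently. */
final class RetryOnBusy {

  /** Hypothetical stand-in for a transient failure such as TimedOut(LockTimeout). */
  static final class StorageBusyException extends RuntimeException {
    StorageBusyException(final String message) {
      super(message);
    }
  }

  /**
   * Runs the action, retrying only on StorageBusyException.
   * The wait doubles after each failed attempt; the last failure is rethrown.
   */
  static <T> T withRetry(
      final Callable<T> action, final int maxAttempts, final long initialBackoffMillis)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    long backoffMillis = initialBackoffMillis;
    for (int attempt = 1; ; attempt++) {
      try {
        return action.call();
      } catch (final StorageBusyException e) {
        if (attempt >= maxAttempts) {
          throw e; // out of attempts: surface the original failure
        }
        Thread.sleep(backoffMillis);
        backoffMillis *= 2; // exponential backoff between attempts
      }
    }
  }

  public static void main(final String[] args) throws Exception {
    final int[] calls = {0};
    // Simulated write that hits a lock timeout twice before succeeding.
    final String result =
        withRetry(
            () -> {
              if (++calls[0] < 3) {
                throw new StorageBusyException("TimedOut(LockTimeout)");
              }
              return "committed";
            },
            5,
            10);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

In a sketch like this, the retried action would presumably need to be the whole unit of work (open a transaction, write, commit) rather than a single put, since a transaction that hit a lock timeout may not be safely reusable. Non-transient exceptions are deliberately not caught, so they still abort the pipeline.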

non-fungible-nelson commented 1 year ago

This also occurs occasionally with Bonsai.