hyperledger / besu

An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client https://wiki.hyperledger.org/display/besu
https://www.hyperledger.org/projects/besu
Apache License 2.0
1.52k stars 842 forks source link

Fresh snap sync did not start worldstate heal #5549

Open yorickdowne opened 1 year ago

yorickdowne commented 1 year ago

Description

A fresh sync of mainnet Ethereum on Besu using snap sync got through FastImportBlocksStep, but did not start worldstate heal. It was "stuck" here.

Steps to Reproduce (Bug)

Logs (if a bug)

eth-main-execution-1  | 2023-06-06 23:45:55.873+00:00 | EthScheduler-Services-73 (importBlock) | INFO  | FastImportBlocksStep | Block import progress: 17407253 of 17423064 (99%)
eth-main-execution-1  | 2023-06-06 23:46:27.843+00:00 | EthScheduler-Services-73 (importBlock) | INFO  | FastImportBlocksStep | Block import progress: 17408453 of 17423064 (99%)
eth-main-execution-1  | 2023-06-06 23:48:34.674+00:00 | EthScheduler-Services-78 (importBlock) | INFO  | FastImportBlocksStep | Block import progress: 17415357 of 17423064 (99%)
eth-main-execution-1  | 2023-06-07 01:40:37.386+00:00 | Timer-0 | INFO  | DNSResolver | Resolved 2409 nodes
eth-main-execution-1  | 2023-06-07 07:40:34.310+00:00 | Timer-0 | INFO  | DNSResolver | Resolved 2409 nodes
eth-main-execution-1  | 2023-06-07 13:30:44.083+00:00 | Timer-0 | INFO  | DNSResolver | Resolved 2409 nodes
eth-main-execution-1  | 2023-06-07 19:20:40.366+00:00 | Timer-0 | INFO  | DNSResolver | Resolved 2409 nodes
eth-main-execution-1  | 2023-06-08 01:10:29.142+00:00 | Timer-0 | INFO  | DNSResolver | Resolved 2409 nodes
eth-main-execution-1  | 2023-06-08 07:00:31.256+00:00 | Timer-0 | INFO  | DNSResolver | Resolved 2409 nodes
eth-main-execution-1  | 2023-06-08 12:10:36.405+00:00 | Timer-0 | INFO  | DNSResolver | Resolved 2409 nodes

Node restarted at this point

eth-main-execution-1  | 2023-06-08 13:59:56.533+00:00 | main | INFO  | DefaultP2PNetwork | Node address 0xcbe24acf964d074e77a3e604410094a053702dfe
eth-main-execution-1  | 2023-06-08 13:59:56.536+00:00 | main | INFO  | NetworkRunner | Supported capabilities: [eth/63, eth/64, eth/65, eth/66, eth/67, eth/68], [snap/1]
eth-main-execution-1  | 2023-06-08 13:59:56.536+00:00 | main | INFO  | DefaultSynchronizer | Starting synchronizer.
eth-main-execution-1  | 2023-06-08 13:59:56.537+00:00 | main | INFO  | FastSyncDownloader | Starting sync
eth-main-execution-1  | 2023-06-08 13:59:56.540+00:00 | main | INFO  | SnapWorldStateDownloader | Downloading world state from peers for pivot block 17413930 (0x3b875b5934da29843acb539b2abbf98219eb8bd0d84cb7ebfc0ac5398172189c). State root 0xef1e81850d7759bd982c36eebc37b40073c37b4fa8aa2b9bd272f8fd3a98489b pending requests 0
eth-main-execution-1  | 2023-06-08 13:59:56.736+00:00 | main | INFO  | FastSyncTargetManager | Unable to find sync target. Currently checking 0 peers for usefulness.
eth-main-execution-1  | 2023-06-08 13:59:56.742+00:00 | main | INFO  | Runner | Ethereum main loop is up.
eth-main-execution-1  | 2023-06-08 14:00:12.213+00:00 | nioEventLoopGroup-3-6 | INFO  | SnapWorldDownloadState | Running world state heal process from peers with pivot block 17435984
eth-main-execution-1  | 2023-06-08 14:00:20.569+00:00 | Timer-0 | INFO  | DNSResolver | Resolved 2409 nodes
eth-main-execution-1  | 2023-06-08 14:00:56.545+00:00 | EthScheduler-Services-26 (batchPersistTrieNodeData) | INFO  | SnapsyncMetricsManager | Healed 11427 world state nodes
eth-main-execution-1  | 2023-06-08 14:01:56.579+00:00 | EthScheduler-Services-26 (batchPersistTrieNodeData) | INFO  | SnapsyncMetricsManager | Healed 335269 world state nodes
eth-main-execution-1  | 2023-06-08 14:01:59.349+00:00 | EthScheduler-Services-34 (importBlock) | INFO  | FastImportBlocksStep | Block import progress: 17428304 of 17435984 (99%)
eth-main-execution-1  | 2023-06-08 14:02:56.579+00:00 | EthScheduler-Services-26 (batchPersistTrieNodeData) | INFO  | SnapsyncMetricsManager | Healed 630472 world state nodes
eth-main-execution-1  | 2023-06-08 14:03:01.838+00:00 | EthScheduler-Services-34 (importBlock) | INFO  | FastImportBlocksStep | Block import progress: 17433904 of 17435984 (99%)
eth-main-execution-1  | 2023-06-08 14:03:56.579+00:00 | EthScheduler-Services-26 (batchPersistTrieNodeData) | INFO  | SnapsyncMetricsManager | Healed 936610 world state nodes
eth-main-execution-1  | 2023-06-08 14:04:56.677+00:00 | EthScheduler-Services-26 (batchPersistTrieNodeData) | INFO  | SnapsyncMetricsManager | Healed 1334574 world state nodes
eth-main-execution-1  | 2023-06-08 14:05:56.683+00:00 | EthScheduler-Services-26 (batchPersistTrieNodeData) | INFO  | SnapsyncMetricsManager | Healed 1809978 world state nodes
eth-main-execution-1  | 2023-06-08 14:06:56.703+00:00 | EthScheduler-Services-26 (batchPersistTrieNodeData) | INFO  | SnapsyncMetricsManager | Healed 2240408 world state nodes
eth-main-execution-1  | 2023-06-08 14:07:56.703+00:00 | EthScheduler-Services-26 (batchPersistTrieNodeData) | INFO  | SnapsyncMetricsManager | Healed 2705233 world state nodes
eth-main-execution-1  | 2023-06-08 14:08:56.707+00:00 | EthScheduler-Services-26 (batchPersistTrieNodeData) | INFO  | SnapsyncMetricsManager | Healed 3133786 world state nodes
eth-main-execution-1  | 2023-06-08 14:09:56.729+00:00 | EthScheduler-Services-26 (batchPersistTrieNodeData) | INFO  | SnapsyncMetricsManager | Healed 3518179 world state nodes

Versions (Add all that apply)

jframe commented 1 year ago

This could be related to #5191

siladu commented 1 year ago

Had a similar issue doing a fresh sepolia sync.

Turned on debug after it halted and this error appears a lot. @pinges or @matkt do you think this could be related to sync halting at 99% and/or heal stage not starting...or is it just noise?

Full log: besu-sync-halt.log

{"@timestamp":"2023-06-18T21:22:22,637","level":"DEBUG","thread":"nioEventLoopGroup-3-2","class":"Subscribers","message":"Error in callback: {}",
"throwable":" com.google.common.cache.CacheLoader$InvalidCacheLoadException: 
CacheLoader returned null for key [Connection with hashCode -505516889 with peer 0x592b71dbfe9c9df0524b709ad5deb2ca5eb7a217984d4512484d7c2af71ee6c9739563f2504afbc48403902234bd1f7c5fcb0290a6a14c769822c02c8062a7b6 inboundInitiated true initAt 1687123342597].
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2319)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2283)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2159)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2049)
at com.google.common.cache.LocalCache.get(LocalCache.java:3966)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4863)
at org.hyperledger.besu.ethereum.eth.manager.EthPeers.peer(EthPeers.java:223)
at org.hyperledger.besu.ethereum.eth.manager.EthProtocolManager.processMessage(EthProtocolManager.java:278)
at org.hyperledger.besu.ethereum.p2p.network.NetworkRunner.lambda$setupHandlers$1(NetworkRunner.java:142)
at org.hyperledger.besu.ethereum.p2p.rlpx.connections.PeerConnectionEvents.lambda$dispatchMessage$1(PeerConnectionEvents.java:64)
at org.hyperledger.besu.util.Subscribers.lambda$forEach$0(Subscribers.java:131)
at java.base/java.lang.Iterable.forEach(Iterable.java:75)
at org.hyperledger.besu.util.Subscribers.forEach(Subscribers.java:128)
at org.hyperledger.besu.ethereum.p2p.rlpx.connections.PeerConnectionEvents.dispatchMessage(PeerConnectionEvents.java:64)
at org.hyperledger.besu.ethereum.p2p.rlpx.connections.netty.ApiHandler.channelRead0(ApiHandler.java:112)
at org.hyperledger.besu.ethereum.p2p.rlpx.connections.netty.ApiHandler.channelRead0(ApiHandler.java:35)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:833)\n"}
# Besu version 23.4.3-dev-b790ff19                                                                 #
#                                                                                                  #
# Configuration:                                                                                   #
# Network: Sepolia                                                                                 #
# Network Id: 11155111                                                                             #
# Data storage: Bonsai                                                                             #
# Sync mode: Snap                                                                                  #
# RPC HTTP APIs: ADMIN,DEBUG,NET,ETH,MINER,WEB3                                                    #
# RPC HTTP port: 8545                                                                              #
# Engine APIs: ENGINE,ETH                                                                          #
# Engine port: 8551                                                                                #
# Engine JWT: /etc/jwt-secret.hex                                                                  #
#                                                                                                  #
# Host:                                                                                            #
# Java: openjdk-java-17                                                                            #
# Maximum heap size: 3.84 GB                                                                       #
# OS: linux-x86_64                                                                                 #
# glibc: 2.35                                                                                      #
# jemalloc: 5.2.1-0-gea6b3e973b477b8061e0076bb257dbd7f3faa756                                      #
# Total memory: 15.34 GB                                                                           #
# CPU cores: 4                                                                                     #
siladu commented 1 year ago

@matkt correction to the above comment. I found an ERROR in the logs, think the DEBUG error I posted above is a red herring.

{"@timestamp":"2023-06-18T02:30:34,650","level":"ERROR","thread":"EthScheduler-Services-42 (importBlock)","class":"PipelineChainDownloader","message":"Chain download failed. Restarting after short delay.","throwable":" java.util.concurrent.CompletionException: java.lang.NullPointerException: Cannot invoke \"org.hyperledger.besu.ethereum.bonsai.storage.flat.FlatDbReaderStrategy.resetOnResync(org.hyperledger.besu.plugin.services.storage.KeyValueStorage, org.hyperledger.besu.plugin.services.storage.KeyValueStorage)\" because the return value of \"org.hyperledger.besu.ethereum.bonsai.storage.BonsaiWorldStateKeyValueStorage.getFlatDbReaderStrategy()\" is null
at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:368)
at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:377)
at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1097)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
at org.hyperledger.besu.services.pipeline.Pipeline.abort(Pipeline.java:195)
at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:174)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)\nCaused by: java.lang.NullPointerException: Cannot invoke \"org.hyperledger.besu.ethereum.bonsai.storage.flat.FlatDbReaderStrategy.resetOnResync(org.hyperledger.besu.plugin.services.storage.KeyValueStorage, org.hyperledger.besu.plugin.services.storage.KeyValueStorage)\" because the return value of \"org.hyperledger.besu.ethereum.bonsai.storage.BonsaiWorldStateKeyValueStorage.getFlatDbReaderStrategy()\" is null
at org.hyperledger.besu.ethereum.bonsai.storage.BonsaiWorldStateKeyValueStorage.clearFlatDatabase(BonsaiWorldStateKeyValueStorage.java:306)
at org.hyperledger.besu.ethereum.eth.sync.snapsync.SnapWorldDownloadState.reloadTrieHeal(SnapWorldDownloadState.java:220)
at org.hyperledger.besu.ethereum.eth.sync.snapsync.SnapWorldDownloadState.lambda$getBlockAddedListener$10(SnapWorldDownloadState.java:405)
at org.hyperledger.besu.ethereum.chain.DefaultBlockchain.lambda$appendBlockHelper$9(DefaultBlockchain.java:336)
at org.hyperledger.besu.util.Subscribers.lambda$forEach$0(Subscribers.java:131)
at java.base/java.lang.Iterable.forEach(Iterable.java:75)
at org.hyperledger.besu.util.Subscribers.forEach(Subscribers.java:128)
at org.hyperledger.besu.ethereum.chain.DefaultBlockchain.appendBlockHelper(DefaultBlockchain.java:336)
at org.hyperledger.besu.ethereum.chain.DefaultBlockchain.appendBlock(DefaultBlockchain.java:286)
at org.hyperledger.besu.ethereum.mainnet.MainnetBlockImporter.fastImportBlock(MainnetBlockImporter.java:69)
at org.hyperledger.besu.ethereum.eth.sync.fastsync.FastImportBlocksStep.importBlock(FastImportBlocksStep.java:120)
at org.hyperledger.besu.ethereum.eth.sync.fastsync.FastImportBlocksStep.accept(FastImportBlocksStep.java:68)
at org.hyperledger.besu.ethereum.eth.sync.fastsync.FastImportBlocksStep.accept(FastImportBlocksStep.java:36)
at org.hyperledger.besu.services.pipeline.CompleterStage.run(CompleterStage.java:37)
at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:167)\n\t... 5 more\n"}

@pinges had the same error on a mainnet node too, which resulted in a 99% sync halt.

siladu commented 1 year ago

Also, my issue may be related to the recent flat db commit (since it's calling getFlatDbReaderStrategy even without the feature enabled) which could be different to Yorick's original issue which was using 23.4.1

siladu commented 1 year ago

@yorickdowne on the off chance you still have the complete logs, can you share them please? I'm looking for an error that may have occurred sometime between the worldstate download completing and the block import completing

siladu commented 1 year ago

Happened again on another mainnet fresh sync...

{"@timestamp":"2023-06-18T09:47:03,509","level":"ERROR","thread":"EthScheduler-Services-38 (importBlock)","class":"PipelineChainDownloader","message":"Chain download failed. Restarting after short delay.","throwable":" 
java.util.concurrent.CompletionException: java.lang.NullPointerException: 
Cannot invoke \"org.hyperledger.besu.ethereum.bonsai.storage.flat.FlatDbReaderStrategy.resetOnResync(org.hyperledger.besu.plugin.services.storage.KeyValueStorage, org.hyperledger.besu.plugin.services.storage.KeyValueStorage)\" 
because the return value of \"org.hyperledger.besu.ethereum.bonsai.storage.BonsaiWorldStateKeyValueStorage.getFlatDbReaderStrategy()\" is null
at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:368)
at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:377)
at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1097)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
at org.hyperledger.besu.services.pipeline.Pipeline.abort(Pipeline.java:195)
at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:174)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)\nCaused by: java.lang.NullPointerException: Cannot invoke \"org.hyperledger.besu.ethereum.bonsai.storage.flat.FlatDbReaderStrategy.resetOnResync(org.hyperledger.besu.plugin.services.storage.KeyValueStorage, org.hyperledger.besu.plugin.services.storage.KeyValueStorage)\" because the return value of \"org.hyperledger.besu.ethereum.bonsai.storage.BonsaiWorldStateKeyValueStorage.getFlatDbReaderStrategy()\" is null
at org.hyperledger.besu.ethereum.bonsai.storage.BonsaiWorldStateKeyValueStorage.clearFlatDatabase(BonsaiWorldStateKeyValueStorage.java:306)
at org.hyperledger.besu.ethereum.eth.sync.snapsync.SnapWorldDownloadState.reloadTrieHeal(SnapWorldDownloadState.java:220)
at org.hyperledger.besu.ethereum.eth.sync.snapsync.SnapWorldDownloadState.lambda$getBlockAddedListener$10(SnapWorldDownloadState.java:405)
at org.hyperledger.besu.ethereum.chain.DefaultBlockchain.lambda$appendBlockHelper$9(DefaultBlockchain.java:336)
at org.hyperledger.besu.util.Subscribers.lambda$forEach$0(Subscribers.java:131)
at java.base/java.lang.Iterable.forEach(Iterable.java:75)
at org.hyperledger.besu.util.Subscribers.forEach(Subscribers.java:128)
at org.hyperledger.besu.ethereum.chain.DefaultBlockchain.appendBlockHelper(DefaultBlockchain.java:336)
at org.hyperledger.besu.ethereum.chain.DefaultBlockchain.appendBlock(DefaultBlockchain.java:286)
at org.hyperledger.besu.ethereum.mainnet.MainnetBlockImporter.fastImportBlock(MainnetBlockImporter.java:69)
at org.hyperledger.besu.ethereum.eth.sync.fastsync.FastImportBlocksStep.importBlock(FastImportBlocksStep.java:120)
at org.hyperledger.besu.ethereum.eth.sync.fastsync.FastImportBlocksStep.accept(FastImportBlocksStep.java:68)
at org.hyperledger.besu.ethereum.eth.sync.fastsync.FastImportBlocksStep.accept(FastImportBlocksStep.java:36)
at org.hyperledger.besu.services.pipeline.CompleterStage.run(CompleterStage.java:37)
at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:167)\n\t... 5 more\n"}
# Besu version 23.4.3-dev-50ea7268                                                                 #
#                                                                                                  #
# Configuration:                                                                                   #
# Network: Mainnet                                                                                 #
# Network Id: 1                                                                                    #
# Data storage: Bonsai                                                                             #
# Sync mode: Snap                                                                                  #
# RPC HTTP APIs: ADMIN,DEBUG,NET,ETH,MINER,WEB3                                                    #
# RPC HTTP port: 8545                                                                              #
# Engine APIs: ENGINE,ETH                                                                          #
# Engine port: 8551                                                                                #
# Engine JWT: /etc/jwt-secret.hex                                                                  #
#                                                                                                  #
# Host:                                                                                            #
# Java: openjdk-java-17                                                                            #
# Maximum heap size: 3.82 GB                                                                       #
# OS: linux-x86_64                                                                                 #
# glibc: 2.35                                                                                      #
# jemalloc: 5.2.1-0-gea6b3e973b477b8061e0076bb257dbd7f3faa756                                      #
# Total memory: 15.28 GB                                                                           #
# CPU cores: 4                                                                                     #

In both my cases, restarting fixed it

yorickdowne commented 1 year ago

on the off chance you still have the complete logs

I do not, sorry

matkt commented 1 year ago

nullpointer issue should now be fixed on main branch

siladu commented 1 year ago

nullpointer issue should now be fixed on main branch

Awesome. I'll leave this story open since there must be some other error causing the halt, likely still the Snapsync thread dying due to an unhandled exception.

Other than waiting for it to happen again and being able to collect DEBUG logs, we could progress this by hunting for unhandled exceptions in the code. Another less desirable option is to raise the snapsync exceptions to ERROR level so we're more likely to see the problem next time it happens (that might make it too noisy though).

anovalunosis commented 1 year ago

OS: Ubuntu 22.04.3 LTS Kernel: 5.19.0-46-generic CL client: Teku 23.8.0 (docker image pulled from docker hub) EL client: Besu 23.4.4 (docker image pulled from docker hub) Docker version 24.0.5

Dealing with the same logs as original post so I turned on DEBUG level logs:

teku-besu_node-1 | 2023-08-13T11:13:34.893753385Z org.hyperledger.besu.ethereum.p2p.rlpx.handshake.HandshakeException: Unable to create ECDH Key agreement due to Crypto engine failure teku-besu_node-1 | 2023-08-13T11:13:34.893757259Z at org.hyperledger.besu.ethereum.p2p.rlpx.handshake.ecies.ECIESHandshaker.handleMessage(ECIESHandshaker.java:214) teku-besu_node-1 | 2023-08-13T11:13:34.893760256Z at org.hyperledger.besu.ethereum.p2p.rlpx.connections.netty.HandshakeHandlerInbound.nextHandshakeMessage(HandshakeHandlerInbound.java:61) teku-besu_node-1 | 2023-08-13T11:13:34.893762798Z at org.hyperledger.besu.ethereum.p2p.rlpx.connections.netty.AbstractHandshakeHandler.channelRead0(AbstractHandshakeHandler.java:95) teku-besu_node-1 | 2023-08-13T11:13:34.893765262Z at org.hyperledger.besu.ethereum.p2p.rlpx.connections.netty.AbstractHandshakeHandler.channelRead0(AbstractHandshakeHandler.java:44) teku-besu_node-1 | 2023-08-13T11:13:34.893767693Z at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) teku-besu_node-1 | 2023-08-13T11:13:34.893770082Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) teku-besu_node-1 | 2023-08-13T11:13:34.893772435Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) teku-besu_node-1 | 2023-08-13T11:13:34.893775296Z at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) teku-besu_node-1 | 2023-08-13T11:13:34.893777770Z at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) teku-besu_node-1 | 2023-08-13T11:13:34.893780311Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) teku-besu_node-1 | 2023-08-13T11:13:34.893782653Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) teku-besu_node-1 | 2023-08-13T11:13:34.893785066Z at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) teku-besu_node-1 | 2023-08-13T11:13:34.893787523Z at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) teku-besu_node-1 | 2023-08-13T11:13:34.893789981Z at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) teku-besu_node-1 | 2023-08-13T11:13:34.893792280Z at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) teku-besu_node-1 | 2023-08-13T11:13:34.893794580Z at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) teku-besu_node-1 | 2023-08-13T11:13:34.893799649Z at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) teku-besu_node-1 | 2023-08-13T11:13:34.893802622Z at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) teku-besu_node-1 | 2023-08-13T11:13:34.893805193Z at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) teku-besu_node-1 | 2023-08-13T11:13:34.893817285Z at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) teku-besu_node-1 | 2023-08-13T11:13:34.893820404Z at java.base/java.lang.Thread.run(Thread.java:833) teku-besu_node-1 | 2023-08-13T11:13:34.893822779Z Caused by: org.hyperledger.besu.plugin.services.securitymodule.SecurityModuleException: Unexpected error while calculating ECDH Key Agreement: Invalid point coordinates teku-besu_node-1 | 2023-08-13T11:13:34.893825224Z at org.hyperledger.besu.cryptoservices.KeyPairSecurityModule.calculateECDHKeyAgreement(KeyPairSecurityModule.java:88) teku-besu_node-1 | 2023-08-13T11:13:34.893827552Z at org.hyperledger.besu.cryptoservices.NodeKey.calculateECDHKeyAgreement(NodeKey.java:72) teku-besu_node-1 | 2023-08-13T11:13:34.893829736Z at org.hyperledger.besu.ethereum.p2p.rlpx.handshake.ecies.ECIESEncryptionEngine.forDecryption(ECIESEncryptionEngine.java:102) teku-besu_node-1 | 2023-08-13T11:13:34.893831947Z at org.hyperledger.besu.ethereum.p2p.rlpx.handshake.ecies.EncryptedMessage.decryptMsg(EncryptedMessage.java:68) teku-besu_node-1 | 2023-08-13T11:13:34.893834121Z at org.hyperledger.besu.ethereum.p2p.rlpx.handshake.ecies.ECIESHandshaker.handleMessage(ECIESHandshaker.java:206) teku-besu_node-1 | 2023-08-13T11:13:34.893836409Z ... 20 more teku-besu_node-1 | 2023-08-13T11:13:34.893838692Z Caused by: java.lang.IllegalArgumentException: Invalid point coordinates teku-besu_node-1 | 2023-08-13T11:13:34.893840729Z at org.bouncycastle.math.ec.ECCurve.validatePoint(ECCurve.java:125) teku-besu_node-1 | 2023-08-13T11:13:34.893842807Z at org.bouncycastle.math.ec.ECCurve.decodePoint(ECCurve.java:420) teku-besu_node-1 | 2023-08-13T11:13:34.893845129Z at org.hyperledger.besu.crypto.SECPPublicKey.asEcPoint(SECPPublicKey.java:122) teku-besu_node-1 | 2023-08-13T11:13:34.893847269Z at org.hyperledger.besu.crypto.AbstractSECP256.publicKeyAsEcPoint(AbstractSECP256.java:184) teku-besu_node-1 | 2023-08-13T11:13:34.893849452Z at org.hyperledger.besu.cryptoservices.NodeKey.lambda$calculateECDHKeyAgreement$0(NodeKey.java:73) teku-besu_node-1 | 2023-08-13T11:13:34.893851519Z at org.hyperledger.besu.cryptoservices.KeyPairSecurityModule.calculateECDHKeyAgreement(KeyPairSecurityModule.java:82) teku-besu_node-1 | 2023-08-13T11:13:34.893853703Z ... 24 more

anovalunosis commented 1 year ago

teku-besu_node-1 | 2023-08-14T15:04:23.002041466Z com.google.common.cache.CacheLoader$InvalidCacheLoadException: CacheLoader returned null for key [Connection with hashCode -1647236115 with peer 0x210643148ea56b6465bd32995d389aaeb6ca1362456c6abfa4bbc307eaa2d9107f34c1309911d6877d5ad77f618487d94c131198eaf4121eee009f78c2b21751 inboundInitiated true initAt 1692025462893]. teku-besu_node-1 | 2023-08-14T15:04:23.002052138Z at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2319) teku-besu_node-1 | 2023-08-14T15:04:23.002059601Z at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2283) teku-besu_node-1 | 2023-08-14T15:04:23.002065950Z at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2159) teku-besu_node-1 | 2023-08-14T15:04:23.002072358Z at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2049) teku-besu_node-1 | 2023-08-14T15:04:23.002079117Z at com.google.common.cache.LocalCache.get(LocalCache.java:3966) teku-besu_node-1 | 2023-08-14T15:04:23.002085850Z at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4863) teku-besu_node-1 | 2023-08-14T15:04:23.002091913Z at org.hyperledger.besu.ethereum.eth.manager.EthPeers.peer(EthPeers.java:223) teku-besu_node-1 | 2023-08-14T15:04:23.002097826Z at org.hyperledger.besu.ethereum.eth.manager.EthProtocolManager.processMessage(EthProtocolManager.java:278) teku-besu_node-1 | 2023-08-14T15:04:23.002103905Z at org.hyperledger.besu.ethereum.p2p.network.NetworkRunner.lambda$setupHandlers$1(NetworkRunner.java:142) teku-besu_node-1 | 2023-08-14T15:04:23.002110174Z at org.hyperledger.besu.ethereum.p2p.rlpx.connections.PeerConnectionEvents.lambda$dispatchMessage$1(PeerConnectionEvents.java:64) teku-besu_node-1 | 2023-08-14T15:04:23.002116692Z at org.hyperledger.besu.util.Subscribers.lambda$forEach$0(Subscribers.java:131) teku-besu_node-1 | 2023-08-14T15:04:23.002122966Z at java.base/java.lang.Iterable.forEach(Iterable.java:75) teku-besu_node-1 | 2023-08-14T15:04:23.002129312Z at org.hyperledger.besu.util.Subscribers.forEach(Subscribers.java:128) teku-besu_node-1 | 2023-08-14T15:04:23.002159873Z at org.hyperledger.besu.ethereum.p2p.rlpx.connections.PeerConnectionEvents.dispatchMessage(PeerConnectionEvents.java:64) teku-besu_node-1 | 2023-08-14T15:04:23.002167906Z at org.hyperledger.besu.ethereum.p2p.rlpx.connections.netty.ApiHandler.channelRead0(ApiHandler.java:112) teku-besu_node-1 | 2023-08-14T15:04:23.002174311Z at org.hyperledger.besu.ethereum.p2p.rlpx.connections.netty.ApiHandler.channelRead0(ApiHandler.java:35) teku-besu_node-1 | 2023-08-14T15:04:23.002180950Z at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) teku-besu_node-1 | 2023-08-14T15:04:23.002188876Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) teku-besu_node-1 | 2023-08-14T15:04:23.002195528Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) teku-besu_node-1 | 2023-08-14T15:04:23.002201695Z at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) teku-besu_node-1 | 2023-08-14T15:04:23.002207896Z at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) teku-besu_node-1 | 2023-08-14T15:04:23.002213832Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) teku-besu_node-1 | 2023-08-14T15:04:23.002219812Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) teku-besu_node-1 | 2023-08-14T15:04:23.002225725Z at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) teku-besu_node-1 | 2023-08-14T15:04:23.002231846Z at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346) teku-besu_node-1 | 2023-08-14T15:04:23.002237876Z at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318) teku-besu_node-1 | 2023-08-14T15:04:23.002244117Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) teku-besu_node-1 | 2023-08-14T15:04:23.002250076Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) teku-besu_node-1 | 2023-08-14T15:04:23.002256423Z at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) teku-besu_node-1 | 2023-08-14T15:04:23.002262792Z at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) teku-besu_node-1 | 2023-08-14T15:04:23.002268831Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) teku-besu_node-1 | 2023-08-14T15:04:23.002274994Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) teku-besu_node-1 | 2023-08-14T15:04:23.002281226Z at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) teku-besu_node-1 | 2023-08-14T15:04:23.002287373Z at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) teku-besu_node-1 | 2023-08-14T15:04:23.002293523Z at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) teku-besu_node-1 | 2023-08-14T15:04:23.002300460Z at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) teku-besu_node-1 | 2023-08-14T15:04:23.002313681Z at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) teku-besu_node-1 | 2023-08-14T15:04:23.002321019Z at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) teku-besu_node-1 | 2023-08-14T15:04:23.002328774Z at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) teku-besu_node-1 | 2023-08-14T15:04:23.002335836Z at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) teku-besu_node-1 | 2023-08-14T15:04:23.002345129Z at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) teku-besu_node-1 | 2023-08-14T15:04:23.002352323Z at java.base/java.lang.Thread.run(Thread.java:833)

anovalunosis commented 1 year ago

Any updates on this ? @yorickdowne @siladu

matkt commented 1 year ago

Hi, thanks for sharing we are working on a PR to fix some snapsync issues . I think it can help you . Feel free to test it and give some feedback.

anovalunosis commented 1 year ago

Now getting this error instead:

Error:

2023-09-29T16:59:12.295244858Z 2023-09-29 16:59:12.295+00:00 | EthScheduler-Services-51 (importBlock) | INFO | FastImportBlocksStep | Block import progress: 18233689 of 18235363 (99%) teku-besu_node-1 | 2023-09-29T18:07:00.599103686Z 2023-09-29 18:07:00.598+00:00 | vert.x-eventloop-thread-1 | WARN | DNSResolver | DNS query error with 5O3IWRZYLJBCGCSL7773RE.all.mainnet.ethdisco.net teku-besu_node-1 | 2023-09-29T18:07:00.599135208Z io.vertx.core.dns.DnsException: DNS query error occurred: NXDOMAIN: type 3, name error teku-besu_node-1 | 2023-09-29T18:07:00.756856538Z 2023-09-29 18:07:00.756+00:00 | vert.x-eventloop-thread-1 | WARN | DNSResolver | DNS query error with QDJFGHGXEU7B7HP5GSKEBY.all.mainnet.ethdisco.net

matkt commented 1 year ago

Could you check if you have another older exception during sync. Like Busy Exception or other . A log starting with "Unexpected exception in pipeline. "

This exception can occur during sync so it is important to look at all the logs from the start of the sync

The last bugs that causes a 99% stuck we found are all linked to a Rocksdb Busy Exception which arrives during the sync . We have an idea to solve it but I would like to be sure that this is also what you have

anovalunosis commented 1 year ago

I've enabled DEBUG modus and now receiving the following output: (hope this helps)

teku-besu_node-1 | 2023-09-30T21:50:18.261270853Z 2023-09-30 21:50:18.261+00:00 | vert.x-worker-thread-5 | DEBUG | JsonRpcExecutor | JSON-RPC request -> eth_getBlockByNumber ["0x2bed6",false] teku-besu_node-1 | 2023-09-30T21:50:18.261871711Z 2023-09-30 21:50:18.261+00:00 | vert.x-eventloop-thread-1 | DEBUG | B3PropagatorExtractorMultipleHeaders | Invalid TraceId in B3 header: null'. Returning INVALID span context. teku-besu_node-1 | 2023-09-30T21:50:18.261933861Z 2023-09-30 21:50:18.261+00:00 | vert.x-worker-thread-11 | DEBUG | JsonRpcExecutor | JSON-RPC request -> eth_getBlockByNumber ["0x2bed7",false] teku-besu_node-1 | 2023-09-30T21:50:18.262072363Z 2023-09-30 21:50:18.262+00:00 | nioEventLoopGroup-3-2 | DEBUG | HandshakeHandlerOutbound | Wrote initial crypto handshake message to /66.18.13.150:30303. teku-besu_node-1 | 2023-09-30T21:50:18.262636505Z 2023-09-30 21:50:18.262+00:00 | vert.x-eventloop-thread-1 | DEBUG | B3PropagatorExtractorMultipleHeaders | Invalid TraceId in B3 header: null'. Returning INVALID span context. teku-besu_node-1 | 2023-09-30T21:50:18.262684962Z 2023-09-30 21:50:18.262+00:00 | vert.x-worker-thread-0 | DEBUG | JsonRpcExecutor | JSON-RPC request -> eth_getBlockByNumber ["0x2bed8",false] teku-besu_node-1 | 2023-09-30T21:50:18.263211579Z 2023-09-30 21:50:18.263+00:00 | vert.x-eventloop-thread-1 | DEBUG | B3PropagatorExtractorMultipleHeaders | Invalid TraceId in B3 header: null'. Returning INVALID span context. teku-besu_node-1 | 2023-09-30T21:50:18.263306205Z 2023-09-30 21:50:18.263+00:00 | vert.x-worker-thread-8 | DEBUG | JsonRpcExecutor | JSON-RPC request -> eth_getBlockByNumber ["0x2bed9",false] teku-besu_node-1 | 2023-09-30T21:50:18.263951516Z 2023-09-30 21:50:18.263+00:00 | vert.x-eventloop-thread-1 | DEBUG | B3PropagatorExtractorMultipleHeaders | Invalid TraceId in B3 header: null'. Returning INVALID span context. teku-besu_node-1 | 2023-09-30T21:50:18.264004901Z 2023-09-30 21:50:18.263+00:00 | vert.x-worker-thread-18 | DEBUG | JsonRpcExecutor | JSON-RPC request -> eth_getBlockByNumber ["0x2beda",false] teku-besu_node-1 | 2023-09-30T21:50:18.264128517Z 2023-09-30 21:50:18.264+00:00 | nioEventLoopGroup-3-3 | DEBUG | AbstractPeerConnection | New PeerConnection ([Connection with hashCode -2104838468 with peer 0xc147c1d69d9aa7507b43786d104d7e1994e500225b8ffe012559477bd9b4a8291fc66c29752d2425a2b8f971d657961a8e8b2230a81ec74f900f4c11f0dc0a88 inboundInitiated true initAt 1696110618264]) established with peer 0xc147c1d69d9aa7507b43786d104d7e1994e500225b8ffe012559477bd9b4a8291fc66c29752d2425a2b8f971d657961a8e8b2230a81ec74f900f4c11f0dc0a88 teku-besu_node-1 | 2023-09-30T21:50:18.264518701Z 2023-09-30 21:50:18.264+00:00 | vert.x-eventloop-thread-1 | DEBUG | B3PropagatorExtractorMultipleHeaders | Invalid TraceId in B3 header: null'. Returning INVALID span context. teku-besu_node-1 | 2023-09-30T21:50:18.264585612Z 2023-09-30 21:50:18.264+00:00 | vert.x-worker-thread-14 | DEBUG | JsonRpcExecutor | JSON-RPC request -> eth_getBlockByNumber ["0x2bedb",false] teku-besu_node-1 | 2023-09-30T21:50:18.265035252Z 2023-09-30 21:50:18.264+00:00 | vert.x-eventloop-thread-1 | DEBUG | B3PropagatorExtractorMultipleHeaders | Invalid TraceId in B3 header: null'. Returning INVALID span context. teku-besu_node-1 | 2023-09-30T21:50:18.265113632Z 2023-09-30 21:50:18.265+00:00 | vert.x-worker-thread-6 | DEBUG | JsonRpcExecutor | JSON-RPC request -> eth_getBlockByNumber ["0x2bedc",false] teku-besu_node-1 | 2023-09-30T21:50:18.265660287Z 2023-09-30 21:50:18.265+00:00 | vert.x-eventloop-thread-1 | DEBUG | B3PropagatorExtractorMultipleHeaders | Invalid TraceId in B3 header: null'. Returning INVALID span context. teku-besu_node-1 | 2023-09-30T21:50:18.265746285Z 2023-09-30 21:50:18.265+00:00 | vert.x-worker-thread-12 | DEBUG | JsonRpcExecutor | JSON-RPC request -> eth_getBlockByNumber ["0x2bedd",false] teku-besu_node-1 | 2023-09-30T21:50:18.266303588Z 2023-09-30 21:50:18.266+00:00 | vert.x-eventloop-thread-1 | DEBUG | B3PropagatorExtractorMultipleHeaders | Invalid TraceId in B3 header: null'. Returning INVALID span context. teku-besu_node-1 | 2023-09-30T21:50:18.266393093Z 2023-09-30 21:50:18.266+00:00 | vert.x-worker-thread-9 | DEBUG | JsonRpcExecutor | JSON-RPC request -> eth_getBlockByNumber ["0x2bede",false] teku-besu_node-1 | 2023-09-30T21:50:18.267124919Z 2023-09-30 21:50:18.267+00:00 | vert.x-eventloop-thread-1 | DEBUG | B3PropagatorExtractorMultipleHeaders | Invalid TraceId in B3 header: null'. Returning INVALID span context. teku-besu_node-1 | 2023-09-30T21:50:18.267200097Z 2023-09-30 21:50:18.267+00:00 | vert.x-worker-thread-19 | DEBUG | JsonRpcExecutor | JSON-RPC request -> eth_getBlockByNumber ["0x2bedf",false] teku-besu_node-1 | 2023-09-30T21:50:18.267830258Z 2023-09-30 21:50:18.267+00:00 | vert.x-eventloop-thread-1 | DEBUG | B3PropagatorExtractorMultipleHeaders | Invalid TraceId in B3 header: null'. Returning INVALID span context. teku-besu_node-1 | 2023-09-30T21:50:18.267881267Z 2023-09-30 21:50:18.267+00:00 | vert.x-worker-thread-13 | DEBUG | JsonRpcExecutor | JSON-RPC request -> eth_getBlockByNumber ["0x2bee0",false] teku-besu_node-1 | 2023-09-30T21:50:18.268534778Z 2023-09-30 21:50:18.268+00:00 | vert.x-eventloop-thread-1 | DEBUG | B3PropagatorExtractorMultipleHeaders | Invalid TraceId in B3 header: null'. Returning INVALID span context. teku-besu_node-1 | 2023-09-30T21:50:18.268572346Z 2023-09-30 21:50:18.268+00:00 | vert.x-worker-thread-10 | DEBUG | JsonRpcExecutor | JSON-RPC request -> eth_getBlockByNumber ["0x2bee1",false] teku-besu_node-1 | 2023-09-30T21:50:18.269043476Z 2023-09-30 21:50:18.268+00:00 | vert.x-eventloop-thread-1 | DEBUG | B3PropagatorExtractorMultipleHeaders | Invalid TraceId in B3 header: null'. Returning INVALID span context. teku-besu_node-1 | 2023-09-30T21:50:18.269107604Z 2023-09-30 21:50:18.269+00:00 | vert.x-worker-thread-16 | DEBUG | JsonRpcExecutor | JSON-RPC request -> eth_getBlockByNumber ["0x2bee2",false] teku-besu_node-1 | 2023-09-30T21:50:18.269625690Z 2023-09-30 21:50:18.269+00:00 | vert.x-eventloop-thread-1 | DEBUG | B3PropagatorExtractorMultipleHeaders | Invalid TraceId in B3 header: null'. Returning INVALID span context. teku-besu_node-1 | 2023-09-30T21:50:18.269700122Z 2023-09-30 21:50:18.269+00:00 | vert.x-worker-thread-2 | DEBUG | JsonRpcExecutor | JSON-RPC request -> eth_getBlockByNumber ["0x2bee3",false] teku-besu_node-1 | 2023-09-30T21:50:18.270174210Z 2023-09-30 21:50:18.270+00:00 | vert.x-eventloop-thread-1 | DEBUG | B3PropagatorExtractorMultipleHeaders | Invalid TraceId in B3 header: null'. Returning INVALID span context.

matkt commented 1 year ago

Don't have the old logs? I mean the exception had to appear long time before the stuck. If you have the old logs by doing a grep you should see a few things

anovalunosis commented 1 year ago

2023-10-02T11:46:48.076024438Z 2023-10-02 11:46:48.067+00:00 | EthScheduler-Services-8 (batchPersistStorageData) | INFO | Pipeline | Unexpected exception in pipeline. Aborting. teku-besu_node-1 | 2023-10-02T11:46:48.076048224Z org.hyperledger.besu.plugin.services.exception.StorageException: org.rocksdb.RocksDBException: TimedOut(LockTimeout) teku-besu_node-1 | 2023-10-02T11:46:48.076050377Z at org.hyperledger.besu.plugin.services.storage.rocksdb.RocksDBTransaction.put(RocksDBTransaction.java:69) teku-besu_node-1 | 2023-10-02T11:46:48.076052139Z at org.hyperledger.besu.services.kvstore.SegmentedKeyValueStorageTransactionValidatorDecorator.put(SegmentedKeyValueStorageTransactionValidatorDecorator.java:50) teku-besu_node-1 | 2023-10-02T11:46:48.076060138Z at org.hyperledger.besu.services.kvstore.SegmentedKeyValueStorageAdapter$KeyValueStorageTransactionAdapter.put(SegmentedKeyValueStorageAdapter.java:148) teku-besu_node-1 | 2023-10-02T11:46:48.076061695Z at org.hyperledger.besu.ethereum.storage.keyvalue.WorldStateKeyValueStorage$Updater.putAccountStorageTrieNode(WorldStateKeyValueStorage.java:219) teku-besu_node-1 | 2023-10-02T11:46:48.076063488Z at org.hyperledger.besu.ethereum.storage.keyvalue.WorldStateKeyValueStorage$Updater.putAccountStorageTrieNode(WorldStateKeyValueStorage.java:157) teku-besu_node-1 | 2023-10-02T11:46:48.076064976Z at org.hyperledger.besu.ethereum.eth.sync.snapsync.request.StorageRangeDataRequest.lambda$doPersist$0(StorageRangeDataRequest.java:99) teku-besu_node-1 | 2023-10-02T11:46:48.076066424Z at org.hyperledger.besu.ethereum.trie.CommitVisitor.maybeStoreNode(CommitVisitor.java:77) teku-besu_node-1 | 2023-10-02T11:46:48.076067819Z at org.hyperledger.besu.ethereum.eth.sync.snapsync.StackTrie$1.maybeStoreNode(StackTrie.java:138) teku-besu_node-1 | 2023-10-02T11:46:48.076069199Z at org.hyperledger.besu.ethereum.trie.CommitVisitor.visit(CommitVisitor.java:68) teku-besu_node-1 | 2023-10-02T11:46:48.076070550Z at org.hyperledger.besu.ethereum.trie.patricia.LeafNode.accept(LeafNode.java:85) teku-besu_node-1 | 2023-10-02T11:46:48.076071963Z at org.hyperledger.besu.ethereum.trie.CommitVisitor.visit(CommitVisitor.java:55) teku-besu_node-1 | 2023-10-02T11:46:48.076073410Z at org.hyperledger.besu.ethereum.trie.patricia.BranchNode.accept(BranchNode.java:95) teku-besu_node-1 | 2023-10-02T11:46:48.076074783Z at org.hyperledger.besu.ethereum.trie.CommitVisitor.visit(CommitVisitor.java:55) teku-besu_node-1 | 2023-10-02T11:46:48.076076147Z at org.hyperledger.besu.ethereum.trie.patricia.BranchNode.accept(BranchNode.java:95) teku-besu_node-1 | 2023-10-02T11:46:48.076077540Z at org.hyperledger.besu.ethereum.trie.CommitVisitor.visit(CommitVisitor.java:55) teku-besu_node-1 | 2023-10-02T11:46:48.076078897Z at org.hyperledger.besu.ethereum.trie.patricia.BranchNode.accept(BranchNode.java:95) teku-besu_node-1 | 2023-10-02T11:46:48.076081052Z at org.hyperledger.besu.ethereum.trie.CommitVisitor.visit(CommitVisitor.java:55) teku-besu_node-1 | 2023-10-02T11:46:48.076082532Z at org.hyperledger.besu.ethereum.trie.patricia.BranchNode.accept(BranchNode.java:95) teku-besu_node-1 | 2023-10-02T11:46:48.076083877Z at org.hyperledger.besu.ethereum.trie.StoredMerkleTrie.commit(StoredMerkleTrie.java:149) teku-besu_node-1 | 2023-10-02T11:46:48.076085243Z at org.hyperledger.besu.ethereum.eth.sync.snapsync.StackTrie.commit(StackTrie.java:132) teku-besu_node-1 | 2023-10-02T11:46:48.076086636Z at org.hyperledger.besu.ethereum.eth.sync.snapsync.request.StorageRangeDataRequest.doPersist(StorageRangeDataRequest.java:112) teku-besu_node-1 | 2023-10-02T11:46:48.076088100Z at org.hyperledger.besu.ethereum.eth.sync.snapsync.request.SnapDataRequest.persist(SnapDataRequest.java:123) teku-besu_node-1 | 2023-10-02T11:46:48.076089515Z at org.hyperledger.besu.ethereum.eth.sync.snapsync.PersistDataStep.persist(PersistDataStep.java:65) teku-besu_node-1 | 2023-10-02T11:46:48.076090906Z at org.hyperledger.besu.ethereum.eth.sync.snapsync.SnapWorldStateDownloadProcess$Builder.lambda$build$8(SnapWorldStateDownloadProcess.java:285) teku-besu_node-1 | 2023-10-02T11:46:48.076093988Z at org.hyperledger.besu.services.pipeline.MapProcessor.processNextInput(MapProcessor.java:31) teku-besu_node-1 | 2023-10-02T11:46:48.076095398Z at org.hyperledger.besu.services.pipeline.ProcessingStage.run(ProcessingStage.java:38) teku-besu_node-1 | 2023-10-02T11:46:48.076096811Z at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:169) teku-besu_node-1 | 2023-10-02T11:46:48.076098201Z at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) teku-besu_node-1 | 2023-10-02T11:46:48.076099622Z at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) teku-besu_node-1 | 2023-10-02T11:46:48.076100941Z at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) teku-besu_node-1 | 2023-10-02T11:46:48.076102422Z at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) teku-besu_node-1 | 2023-10-02T11:46:48.076103817Z at java.base/java.lang.Thread.run(Thread.java:833) teku-besu_node-1 | 2023-10-02T11:46:48.076105196Z Caused by: org.rocksdb.RocksDBException: TimedOut(LockTimeout) teku-besu_node-1 | 2023-10-02T11:46:48.076106549Z at org.rocksdb.Transaction.put(Native Method) teku-besu_node-1 | 2023-10-02T11:46:48.076107865Z at org.rocksdb.Transaction.put(Transaction.java:857) teku-besu_node-1 | 2023-10-02T11:46:48.076109237Z at org.hyperledger.besu.plugin.services.storage.rocksdb.RocksDBTransaction.put(RocksDBTransaction.java:63) teku-besu_node-1 | 2023-10-02T11:46:48.076110686Z ... 31 more