hyperledger / besu

An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client https://wiki.hyperledger.org/display/besu
https://www.hyperledger.org/projects/besu
Apache License 2.0
1.48k stars 809 forks source link

Bug: PeerDiscoveryController | Error while creating packet #1715

Closed forkwatch-ninja closed 2 years ago

forkwatch-ninja commented 3 years ago

Description

There's a bug that's affecting most of the nodes I'm running with this configuration:

sudo docker run -p 8545:8545 --mount type=bind,source=$HOME/besu,target=/var/lib/besu --restart on-failure hyperledger/besu:20.10.2 --network=classic --sync-mode=FAST --rpc-http-enabled

This is the latest logs before the nodes shut off:

2020-12-07 00:39:33.981+00:00 | vert.x-eventloop-thread-1 | ERROR | PeerDiscoveryController | Error while creating packet
java.util.concurrent.CompletionException: java.lang.NullPointerException
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.complete(Unknown Source)
        at org.hyperledger.besu.ethereum.p2p.discovery.VertxPeerDiscoveryAgent$VertxAsyncExecutor.lambda$execute$1(VertxPeerDiscoveryAgent.java:247)
        at io.vertx.core.impl.ContextImpl.lambda$null$0(ContextImpl.java:327)
        at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:366)
        at io.vertx.core.impl.EventLoopContext.lambda$executeAsync$0(EventLoopContext.java:38)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NullPointerException
        at org.hyperledger.besu.ethereum.p2p.discovery.VertxPeerDiscoveryAgent.sendOutgoingPacket(VertxPeerDiscoveryAgent.java:148)
        at org.hyperledger.besu.ethereum.p2p.discovery.PeerDiscoveryAgent.handleOutgoingPacket(PeerDiscoveryAgent.java:214)
        at org.hyperledger.besu.ethereum.p2p.discovery.internal.PeerDiscoveryController.lambda$sendPacket$8(PeerDiscoveryController.java:482)
        ... 14 more
2020-12-07 00:39:33.982+00:00 | vert.x-eventloop-thread-1 | ERROR | PeerDiscoveryController | Error while creating packet
java.util.concurrent.CompletionException: java.lang.NullPointerException
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.complete(Unknown Source)
        at org.hyperledger.besu.ethereum.p2p.discovery.VertxPeerDiscoveryAgent$VertxAsyncExecutor.lambda$execute$1(VertxPeerDiscoveryAgent.java:247)
        at io.vertx.core.impl.ContextImpl.lambda$null$0(ContextImpl.java:327)
        at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:366)
        at io.vertx.core.impl.EventLoopContext.lambda$executeAsync$0(EventLoopContext.java:38)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NullPointerException
        at org.hyperledger.besu.ethereum.p2p.discovery.VertxPeerDiscoveryAgent.sendOutgoingPacket(VertxPeerDiscoveryAgent.java:148)
        at org.hyperledger.besu.ethereum.p2p.discovery.PeerDiscoveryAgent.handleOutgoingPacket(PeerDiscoveryAgent.java:214)
        at org.hyperledger.besu.ethereum.p2p.discovery.internal.PeerDiscoveryController.lambda$sendPacket$8(PeerDiscoveryController.java:482)
        ... 14 more
2020-12-07 00:39:35.524+00:00 | Thread-1 | ERROR | EthScheduler | EthScheduler services executor did not shutdown cleanly.
2020-12-07 00:39:35.524+00:00 | Thread-1 | INFO  | EthProtocolManager | eth Subprotocol stopped.
2020-12-07 00:39:35.524+00:00 | Thread-1 | INFO  | NetworkRunner | Network stopped.
2020-12-07 00:39:35.525+00:00 | Thread-1 | INFO  | AutoTransactionLogBloomCachingService | Shutting down Auto transaction logs caching service.
2020-12-07 00:39:35.527+00:00 | Thread-1 | INFO  | DockerNatManager | Stopping docker NAT manager.

It's impacting multiple nodes.

3nprob commented 3 years ago

Same here, but slightly different logs. On besu-21.1.4 running on docker.

Two otherwise identical nodes on rinkeby/mainnet; so far only the mainnet node seems affected.

The below error logs multiple times shortly after startup:

2021-04-15 06:45:24.724+00:00 | vert.x-eventloop-thread-2 | ERROR | PeerDiscoveryController | Error while creating packet
java.util.concurrent.CompletionException: java.lang.NullPointerException: Cannot invoke "io.vertx.core.datagram.DatagramSocket.send(io.vertx.core.buffer.Buffer, int, String, io.vertx.core.Handler)" because "this.socket" is null
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
    at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
    at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2137)
    at org.hyperledger.besu.ethereum.p2p.discovery.VertxPeerDiscoveryAgent$VertxAsyncExecutor.lambda$execute$1(VertxPeerDiscoveryAgent.java:259)
    at io.vertx.core.impl.ContextImpl.lambda$null$0(ContextImpl.java:327)
    at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:366)
    at io.vertx.core.impl.EventLoopContext.lambda$executeAsync$0(EventLoopContext.java:38)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.lang.NullPointerException: Cannot invoke "io.vertx.core.datagram.DatagramSocket.send(io.vertx.core.buffer.Buffer, int, String, io.vertx.core.Handler)" because "this.socket" is null
    at org.hyperledger.besu.ethereum.p2p.discovery.VertxPeerDiscoveryAgent.sendOutgoingPacket(VertxPeerDiscoveryAgent.java:160)
    at org.hyperledger.besu.ethereum.p2p.discovery.PeerDiscoveryAgent.handleOutgoingPacket(PeerDiscoveryAgent.java:297)
    at org.hyperledger.besu.ethereum.p2p.discovery.internal.PeerDiscoveryController.lambda$sendPacket$11(PeerDiscoveryController.java:504)
    at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:714)
    ... 13 more

After this, there is a brief period of some minutes of the following (~1-5min), after which the process exits with

2021-04-15 06:50:44.734+00:00 | Timer-0 | ERROR | Besu | Uncaught exception in thread "Timer-0"
java.nio.channels.ClosedChannelException
    at java.base/java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:222)
    at org.xbill.DNS.NioTcpClient.processPendingRegistrations(NioTcpClient.java:42)
    at java.base/java.util.concurrent.CopyOnWriteArrayList.forEach(CopyOnWriteArrayList.java:807)
    at org.xbill.DNS.Client.runSelector(Client.java:68)
    at java.base/java.lang.Thread.run(Thread.java:832)
2021-04-15 06:50:48.238+00:00 | EthScheduler-Timer-0 | INFO  | FastSyncTargetManager | No sync target, waiting for peers: 1
2021-04-15 06:50:51.829+00:00 | nioEventLoopGroup-3-1 | INFO  | FastSyncTargetManager | No sync target, waiting for peers: 2
2021-04-15 06:50:54.749+00:00 | nioEventLoopGroup-3-3 | INFO  | FastSyncTargetManager | No sync target, waiting for peers: 2
2021-04-15 06:50:55.011+00:00 | nioEventLoopGroup-3-5 | INFO  | FastSyncTargetManager | No sync target, waiting for peers: 3
2021-04-15 06:50:56.840+00:00 | nioEventLoopGroup-3-9 | INFO  | FastSyncTargetManager | No sync target, waiting for peers: 3
2021-04-15 06:50:56.914+00:00 | nioEventLoopGroup-3-4 | INFO  | FastSyncTargetManager | No sync target, waiting for peers: 3
2021-04-15 06:51:01.915+00:00 | EthScheduler-Timer-0 | INFO  | FastSyncTargetManager | No sync target, waiting for peers: 3
2021-04-15 06:51:06.247+00:00 | EthScheduler-Timer-0 | INFO  | AbstractRetryingPeerTask | No peers available, waiting for peers: 1
2021-04-15 06:51:06.915+00:00 | EthScheduler-Timer-0 | INFO  | FastSyncTargetManager | No sync target, waiting for peers: 1
2021-04-15 06:51:08.441+00:00 | nioEventLoopGroup-3-4 | INFO  | AbstractRetryingPeerTask | No peers available, waiting for peers: 1
2021-04-15 06:51:08.441+00:00 | nioEventLoopGroup-3-4 | INFO  | FastSyncTargetManager | No sync target, waiting for peers: 1
2021-04-15 06:51:13.150+00:00 | nioEventLoopGroup-3-10 | INFO  | FastSyncTargetManager | No sync target, waiting for peers: 1
2021-04-15 06:51:13.158+00:00 | EthScheduler-Services-4 (batchDownloadData) | INFO  | AbstractRetryingPeerTask | No peers available, waiting for peers: 1
2021-04-15 06:51:18.150+00:00 | EthScheduler-Timer-0 | INFO  | FastSyncTargetManager | No sync target, waiting for peers: 0
2021-04-15 06:51:18.158+00:00 | EthScheduler-Timer-0 | INFO  | AbstractRetryingPeerTask | No peers available, waiting for peers: 0
2021-04-15 06:51:23.151+00:00 | EthScheduler-Timer-0 | INFO  | FastSyncTargetManager | No sync target, waiting for peers: 4
2021-04-15 06:51:23.159+00:00 | EthScheduler-Timer-0 | INFO  | AbstractRetryingPeerTask | No peers available, waiting for peers: 3
2021-04-15 06:51:24.277+00:00 | nioEventLoopGroup-3-3 | INFO  | FastSyncTargetManager | No sync target, waiting for peers: 1
2021-04-15 06:51:24.277+00:00 | nioEventLoopGroup-3-3 | INFO  | AbstractRetryingPeerTask | No peers available, waiting for peers: 1
2021-04-15 06:51:29.278+00:00 | EthScheduler-Timer-0 | INFO  | FastSyncTargetManager | No sync target, waiting for peers: 0
2021-04-15 06:51:29.279+00:00 | EthScheduler-Services-4 (batchDownloadData) | INFO  | AbstractRetryingPeerTask | No peers available, waiting for peers: 0
2021-04-15 06:51:31.669+00:00 | nioEventLoopGroup-3-5 | INFO  | FastSyncTargetManager | No sync target, waiting for peers: 1
2021-04-15 06:51:31.669+00:00 | nioEventLoopGroup-3-5 | INFO  | AbstractRetryingPeerTask | No peers available, waiting for peers: 1
donald-jackson commented 3 years ago

Are we sure this has been resolved? I'm getting same error on latest release (21.1.7) and just never recovers. I believe this is some race/temporary condition as if I restart the besu instance multiple times eventually it will start.

Have tried both

besu/v21.1.7/linux-x86_64/corretto-java-11

and

besu/v21.1.7/linux-x86_64/openjdk-java-11
Jul 05 16:47:05 besu_eth-i-068c136126797b846.prod besu[4872]: 2021-07-05 16:47:05.390+00:00 | Timer-0 | ERROR | Besu | Uncaught exception in thread "Timer-0"
Jul 05 16:47:05 besu_eth-i-068c136126797b846.prod besu[4872]: java.io.EOFException
Jul 05 16:47:05 besu_eth-i-068c136126797b846.prod besu[4872]: at org.xbill.DNS.NioTcpClient$ChannelState.processRead(NioTcpClient.java:168)
Jul 05 16:47:05 besu_eth-i-068c136126797b846.prod besu[4872]: at org.xbill.DNS.NioTcpClient$ChannelState.processReadyKey(NioTcpClient.java:126)
Jul 05 16:47:05 besu_eth-i-068c136126797b846.prod besu[4872]: at org.xbill.DNS.Client.processReadyKeys(Client.java:97)
Jul 05 16:47:05 besu_eth-i-068c136126797b846.prod besu[4872]: at org.xbill.DNS.Client.runSelector(Client.java:72)
Jul 05 16:47:05 besu_eth-i-068c136126797b846.prod besu[4872]: at java.base/java.lang.Thread.run(Thread.java:829)
Jul 05 16:47:07 besu_eth-i-068c136126797b846.prod besu[4872]: 2021-07-05 16:47:07.846+00:00 | EthScheduler-Timer-0 | INFO  | FullSyncTargetManager | No sync target, waiting for peers: 0
Jul 05 16:47:12 besu_eth-i-068c136126797b846.prod besu[4872]: 2021-07-05 16:47:12.847+00:00 | EthScheduler-Timer-0 | INFO  | FullSyncTargetManager | No sync target, waiting for peers: 0
Jul 05 16:47:17 besu_eth-i-068c136126797b846.prod besu[4872]: 2021-07-05 16:47:17.848+00:00 | EthScheduler-Timer-0 | INFO  | FullSyncTargetManager | No sync target, waiting for peers: 0
donald-jackson commented 3 years ago

Just upgraded to 21.7.0, still happening:

Jul 12 11:25:36 besu_eth-i-0022e6b80d22c4501.prod besu[3344]: 2021-07-12 11:25:36.684+00:00 | Timer-0 | ERROR | Besu | Uncaught exception in thread "Timer-0"
Jul 12 11:25:36 besu_eth-i-0022e6b80d22c4501.prod besu[3344]: java.nio.channels.ClosedChannelException
Jul 12 11:25:36 besu_eth-i-0022e6b80d22c4501.prod besu[3344]: at java.base/java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:206)
Jul 12 11:25:36 besu_eth-i-0022e6b80d22c4501.prod besu[3344]: at org.xbill.DNS.NioTcpClient.processPendingRegistrations(NioTcpClient.java:42)
Jul 12 11:25:36 besu_eth-i-0022e6b80d22c4501.prod besu[3344]: at java.base/java.util.concurrent.CopyOnWriteArrayList.forEach(CopyOnWriteArrayList.java:807)
Jul 12 11:25:36 besu_eth-i-0022e6b80d22c4501.prod besu[3344]: at org.xbill.DNS.Client.runSelector(Client.java:68)
Jul 12 11:25:36 besu_eth-i-0022e6b80d22c4501.prod besu[3344]: at java.base/java.lang.Thread.run(Thread.java:829)

/besu/bin/besu --version
besu/v21.7.0/linux-x86_64/corretto-java-11
atoulme commented 3 years ago

It hasn’t been resolved, Tuweni 2.0 upgrade coming

garyschulte commented 2 years ago

Was this resolved by https://github.com/hyperledger/besu/pull/2376 ?

atoulme commented 2 years ago

Yes. Closing.