ConnectionHolder - Connecting to server failed #577

Closed clsaa closed 3 years ago

clsaa commented 3 years ago

Hi, I use the Vert.x EventBus for point-to-point communication between cluster nodes, but the following exceptions are sometimes reported when nodes communicate. The problem is particularly evident after the cluster is restarted, when it is common for several nodes in the cluster to fail to connect.

My config:

    EventBusOptions eventBusOptions = new EventBusOptions()
        .setHost(NetUtils.getInstanceIpWithCache());
    VertxOptions options = new VertxOptions()
        //cluster
        .setClusterManager(clusterManager)
        //event bus
        .setEventBusOptions(eventBusOptions)
        //poolSize
        .setEventLoopPoolSize(vertxConfig.getEventLoopPoolSize())
        .setWorkerPoolSize(vertxConfig.getWorkerPoolSize())
        .setInternalBlockingPoolSize(vertxConfig.getInternalBlockingPoolSize())
        //time
        .setWarningExceptionTime(vertxConfig.getWarningExceptionTimeInMillis())
        .setWarningExceptionTimeUnit(TimeUnit.MILLISECONDS)
        .setBlockedThreadCheckInterval(vertxConfig.getBlockingIntervalInMillis())
        .setBlockedThreadCheckIntervalUnit(TimeUnit.MILLISECONDS)
        .setMaxEventLoopExecuteTime(vertxConfig.getMaxEventLoopExecuteTimeInMillis())
        .setMaxEventLoopExecuteTimeUnit(TimeUnit.MILLISECONDS)
        .setMaxWorkerExecuteTime(vertxConfig.getMaxWorkerExecuteTime())
        .setMaxWorkerExecuteTimeUnit(TimeUnit.MILLISECONDS);

    return options;

    [vert.x-eventloop-thread-0] WARN s.i.v.c.eventbus.impl.clustered.ConnectionHolder - Connecting to server e28f77cf-4d6c-4847-b819-0a15559b32da failed
    io.netty.channel.AbstractChannel$AnnotatedConnectException: refuse connection: /33.5.70.81:37679
    Caused by: java.net.ConnectException: refuse connection
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:852)

    [vert.x-eventloop-thread-2] ERROR c.a.a.t.g.p.e.c.vertx.TaskStatusVertxEventSender - [GEI]-fail--TaskStatusVertxEventSender#recive#failed, taskId:GEI@test-module-sharding-cache-file-in-oss-export-code@210413200020@7228, sliceNo:1, cost:60062ms, traceId:null, params:[topic:GEI@ascp-tools@TASK_STATUS@33.5.70.81, eventClass:SliceStartEvent]
    shaded.io.vertx.core.eventbus.ReplyException: Timed out after waiting 60000(ms) for a reply. address: __vertx.reply.0a3b7ace-02f2-4e0e-bff0-103655eb2272, repliedAddress: GEI@ascp-tools@TASK_STATUS@33.5.70.81
        at shaded.io.vertx.core.eventbus.impl.ReplyHandler.lambda$new$0(ReplyHandler.java:42)
        at shaded.io.vertx.core.impl.VertxImpl$InternalTimerHandler.handle(VertxImpl.java:951)
        at shaded.io.vertx.core.impl.VertxImpl$InternalTimerHandler.handle(VertxImpl.java:918)
        at shaded.io.vertx.core.impl.EventLoopContext.emit(EventLoopContext.java:52)
        at shaded.io.vertx.core.impl.ContextImpl.emit(ContextImpl.java:294)
        at shaded.io.vertx.core.impl.EventLoopContext.emit(EventLoopContext.java:24)
        at shaded.io.vertx.core.impl.AbstractContext.emit(AbstractContext.java:49)
        at shaded.io.vertx.core.impl.EventLoopContext.emit(EventLoopContext.java:24)
        at shaded.io.vertx.core.impl.VertxImpl$InternalTimerHandler.run(VertxImpl.java:941)
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:852)

tsegismont commented 3 years ago

When you restart some nodes it is possible that:

Consequently, your system must be prepared for message loss if using send (fire and forget). If you use request (request/reply), you may implement a retry strategy when a timeout ReplyException is received.
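
To illustrate the retry idea for request/reply, here is a minimal sketch in plain Java. It is not part of Vert.x: the class `RetryingRequest` and the method `requestWithRetry` are hypothetical names, and the request itself is abstracted as a `Supplier<CompletionStage<T>>` so the example stays self-contained. It retries immediately with no backoff; a real system would probably add a delay between attempts.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CompletionStage;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical helper: re-issues an asynchronous request while the failure is retryable. */
public final class RetryingRequest {

    private RetryingRequest() {}

    // Hypothetical wiring, assuming Vert.x 4 (Future#toCompletionStage) and an
    // existing EventBus instance named eventBus:
    //   requestWithRetry(
    //       () -> eventBus.<String>request(address, body).toCompletionStage(),
    //       t -> t instanceof ReplyException
    //               && ((ReplyException) t).failureType() == ReplyFailure.TIMEOUT,
    //       3);

    /**
     * Runs the attempt up to maxAttempts times. A new attempt is started only
     * when the previous one failed with an error accepted by isRetryable.
     */
    public static <T> CompletableFuture<T> requestWithRetry(
            Supplier<CompletionStage<T>> attempt,
            Predicate<Throwable> isRetryable,
            int maxAttempts) {
        CompletableFuture<T> result = new CompletableFuture<>();
        tryOnce(attempt, isRetryable, maxAttempts, 1, result);
        return result;
    }

    private static <T> void tryOnce(
            Supplier<CompletionStage<T>> attempt,
            Predicate<Throwable> isRetryable,
            int maxAttempts,
            int attemptNo,
            CompletableFuture<T> result) {
        CompletionStage<T> stage;
        try {
            stage = attempt.get();
        } catch (RuntimeException e) {
            // The supplier itself failed synchronously: report it without retrying.
            result.completeExceptionally(e);
            return;
        }
        stage.whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (attemptNo < maxAttempts && isRetryable.test(unwrap(error))) {
                // Retryable failure and attempts left: issue the request again.
                tryOnce(attempt, isRetryable, maxAttempts, attemptNo + 1, result);
            } else {
                result.completeExceptionally(unwrap(error));
            }
        });
    }

    /** Strips the CompletionException wrapper that dependent stages may add. */
    private static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }
}
```

Note that a reply timeout does not prove the consumer never processed the message, so retrying like this is only safe when the handler on the receiving node is idempotent.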