vert-x3 / vertx-zookeeper

Zookeeper based cluster manager implementation

Vertx zookeeper error #63

Open mscofy opened 6 years ago

mscofy commented 6 years ago

I am using Vert.x with ZooKeeper as my cluster manager. Suddenly I started getting this exception in my logs:

ERROR io.vertx.spi.cluster.zookeeper.ZookeeperClusterManager - Failed to handle memberRemoved
io.vertx.core.VertxException: java.io.InvalidClassException: io.vertx.spi.cluster.zookeeper.impl.ZKSyncMap$KeyValue; local class incompatible: stream classdesc serialVersionUID = -472487054773491211, local class serialVersionUID = -6066652933022630380
    at io.vertx.spi.cluster.zookeeper.impl.ZKSyncMap.get(ZKSyncMap.java:95)

How can I fix it?

Regards, Ido

stream-iori commented 6 years ago

Hi, did you upgrade Vert.x from an older version to the latest? Which version of Vert.x are you running?

Thanks.

mscofy commented 6 years ago

Yes, I upgraded Vert.x from 3.4.2 to 3.5.0. I'm now running version 3.5.0.

stream-iori commented 6 years ago

We use Java serialization to save objects into ZooKeeper, so this issue can happen when multiple versions of vertx-zookeeper are present in one cluster. You have to restart all the verticles with the latest version of vertx-zookeeper. I hope to refactor this mechanism in the next release.
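
For illustration, here is a minimal, self-contained sketch of that failure mode (this is not the vertx-zookeeper source; KeyValue below is a stand-in for ZKSyncMap$KeyValue, and the UID is the one quoted in the log above). Plain Java serialization records the writer's serialVersionUID in the byte stream; if the reader's class was compiled from a different vertx-zookeeper release and computes a different UID, ObjectInputStream.readObject() rejects the data with InvalidClassException.

import java.io.*;

public class SerialMismatchSketch {
    // Stand-in for io.vertx.spi.cluster.zookeeper.impl.ZKSyncMap$KeyValue.
    // The UID below is the "stream classdesc" value from the log; a node running
    // a different release computes a different UID for the "same" class and
    // therefore cannot deserialize entries written by the other node.
    static class KeyValue implements Serializable {
        private static final long serialVersionUID = -472487054773491211L;
        String key = "k";
        String value = "v";
    }

    public static void main(String[] args) throws Exception {
        // Writer side: serialize an entry the way it would be stored in ZooKeeper.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new KeyValue());
        }

        // Reader side: succeeds here because writer and reader share one class file.
        // If the reader's KeyValue carried a different serialVersionUID, this call
        // would throw java.io.InvalidClassException ("local class incompatible").
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            KeyValue entry = (KeyValue) ois.readObject();
            System.out.println(entry.key + "=" + entry.value);
        }
    }
}

This is why the fix is operational rather than in user code: every node must read and write the serialized form with the same vertx-zookeeper version on its classpath.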

aruis commented 5 years ago

I also encountered this problem, and I was also using a different version of vertx-zookeeper.

[ Curator-PathChildrenCache-0 ] - [ ERROR ] [ i.v.spi.cluster.zookeeper.ZookeeperClusterManager : 167 ] - Failed to handle memberRemoved
io.vertx.core.json.DecodeException: Failed to decode: null
    at io.vertx.core.json.Json.decodeValue(Json.java:124)
    at io.vertx.core.json.JsonObject.fromJson(JsonObject.java:956)
    at io.vertx.core.json.JsonObject.<init>(JsonObject.java:48)
    at io.vertx.core.impl.HAManager.nodeLeft(HAManager.java:322)
    at io.vertx.core.impl.HAManager.access$100(HAManager.java:97)
    at io.vertx.core.impl.HAManager$1.nodeLeft(HAManager.java:150)
    at io.vertx.spi.cluster.zookeeper.ZookeeperClusterManager.childEvent(ZookeeperClusterManager.java:409)
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:522)

I am now trying to upgrade to the latest version of vertx-zookeeper and will report back with the results.

ijl20 commented 4 years ago

In all the years we've been running Vert.x, first with Hazelcast and now with ZooKeeper, this failure to connect to a running cluster unless every verticle in the entire distributed system has the exact same minor version of vertx-hazelcast or vertx-zookeeper has been the major pain in the ass and, IMHO, an odd design choice. We heavily use event bus message passing between verticles but don't attempt to share state, so it might be more effective to use an external messaging service and not cluster the verticles at all. Of course that is totally doable, but it seems to defeat what I took to be a core capability of Vert.x.

melaraj2 commented 4 years ago

I have the same version across the entire cluster, but this issue occurs when I redeploy one of the members. So far the only way to resolve it is to redeploy every node in the cluster. I am currently on version 3.9.1.

Is anyone looking into this issue? I agree with ijl20: this is a severe problem, and like his system, we are only using clustering for event bus message passing.

This is the exception I am seeing:

2020-06-20T10:24:16 - SEVERE: Failed to handle memberRemoved
2020-06-20T10:24:16 - io.vertx.core.VertxException: io.vertx.core.VertxException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /prod.backendApi/syncMap/__vertx.haInfo/aa7b8e23-5547-4135-981c-a7e2482cf149
2020-06-20T10:24:16 -   at io.vertx.spi.cluster.zookeeper.impl.ZKSyncMap.keySet(ZKSyncMap.java:156)
2020-06-20T10:24:16 -   at io.vertx.spi.cluster.zookeeper.impl.ZKSyncMap.entrySet(ZKSyncMap.java:180)
2020-06-20T10:24:16 -   at io.vertx.core.impl.HAManager.nodeLeft(HAManager.java:320)
2020-06-20T10:24:16 -   at io.vertx.core.impl.HAManager.access$100(HAManager.java:97)
2020-06-20T10:24:16 -   at io.vertx.core.impl.HAManager$1.nodeLeft(HAManager.java:150)
2020-06-20T10:24:16 -   at io.vertx.spi.cluster.zookeeper.ZookeeperClusterManager.childEvent(ZookeeperClusterManager.java:409)
2020-06-20T10:24:16 -   at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:522)
2020-06-20T10:24:16 -   at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516)
2020-06-20T10:24:16 -   at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93)
2020-06-20T10:24:16 -   at org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
2020-06-20T10:24:16 -   at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85)
2020-06-20T10:24:16 -   at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:514)
2020-06-20T10:24:16 -   at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35)
2020-06-20T10:24:16 -   at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:773)
2020-06-20T10:24:16 -   at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
2020-06-20T10:24:16 -   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
2020-06-20T10:24:16 -   at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
2020-06-20T10:24:16 -   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
2020-06-20T10:24:16 -   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
2020-06-20T10:24:16 -   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
2020-06-20T10:24:16 -   at java.base/java.lang.Thread.run(Thread.java:834)
2020-06-20T10:24:16 - Caused by: io.vertx.core.VertxException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /prod.backendApi/syncMap/__vertx.haInfo/aa7b8e23-5547-4135-981c-a7e2482cf149
2020-06-20T10:24:16 -   at io.vertx.spi.cluster.zookeeper.impl.ZKSyncMap.lambda$keySet$2(ZKSyncMap.java:152)
2020-06-20T10:24:16 -   at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
2020-06-20T10:24:16 -   at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654)
2020-06-20T10:24:16 -   at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
2020-06-20T10:24:16 -   at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
2020-06-20T10:24:16 -   at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
2020-06-20T10:24:16 -   at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
2020-06-20T10:24:16 -   at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
2020-06-20T10:24:16 -   at io.vertx.spi.cluster.zookeeper.impl.ZKSyncMap.keySet(ZKSyncMap.java:154)
2020-06-20T10:24:16 -   ... 20 more
2020-06-20T10:24:16 - Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /prod.backendApi/syncMap/__vertx.haInfo/aa7b8e23-5547-4135-981c-a7e2482cf149
2020-06-20T10:24:16 -   at org.apache.zookeeper.KeeperException.create(KeeperException.java:114)
2020-06-20T10:24:16 -   at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
2020-06-20T10:24:16 -   at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1221)
2020-06-20T10:24:16 -   at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
2020-06-20T10:24:16 -   at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
2020-06-20T10:24:16 -   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
2020-06-20T10:24:16 -   at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:296)
2020-06-20T10:24:16 -   at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
2020-06-20T10:24:16 -   at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
2020-06-20T10:24:16 -   at io.vertx.spi.cluster.zookeeper.impl.ZKSyncMap.lambda$keySet$2(ZKSyncMap.java:149)
2020-06-20T10:24:16 -   ... 28 more
2020-06-20T10:24:22 - Jun 20, 2020 2:24:22 PM io.vertx.core.eventbus.impl.clustered.ClusteredEventBus
2020-06-20T10:24:22 - WARNING: Error removing subs
2020-06-20T10:24:22 - java.io.EOFException
2020-06-20T10:24:22 -   at java.base/java.io.DataInputStream.readBoolean(DataInputStream.java:249)
2020-06-20T10:24:22 -   at io.vertx.spi.cluster.zookeeper.impl.ZKMap.asObject(ZKMap.java:110)
2020-06-20T10:24:22 -   at io.vertx.spi.cluster.zookeeper.impl.ZKAsyncMultiMap.lambda$null$22(ZKAsyncMultiMap.java:195)
2020-06-20T10:24:22 -   at java.base/java.util.Optional.ifPresent(Optional.java:183)
2020-06-20T10:24:22 -   at io.vertx.spi.cluster.zookeeper.impl.ZKAsyncMultiMap.lambda$null$23(ZKAsyncMultiMap.java:193)
2020-06-20T10:24:22 -   at java.base/java.lang.Iterable.forEach(Iterable.java:75)
2020-06-20T10:24:22 -   at io.vertx.spi.cluster.zookeeper.impl.ZKAsyncMultiMap.lambda$null$24(ZKAsyncMultiMap.java:189)
2020-06-20T10:24:22 -   at java.base/java.lang.Iterable.forEach(Iterable.java:75)
2020-06-20T10:24:22 -   at io.vertx.spi.cluster.zookeeper.impl.ZKAsyncMultiMap.lambda$removeAllMatching$26(ZKAsyncMultiMap.java:187)
2020-06-20T10:24:22 -   at java.base/java.util.Optional.ifPresent(Optional.java:183)
2020-06-20T10:24:22 -   at io.vertx.spi.cluster.zookeeper.impl.ZKAsyncMultiMap.removeAllMatching(ZKAsyncMultiMap.java:186)
2020-06-20T10:24:22 -   at io.vertx.core.eventbus.impl.clustered.ClusteredEventBus.lambda$setClusterViewChangedHandler$12(ClusteredEventBus.java:273)
2020-06-20T10:24:22 -   at io.vertx.core.impl.HAManager.lambda$checkSubs$12(HAManager.java:520)
2020-06-20T10:24:22 -   at io.vertx.core.impl.HAManager.lambda$runOnContextAndWait$13(HAManager.java:529)
2020-06-20T10:24:22 -   at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:366)
2020-06-20T10:24:22 -   at io.vertx.core.impl.EventLoopContext.lambda$executeAsync$0(EventLoopContext.java:38)
2020-06-20T10:24:22 -   at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
2020-06-20T10:24:22 -   at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
2020-06-20T10:24:22 -   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
2020-06-20T10:24:22 -   at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
2020-06-20T10:24:22 -   at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
2020-06-20T10:24:22 -   at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
2020-06-20T10:24:22 -   at java.base/java.lang.Thread.run(Thread.java:834)

stream-iori commented 4 years ago

This exception should have been swallowed and logged as a WARNING. I am working on vertx-zookeeper for Vert.x 4, and this exception will be converted to a warning log there.
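
As a rough illustration of that change, here is a minimal sketch (class and method names are hypothetical, not the actual vertx-zookeeper 4.x code): the membership cleanup catches a NoNodeException anywhere in the cause chain and downgrades it to a warning, since a znode having already disappeared is expected during membership churn.

import java.util.logging.Logger;
import org.apache.zookeeper.KeeperException;

public class MemberRemovedHandlingSketch {
    private static final Logger LOG =
            Logger.getLogger(MemberRemovedHandlingSketch.class.getName());

    // Hypothetical helper: run the cleanup for a departed member, but treat a
    // missing znode as harmless instead of letting the Curator listener fail.
    static void handleMemberRemoved(Runnable cleanup) {
        try {
            cleanup.run();
        } catch (RuntimeException e) {
            if (hasNoNodeCause(e)) {
                LOG.warning("Ignoring missing znode while handling memberRemoved: " + e.getMessage());
            } else {
                throw e; // anything else is still a real failure
            }
        }
    }

    // Walk the cause chain looking for KeeperException.NoNodeException.
    static boolean hasNoNodeCause(Throwable t) {
        while (t != null) {
            if (t instanceof KeeperException.NoNodeException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }
}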

melaraj2 commented 4 years ago

The problem is not just the exception. In my case, I have a JVM instance that calls eventBus.request to an address being consumed by a set of JVMs; when I restart the consumers, the requests to that address fail with timeout exceptions. It appears that the only way to resolve this is to also restart the requester JVMs on the other instances. In my case, updating the consumers in the cluster is a normal course of business. I imagine this is a common thing.

All of this runs on AWS Fargate, so it's important to note that IP addresses change when I restart the service.
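
For context, a minimal sketch of the request/reply setup being described (the address name and timeout are illustrative, and the clustered bootstrap is omitted): a set of consumer JVMs listens on an event bus address while a long-lived JVM calls eventBus.request against it; in the scenario reported here, the failure branch starts firing with timeouts after the consumers are redeployed on new IPs.

import io.vertx.core.Vertx;
import io.vertx.core.eventbus.DeliveryOptions;

public class EventBusRequestSketch {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx(); // in the real setup this is a clustered instance

        // Consumer side: runs in the JVMs that get redeployed on Fargate.
        vertx.eventBus().consumer("backend.api", msg -> msg.reply("ok"));

        // Requester side: runs in the long-lived JVM.
        vertx.eventBus().request("backend.api", "ping",
                new DeliveryOptions().setSendTimeout(5000),
                ar -> {
                    if (ar.succeeded()) {
                        System.out.println("reply: " + ar.result().body());
                    } else {
                        // In the scenario reported above, this branch fires with a
                        // timeout once the consumers have moved to new IPs, until
                        // the requester itself is restarted.
                        System.out.println("request failed: " + ar.cause());
                    }
                });
    }
}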

melaraj2 commented 4 years ago

The sending Vert.x instances continue to attempt to send requests to the old IP address that went away. This is permanent; there is no recovery unless I restart the sender.

vietj commented 4 years ago

Can you provide a reproducer?
