vert-x3 / vertx-zookeeper

Zookeeper based cluster manager implementation

NullPointerException in ZKMap #121

Open neterium opened 2 years ago

neterium commented 2 years ago

Version

4.0.3

Context

In development, when nodes of the (Zookeeper) cluster are started and stopped often, we randomly receive this exception:

java.lang.NullPointerException
    at io.vertx.spi.cluster.zookeeper.impl.ZKMap.keyPath(ZKMap.java:70)
    at io.vertx.spi.cluster.zookeeper.impl.ZKSyncMap.get(ZKSyncMap.java:98)
    at io.vertx.spi.cluster.zookeeper.impl.ZKSyncMap.lambda$entrySet$5(ZKSyncMap.java:225)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1556)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at io.vertx.spi.cluster.zookeeper.impl.ZKSyncMap.entrySet(ZKSyncMap.java:227)
    at io.vertx.core.impl.HAManager.nodeLeft(HAManager.java:323)
    at io.vertx.core.impl.HAManager.access$100(HAManager.java:94)
    at io.vertx.core.impl.HAManager$1.nodeLeft(HAManager.java:150)
    at io.vertx.spi.cluster.zookeeper.ZookeeperClusterManager.childEvent(ZookeeperClusterManager.java:419)
    at com.neterium.context.ClusterContext$1.childEvent(ClusterContext.java:72)
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:538)
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:532)
    at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
    at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
    at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:530)
    at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35)
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:808)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
    at java.util.concurrent.FutureTask.run(FutureTask.java)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
    at java.util.concurrent.FutureTask.run(FutureTask.java)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Just before this, Vert.x logs the following:

2021-10-14 13:10:52.903 ERROR 34324 --- [ntloop-thread-0] io.vertx.core.net.impl.ConnectionBase : An existing connection was forcibly closed by the remote host
2021-10-14 13:11:13.987 WARN 34324 --- [ChildrenCache-0] i.v.s.cluster.zookeeper.impl.ZKSyncMap : node lost KeeperErrorCode = NoNode for /io.vertx/default/syncMap/__vertx.haInfo/45db3530-9ce0-4c94-9374-1355b20ab753

Do you have a reproducer?

Not really. It happens randomly, but killing servers ungracefully increases the odds.

Extra

An easy fix would be to add a null check on the "k" argument of the ZKMap::keyPath(k) method and, maybe, to handle in ZKSyncMap::get(k) the case where keyPath returns null (?).
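
To make the idea concrete, here is a minimal sketch of such a guard. It uses a hypothetical in-memory stand-in (NullSafeSyncMapSketch), not the real ZKMap/ZKSyncMap classes, whose internals are not shown in this report. The field names, the String key type, and the HashMap backing store are all assumptions made for illustration.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the map classes in the stack trace; not the library's code.
class NullSafeSyncMapSketch {
    private final String mapPath;
    // Assumption: a plain map simulates the ZooKeeper store (path -> value).
    // A missing entry models a node that has already been deleted.
    private final Map<String, String> store = new HashMap<>();

    NullSafeSyncMapSketch(String mapPath) {
        this.mapPath = mapPath;
    }

    // Guard 1: a null key yields a null path instead of a NullPointerException.
    String keyPath(String k) {
        return k == null ? null : mapPath + "/" + k;
    }

    // Guard 2: get() tolerates a null path and reports "no value".
    String get(String k) {
        String path = keyPath(k);
        return path == null ? null : store.get(path);
    }

    void put(String k, String v) {
        store.put(Objects.requireNonNull(keyPath(k), "key must not be null"), v);
    }

    // Builds entries for the given keys, skipping null keys and vanished nodes.
    Set<Map.Entry<String, String>> entrySet(List<String> keys) {
        Set<Map.Entry<String, String>> result = new HashSet<>();
        for (String k : keys) {
            String v = get(k);
            if (v != null) {
                result.add(new AbstractMap.SimpleImmutableEntry<>(k, v));
            }
        }
        return result;
    }
}
```

With guards like these, a key that disappears while a node is leaving the cluster would be skipped rather than aborting the whole entrySet() call. Whether silently skipping is acceptable for HAManager, or whether the null key should be prevented earlier, is a design question for the maintainers.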