vert-x3 / vertx-zookeeper

Zookeeper based cluster manager implementation

vertx always sends messages to one node even if there are multiple nodes connected in the cluster #10

Closed mpeddagolla closed 8 years ago

mpeddagolla commented 9 years ago

Hi, below is my scenario.

I am using Vert.x with 3 nodes and ZooKeeper as the cluster manager. One ZooKeeper server runs locally and all 3 nodes (clients) connect to it. However, Vert.x always picks the most recently joined node in the cluster when sending messages, instead of sending them round robin to all connected clients. I am using SEND and not PUBLISH because I always want only one handler to receive a message.

When I debugged, I found that in the code below a fresh ChoosableSet is created on every call. Before sending a message, Vert.x gets the subs from this API and then calls choose(), which always returns the first element of that freshly built set, i.e. the same node, so effectively all messages go to a single node. After the snippet I have also added my little hack to distribute the load across all nodes (not a round robin). Do you have a fix or suggestions for making this a proper round robin?

@Override
public void get(K k, Handler<AsyncResult<ChoosableIterable<V>>> asyncResultHandler) {
    if (!keyIsNull(k, asyncResultHandler)) {
        vertx.runOnContext(event -> {
            Map<String, ChildData> maps = curatorCache.getCurrentChildren(keyPath(k));
            // A brand-new ChoosableSet is built on every call, so its internal
            // iterator always starts again at the first element.
            ChoosableSet<V> choosableSet = new ChoosableSet<>(0);
            if (maps != null) {
                for (ChildData childData : maps.values()) {
                    try {
                        if (childData != null && childData.getData() != null && childData.getData().length > 0) {
                            choosableSet.add(asObject(childData.getData()));
                        }
                    } catch (Exception ex) {
                        asyncResultHandler.handle(Future.failedFuture(ex));
                    }
                }
            }
            asyncResultHandler.handle(Future.succeededFuture(choosableSet));
        });
    }
}
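To show the effect in isolation, here is a small self-contained demo. SimpleChoosable and ChooseDemo are stand-ins I wrote to mimic the behaviour I observed (choose() walking an internal iterator), not the real ChoosableSet: rebuilding the set before every choose() always yields the first member, while reusing one set rotates through the members.

import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified stand-in for the cluster manager's ChoosableSet: choose() walks an
// internal iterator over the members and wraps around when it reaches the end.
class SimpleChoosable<T> {
    private final Set<T> members = new LinkedHashSet<>();
    private Iterator<T> iter;

    void add(T t) { members.add(t); }

    T choose() {
        if (iter == null || !iter.hasNext()) {
            iter = members.iterator(); // (re)start from the first element
        }
        return iter.hasNext() ? iter.next() : null;
    }
}

public class ChooseDemo {
    public static void main(String[] args) {
        // Rebuilding the set before every choose() - what happens when get()
        // returns a fresh ChoosableSet each time: always node-1.
        for (int i = 0; i < 3; i++) {
            SimpleChoosable<String> fresh = new SimpleChoosable<>();
            fresh.add("node-1"); fresh.add("node-2"); fresh.add("node-3");
            System.out.println("fresh set  -> " + fresh.choose());
        }

        // Reusing one set across sends - the iterator position survives,
        // so the selection rotates: node-1, node-2, node-3.
        SimpleChoosable<String> reused = new SimpleChoosable<>();
        reused.add("node-1"); reused.add("node-2"); reused.add("node-3");
        for (int i = 0; i < 3; i++) {
            System.out.println("reused set -> " + reused.choose());
        }
    }
}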

So locally I made a change to the API

@Override
public void get(K k, Handler<AsyncResult<ChoosableIterable<V>>> asyncResultHandler) {
    Random random = new Random();

    if (!keyIsNull(k, asyncResultHandler)) {
        vertx.runOnContext(event -> {
            Map<String, ChildData> maps = curatorCache.getCurrentChildren(keyPath(k));
            ChoosableSet<V> choosableSet = new ChoosableSet<>(0);
            if (maps != null) {
                for (ChildData childData : maps.values()) {
                    try {
                        if (childData != null && childData.getData() != null && childData.getData().length > 0) {
                            choosableSet.add(asObject(childData.getData()));
                        }
                    } catch (Exception ex) {
                        asyncResultHandler.handle(Future.failedFuture(ex));
                    }
                }

                // Hack: advance the internal iterator to a random subscriber so the load
                // is spread across the nodes. It is not round robin, but at least it distributes.
                if (choosableSet.size() > 1 && choosableSet.iterator() != null) {
                    int randomClient = random.nextInt(choosableSet.size()); // 0 .. size-1
                    for (int i = 0; i <= randomClient; i++) {
                        choosableSet.choose();
                    }
                }
            }
            asyncResultHandler.handle(Future.succeededFuture(choosableSet));
        });
    }
}
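For reference, here is a rough sketch of the direction I imagine a proper fix could take: keep the ChoosableSet per key instead of rebuilding it on every call, so its internal iterator survives between sends and choose() rotates through the nodes by itself. The cache field is hypothetical and the sketch does not handle refreshing or evicting entries when nodes join or leave; it is only meant to illustrate the idea.

// Hypothetical field on the multimap class (needs the java.util.concurrent imports).
private final ConcurrentMap<String, ChoosableSet<V>> cache = new ConcurrentHashMap<>();

@Override
public void get(K k, Handler<AsyncResult<ChoosableIterable<V>>> asyncResultHandler) {
    if (!keyIsNull(k, asyncResultHandler)) {
        vertx.runOnContext(event -> {
            try {
                String path = keyPath(k);
                ChoosableSet<V> choosableSet = cache.get(path);
                if (choosableSet == null) {
                    // Build the set once; later calls reuse it, so choose() continues
                    // from where it left off instead of starting over at the first node.
                    ChoosableSet<V> set = new ChoosableSet<>(0);
                    Map<String, ChildData> children = curatorCache.getCurrentChildren(path);
                    if (children != null) {
                        for (ChildData childData : children.values()) {
                            if (childData != null && childData.getData() != null && childData.getData().length > 0) {
                                set.add(asObject(childData.getData()));
                            }
                        }
                    }
                    ChoosableSet<V> previous = cache.putIfAbsent(path, set);
                    choosableSet = previous != null ? previous : set;
                }
                asyncResultHandler.handle(Future.succeededFuture(choosableSet));
            } catch (Exception ex) {
                asyncResultHandler.handle(Future.failedFuture(ex));
            }
        });
    }
}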

Vertx-Hazelcast distributes messages across all nodes nicely because of how the subs are retrieved in HazelcastAsyncMultiMap, but there we have a different problem: when nodes shut down concurrently, the cache still contains dead nodes and we lose messages. https://github.com/vert-x3/vertx-hazelcast/issues/13

Thank you,
Mallikarjun

stream-iori commented 9 years ago

Oops, there is indeed a mistake. I will try to make a PR for this. Thank you for the report.

mpeddagolla commented 9 years ago

Thank you.