nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0
16.04k stars 1.42k forks source link

Queue Groups on leaf clusters not balancing correctly when messages are routed in from hub cluster [v2.10.21] #5972

Closed roeschter closed 1 month ago

roeschter commented 1 month ago

Observed behavior

From two subscribers in a queue group connected to two different nodes in a leaf cluster only one will receive messages IFF those messages arrive from a hub cluster to which the leaf cluster is connected. The second subscribers (initially idle) will receive messages once the primary is terminated. This is reversible (restart primary connected to the original node and will take over again).

Expected behavior

All subcribers in the same queue group a leaf cluster are treated equally.

Server and client version

Server 2.10.21 Nats - compiled from main

Host environment

Windows/Mac/Linux - reproduced by customer, Borja and me

Steps to reproduce

To reproduce deterministically, all leaf node connections and nats-cli connections will connect to specific nodes.

  1. Create HUB cluster with nodes HUB1, HUB2, HUB3
  2. Create LEAF cluster with nodes LEAF1, LEAF2, LEAF3
  3. Connect leaf nodes such that LEAF1-->HUB1, LEAF2-->HUB2 LEAF3-->HUB3
  4. Start queue group listeners
  5. nats --server LEAF1 sub --queue=q1 foo
  6. nats --server LEAF2 sub --queue=q1 foo
  7. Publish messages to HUB1
  8. nats --server HUB1 pub foo Hello //Only the subscriber on LEAF2 will receive messages
  9. Publish messages to HUB3
  10. nats --server HUB3 pub foo Hello //Both subscribers receive messages

Tentative explanation: The "prefer local cluster listeners in work queues" logic is not leaf node aware. When the message is published such that both listeners have the same "distance" (routing hops) to the publisher the work queue LB works. When the message is published such that both listeners have the different "distance" to the publisher the work queue LB fails.

kozlovic commented 1 month ago

@roeschter Are you sure of the setup? I tried to write a test based on your description and it passes. I just check that each queue sub receives at least 10% of the messages, but we usually test for close to even distribution. But since you claim that one does not receive any message, the 10% should be enough. Here is the test, let me know what changes to make to reproduce the issue:

func TestLeafNodeQueueGroupDistribution(t *testing.T) {
    hc := createClusterWithName(t, "HUB", 3)
    defer hc.shutdown()

    // Now have a cluster of leafnodes with each one connecting to corresponding HUB(n) node.
    c1 := `
    server_name: LEAF1
    listen: 127.0.0.1:-1
    cluster { name: ln22, listen: 127.0.0.1:-1 }
    leafnodes { remotes = [{ url: nats-leaf://127.0.0.1:%d }] }
    `
    lconf1 := createConfFile(t, []byte(fmt.Sprintf(c1, hc.opts[0].LeafNode.Port)))
    ln1, lopts1 := RunServerWithConfig(lconf1)
    defer ln1.Shutdown()

    c2 := `
    server_name: LEAF2
    listen: 127.0.0.1:-1
    cluster { name: ln22, listen: 127.0.0.1:-1, routes = [ nats-route://127.0.0.1:%d] }
    leafnodes { remotes = [{ url: nats-leaf://127.0.0.1:%d }] }
    `
    lconf2 := createConfFile(t, []byte(fmt.Sprintf(c2, lopts1.Cluster.Port, hc.opts[1].LeafNode.Port)))
    ln2, _ := RunServerWithConfig(lconf2)
    defer ln2.Shutdown()

    c3 := `
    server_name: LEAF3
    listen: 127.0.0.1:-1
    cluster { name: ln22, listen: 127.0.0.1:-1, routes = [ nats-route://127.0.0.1:%d] }
    leafnodes { remotes = [{ url: nats-leaf://127.0.0.1:%d }] }
    `
    lconf3 := createConfFile(t, []byte(fmt.Sprintf(c3, lopts1.Cluster.Port, hc.opts[2].LeafNode.Port)))
    ln3, _ := RunServerWithConfig(lconf3)
    defer ln3.Shutdown()

    // Check leaf cluster is formed and all connected to the HUB.
    lnServers := []*Server{ln1, ln2, ln3}
    checkClusterFormed(t, lnServers...)
    for _, s := range lnServers {
        checkLeafNodeConnected(t, s)
    }
    // Check each node in the hub has 1 connection from the leaf cluster.
    for i := 0; i < 3; i++ {
        checkLeafNodeConnectedCount(t, hc.servers[i], 1)
    }

    // Create a client and qsub on LEAF1 and LEAF2.
    nc1 := natsConnect(t, ln1.ClientURL())
    defer nc1.Close()
    var qsub1Count atomic.Int32
    natsQueueSub(t, nc1, "foo", "queue1", func(_ *nats.Msg) {
        qsub1Count.Add(1)
    })

    nc2 := natsConnect(t, ln2.ClientURL())
    defer nc2.Close()
    var qsub2Count atomic.Int32
    natsQueueSub(t, nc1, "foo", "queue1", func(_ *nats.Msg) {
        qsub2Count.Add(1)
    })

    sendAndCheck := func(idx int) {
        t.Helper()
        nchub := natsConnect(t, hc.servers[idx].ClientURL())
        defer nchub.Close()
        total := 1000
        for i := 0; i < total; i++ {
            natsPub(t, nchub, "foo", []byte("from hub"))
        }
        checkFor(t, time.Second, 15*time.Millisecond, func() error {
            if trecv := int(qsub1Count.Load() + qsub2Count.Load()); trecv != total {
                return fmt.Errorf("Expected %v messages, got %v", total, trecv)
            }
            return nil
        })
        // Now that we have made sure that all messages were received,
        // check that qsub1 and qsub2 are getting at least some.
        if n := int(qsub1Count.Load()); n <= total/10 {
            t.Fatalf("Expected qsub1 to get some messages, but got %v", n)
        }
        if n := int(qsub2Count.Load()); n <= total/10 {
            t.Fatalf("Expected qsub2 to get some messages, but got %v", n)
        }
        // Reset the counters.
        qsub1Count.Store(0)
        qsub2Count.Store(0)
    }
    // Send from HUB1, HUB2 and HUB3
    for i := 0; i < 3; i++ {
        sendAndCheck(i)
    }
}
derekcollison commented 1 month ago

@kozlovic do you observer step #8? Meaning if you publish through HUB1 only only L2 receives?

kozlovic commented 1 month ago

do you observer step https://github.com/nats-io/nats-server/issues/8? Meaning if you publish through HUB1 only only L2 receives?

@derekcollison No, that's my point. The test passes, in that all queue subs receive about the same amount of messages, regardless from where I publish (the test publishes 1000 msgs from HUB1, then check and all ok, then from HUB2, etc...).

roeschter commented 1 month ago

Tested it again, the effect is real BUT to make it reproducible you need connect the leaf nodes exactly as described to the hub. And have clients connect to different leaf node. And publish to the one node on the hub that has a different routing path to the clients on the leafs.

The key to the effect is that the messages needs to travel on two different routes with different number of hops from the hub to the leaf. If the connections are made randomly the effect will be random.

So messages CAN travel either 1) HUB1 --> LEAF1 -> Subscriber1

If the paths are symmetric it will work. 1) HUB1 --> LEAF1 -> Subscriber1 2) HUB1 --> LEAF1 -> Subscriber2 WORKS // Both subscribers on the same leaf node

1) HUB3 --> HUB1 -> LEAF1 -> Subscriber1 2) HUB3 --> HUB2 -> LEAF2 -> Subscriber2 WORKS

kozlovic commented 1 month ago

@roeschter But that's what the test is doing. I make sure that LEAF1 connects to HUB1, LEAF2 to HUB2, etc..and the 2 queue subs are connected to LEAF1 and LEAF2 respectively. The publisher is connected to HUB1, then to HUB2 and HUB3.

Can you share your config files?

kozlovic commented 1 month ago

Oops, just noticed in the test that both queue subs use nc1.. let me fix that and check.

roeschter commented 1 month ago

And when sending you must only send to HUB1!

I know its a bit special - but this does happen in real life. If there is only one client sending requests to the workers,a and its by chance connected in this way the effect happens.

kozlovic commented 1 month ago

And when sending you must only send to HUB1!

Understood. But what the test is doing is to send only from HUB1, then check everything. When that is done, repeat the same test from HUB2, etc.. and then HUB3. To make sure that it works in all cases.

roeschter commented 1 month ago

And when sending you must only send to HUB1!

Understood. But what the test is doing is to send only from HUB1, then check everything. When that is done, repeat the same test from HUB2, etc.. and then HUB3. To make sure that it works in all cases.

Its should fail for HUB1 and HUB2 and succeed for HUB3

kozlovic commented 1 month ago

Ok, I reproduced it, going to check the code but I think that it was intentional to treat LEAF as clients for the message routing. But again, will check code and discuss with @derekcollison if needed.

kozlovic commented 1 month ago

I may have found the place where we override and end up always sending to the route instead to the leaf. Will see if that was intentional or not, or if that breaks lots of tests. But yes, if I "fix" the issue this test at least passes now.

derekcollison commented 1 month ago

ok great @kozlovic thank you!

roeschter commented 1 month ago

User tested with v2.10.22 and found another condition under which the queue groups are imbalanced. The original issue has disappeared.

New problem:

We have been testing NATS server version v2.10.22 to evaluate improvements in load balancing for queue groups over leafnode connections. While the queue groups now seem to work more consistently, we’ve observed an issue when two leaf nodes in the same cluster are connected to the same hub server: only one of the leaf nodes is processing the messages. Issue When two leaf nodes (within the same cluster) connect to the same hub, messages are processed by only one of the leaf nodes, leaving the other idle. Steps to Reproduce To reproduce this issue deterministically, all leaf node and NATS CLI connections are established to specific nodes.

  1. Set up a hub cluster with nodes HUB1, HUB2

  2. Set up a leaf cluster with nodes LEAF1, LEAF2

  3. Connect the leaf nodes as follows:

    1. LEAF1 → HUB1
    2. LEAF2 → HUB1
  4. Start queue group listeners:

    1. nats --context LEAF1 sub --queue=q1 foo
    2. nats --context LEAF2 sub --queue=q1 foo

Publish messages: configs.zip

Send messages to HUB3 nats --context HUB3 pub foo Hello --count 1000 Expected Behavior: Messages should be distributed between subscribers on LEAF1 Actual Behavior: Only the subscriber on LEAF2 receives messages.

kozlovic commented 1 month ago

@roeschter Could you open a new issue and clarify the test? It says to send from HUB3 but there is no mention of it in the setup and also it says that messages should be distributed between subscribers on LEAF1 but there is only 1 queue sub on LEAF1 and 1 on LEAF2 according to the description.

kozlovic commented 1 month ago

@roeschter Also, the configs show 2 sets of LEAF clusters (leafA and leafB), I guess this is not relevant and the issue happens with a single LEAF cluster? (By the way, there are 3 servers in HUB and LEAF, is that required or 2 is enough?)

roeschter commented 1 month ago

OK, reproduced it: See: https://github.com/nats-io/nats-server/issues/6040 It requires a leaf and hub to have each 3 nodes