Subscribing to JetStream Stream on a 3 node cluster with a single offline node fails 1 in 3 times

rubencosta commented 1 month ago

Observed behavior

The subscription fails with a timeout about ~33% of the tries. This is unexpected because the cluster should still be functional in the case of a single node failure. From the amount of failures and the fact that removing the server from the cluster fixes the issue, I assume that NATS Server is routing the request internally to any of the cluster nodes even if the node is reported as offline.

nats subscribe test --all --trace
16:46:01 Subscribing to JetStream Stream holding messages with subject test starting with the first message received 
nats: error: context deadline exceeded

Expected behavior

I expect the subscription to work 100% of the tries since the cluster should still be functional in the case of a single node failure.

nats subscribe test --all
16:12:30 Subscribing to JetStream Stream holding messages with subject test starting with the first message received 
[#1] Received JetStream message: stream: test seq 1 / subject: test / time: 2024-06-06T16:07:50+02:00
test

Server and client version

nats-server: v2.10.16 nats: v0.1.4

Host environment

No response

Steps to reproduce

Create a 3 node cluster with JetStream enabled

nats-server -js -p 4222 --server_name a --cluster nats://localhost:4248 --cluster_name test-cluster --routes nats://localhost:4249

nats-server -js -p 4223 --server_name b --cluster nats://localhost:4249 --cluster_name test-cluster --routes nats://localhost:4250

nats-server -js -p 4224 --server_name c --cluster nats://localhost:4250 --cluster_name test-cluster --routes nats://localhost:4248

Create a test stream

nats stream add test --replicas 3
? Subjects test
? Storage memory
? Retention Policy Limits
? Discard Policy Old
? Stream Messages Limit -1
? Per Subject Messages Limit -1
? Total Stream Size -1
? Message TTL -1
? Max Message Size -1
? Duplicate tracking time window 2m0s
? Allow message Roll-ups No
? Allow message deletion Yes
? Allow purging subjects or the entire stream Yes
Stream test was created

Information for Stream test created 2024-06-06 16:32:28

              Subjects: test
              Replicas: 3
               Storage: Memory

Options:

             Retention: Limits
       Acknowledgments: true
        Discard Policy: Old
      Duplicate Window: 2m0s
            Direct Get: true
     Allows Msg Delete: true
          Allows Purge: true
        Allows Rollups: false

Limits:

      Maximum Messages: unlimited
   Maximum Per Subject: unlimited
         Maximum Bytes: unlimited
           Maximum Age: unlimited
  Maximum Message Size: unlimited
     Maximum Consumers: unlimited

Cluster Information:

                  Name: test-cluster
                Leader: a
               Replica: b, current, seen 1ms ago
               Replica: c, current, seen 647µs ago

State:

              Messages: 0
                 Bytes: 0 B
        First Sequence: 0
         Last Sequence: 0
      Active Consumers: 0

Publish a test message

nats publish test test
16:33:26 Published 4 bytes to "test"

Create a subscription and observe that it always works

16:34:10 Subscribing to JetStream Stream holding messages with subject test starting with the first message received 
[#1] Received JetStream message: stream: test seq 1 / subject: test / time: 2024-06-06T16:33:26+02:00
test

Stop any of the servers, create a subscription and observe a timeout

nats subscribe test --all --trace
16:46:01 Subscribing to JetStream Stream holding messages with subject test starting with the first message received 
nats: error: context deadline exceeded

ripienaar commented 1 month ago

Please show the nats commands with --trace set for the ones that fail

thank you

rubencosta commented 1 month ago

Please show the nats commands with --trace set for the ones that fail

thank you

Thanks @ripienaar, I have updated the description although there were no extra logs.

ripienaar commented 1 month ago

Hmm, ok, thats a pity - essential to figure it out, ok I'll try to reproduce and add some debug info. Will move to CLI repo for now

rubencosta commented 1 month ago

@ripienaar I should add that we have first observed this in our app code using the go client with a cluster deployed in Kubernetes. I used nats cli for the simplest reproduction case I could manage.

ripienaar commented 1 month ago

Sample go code would be good also - nats uses its own client library to it might be a bit weird if I have bugs hehe...but I'll look into it anyway.

derekcollison commented 1 month ago

Currently the meta layer requires all peers to be online to place. That information is updated async, so could be that meta layer did not process the offline state when it allows.

We have had some requests from customers that would allow for peer selection with just quorum semantics..

rubencosta commented 1 month ago

@derekcollison Not sure if I understand what you mean, but maybe this helps.

Here we can see that after stopping the node c, the state is correctly reported in the RAFT Meta Group Information while still being able to reproduce the issue.

╭──────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                           JetStream Summary                                          │
├────────┬──────────────┬─────────┬───────────┬──────────┬───────┬────────┬──────┬─────────┬───────────┤
│ Server │ Cluster      │ Streams │ Consumers │ Messages │ Bytes │ Memory │ File │ API Req │ API Err   │
├────────┼──────────────┼─────────┼───────────┼──────────┼───────┼────────┼──────┼─────────┼───────────┤
│ a      │ test-cluster │ 1       │ 0         │ 1        │ 24 B  │ 24 B   │ 0 B  │ 0       │ 0         │
│ b*     │ test-cluster │ 1       │ 0         │ 1        │ 24 B  │ 24 B   │ 0 B  │ 8       │ 1 / 12.5% │
├────────┼──────────────┼─────────┼───────────┼──────────┼───────┼────────┼──────┼─────────┼───────────┤
│        │              │ 2       │ 0         │ 2        │ 48 B  │ 48 B   │ 0 B  │ 8       │ 1         │
╰────────┴──────────────┴─────────┴───────────┴──────────┴───────┴────────┴──────┴─────────┴───────────╯

╭───────────────────────────────────────────────────────────────────────╮
│                      RAFT Meta Group Information                      │
├─────────────────┬──────────┬────────┬─────────┬────────┬────────┬─────┤
│ Connection Name │ ID       │ Leader │ Current │ Online │ Active │ Lag │
├─────────────────┼──────────┼────────┼─────────┼────────┼────────┼─────┤
│ a               │ GR5IGR3G │        │ true    │ true   │ 97ms   │ 0   │
│ b               │ 0ZkM0vRC │ yes    │ true    │ true   │ 0s     │ 0   │
│ c               │ k1i3jIye │        │ false   │ false  │ 19.10s │ 1   │
╰─────────────────┴──────────┴────────┴─────────┴────────┴────────┴─────╯

❯ 
nats subscribe test --all --trace
19:09:22 Subscribing to JetStream Stream holding messages with subject test starting with the first message received 
<<< Reply Subject: $JS.ACK.test.yEK7pGyN.1.1.1.1717693739474051672.0
[#1] Received JetStream message: stream: test seq 1 / subject: test / time: 2024-06-06T19:08:59+02:00
test

^C⏎                                                                                          

❯ 
nats subscribe test --all --trace
19:09:26 Subscribing to JetStream Stream holding messages with subject test starting with the first message received 
nats: error: context deadline exceeded

derekcollison commented 1 month ago

Is the stream a interest or workqueue stream?

Its trying to create a consumer and I would assume its asking for R3 and can't see all peers online.

Not sure if nats cli would accept it here, but try --replicas 1.

If interest or WQ it requires same replica count. We are adding in a viewer that bypasses consumers and uses direct gets. @ripienaar has additional information and timing on that.

ripienaar commented 1 month ago

Only nats s view will use direct get, for sub its always a subscription. So I think Derek has nailed the reason for this

rubencosta commented 1 month ago

Thanks @ripienaar and @derekcollison, it seems like we we're still using the deprecated subscribe API and in the process of trying to reproduce the bug I ended up using the nats sub which also suffers from the same issue. I can confirm that using an ordered consumer works 100% of the time.

nats-io / natscli