nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

Message lost on pushing to leaf stream with connected hub stream when hub offline #4148

Closed simonhoss closed 1 year ago

simonhoss commented 1 year ago

Defect

Versions of nats-server and affected client libraries used:

nats-server v.2.9.16 nats-cli 0.0.35

OS/Container environment:

MacOS native executable

Steps or code to reproduce the issue:

Configs:

hub.conf

port: 4222
server_name: hub-server
jetstream {
    store_dir="./store_server"
    domain=hub
}
leafnodes {
    port: 7422
}

leaf.conf

server_name: store_a

port: 4111

leafnodes {
  remotes = [
    {
      url: "nats://localhost:7422"
    }
  ]
}

jetstream {
}

I create a stream on the leaf that sources from the hub stream with the following command:

nats -s nats://localhost:4111 s add --source orders

? Stream Name orders
? Storage file
? Replication 1
? Retention Policy Limits
? Discard Policy Old
? Stream Messages Limit -1
? Total Stream Size -1
? Message TTL -1
? Max Message Size -1
? Duplicate tracking time window 2m0s
? Allow message Roll-ups No
? Allow message deletion Yes
? Allow purging subjects or the entire stream Yes
? Adjust source "orders" start No
? orders Source Filter source by subject orders.store_1.foo
? Import "orders" from a different JetStream domain Yes
? orders Source foreign JetStream domain name hub
? orders Source foreign JetStream domain delivery prefix
Stream orders was created

Information for Stream orders created 2023-05-11 00:27:31

             Replicas: 1
              Storage: File

Options:

            Retention: Limits
     Acknowledgements: true
       Discard Policy: Old
     Duplicate Window: 2m0s
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: false

Limits:

     Maximum Messages: unlimited
  Maximum Per Subject: unlimited
        Maximum Bytes: unlimited
          Maximum Age: unlimited
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited

Replication:

              Sources: orders, Subject: orders.store_1.foo, API Prefix: $JS.hub.API

State:

             Messages: 0
                Bytes: 0 B
             FirstSeq: 0
              LastSeq: 0

I publish a new message with the following command while the hub is offline:

nats pub -s nats://localhost:4111 orders.store_1.foo order1_22

Expected result:

Actual result:

Log of the leaf

[29197] 2023/05/11 09:18:05.418112 [DBG] Trying to connect as leafnode to remote server on "localhost:7422" (127.0.0.1:7422)
[29197] 2023/05/11 09:18:05.418616 [ERR] Error trying to connect as leafnode to remote server "localhost:7422" (attempt 16): dial tcp 127.0.0.1:7422: connect: connection refused
[29197] 2023/05/11 09:18:06.292591 [DBG] 127.0.0.1:57313 - cid:9 - Client connection created
[29197] 2023/05/11 09:18:06.293682 [TRC] 127.0.0.1:57313 - cid:9 - <<- [CONNECT {"verbose":false,"pedantic":false,"tls_required":false,"name":"NATS CLI Version 0.0.35","lang":"go","version":"1.19.0","protocol":1,"echo":true,"headers":true,"no_responders":true}]
[29197] 2023/05/11 09:18:06.293709 [TRC] 127.0.0.1:57313 - cid:9 - <<- [PING]
[29197] 2023/05/11 09:18:06.293712 [TRC] 127.0.0.1:57313 - cid:9 - ->> [PONG]
[29197] 2023/05/11 09:18:06.294602 [TRC] 127.0.0.1:57313 - cid:9 - <<- [PUB orders.store_1.foo 9]
[29197] 2023/05/11 09:18:06.294607 [TRC] 127.0.0.1:57313 - cid:9 - <<- MSG_PAYLOAD: ["order1_23"]
[29197] 2023/05/11 09:18:06.294611 [TRC] 127.0.0.1:57313 - cid:9 - <<- [PING]
[29197] 2023/05/11 09:18:06.294614 [TRC] 127.0.0.1:57313 - cid:9 - ->> [PONG]
[29197] 2023/05/11 09:18:06.294763 [DBG] 127.0.0.1:57313 - cid:9 - Client connection closed: Client Closed
[29197] 2023/05/11 09:18:06.419746 [DBG] Trying to connect as leafnode to remote server on "172.20.10.3:7422"
[29197] 2023/05/11 09:18:06.420191 [ERR] Error trying to connect as leafnode to remote server "172.20.10.3:7422" (attempt 17): dial tcp 172.20.10.3:7422: connect: connection refused
[29197] 2023/05/11 09:18:07.183959 [TRC] JETSTREAM - <-> [DELSUB 9]
[29197] 2023/05/11 09:18:07.195137 [TRC] JETSTREAM - <-> [DELSUB 10]

derekcollison commented 1 year ago

Your leafnode should have a different and unique domain set than the hub. Try:

jetstream {
  domain: leaf
}

simonhoss commented 1 year ago

Thanks for your response. I tried it as you suggested. Here is the startup log:

[52728] 2023/05/11 16:29:03.775260 [INF] Starting nats-server
[52728] 2023/05/11 16:29:03.775409 [INF]   Version:  2.9.16
[52728] 2023/05/11 16:29:03.775411 [INF]   Git:      [f84ca24]
[52728] 2023/05/11 16:29:03.775414 [INF]   Cluster:  store_a
[52728] 2023/05/11 16:29:03.775415 [INF]   Name:     store_a
[52728] 2023/05/11 16:29:03.775417 [INF]   Node:     pbNdm6o3
[52728] 2023/05/11 16:29:03.775418 [INF]   ID:       NCBZWL3DIXH5JZ67OAXADF6TGXMKWCDSSUOB27L4HBVETRATIWUK2WQ7
[52728] 2023/05/11 16:29:03.775441 [INF] Using configuration file: leaf.conf
[52728] 2023/05/11 16:29:03.777592 [INF] Starting JetStream
[52728] 2023/05/11 16:29:03.778045 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[52728] 2023/05/11 16:29:03.778050 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[52728] 2023/05/11 16:29:03.778199 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[52728] 2023/05/11 16:29:03.778203 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[52728] 2023/05/11 16:29:03.778205 [INF]
[52728] 2023/05/11 16:29:03.778206 [INF]          https://docs.nats.io/jetstream
[52728] 2023/05/11 16:29:03.778207 [INF]
[52728] 2023/05/11 16:29:03.778212 [INF] ---------------- JETSTREAM ----------------
[52728] 2023/05/11 16:29:03.778215 [INF]   Max Memory:      12.00 GB
[52728] 2023/05/11 16:29:03.778217 [INF]   Max Storage:     21.24 GB
[52728] 2023/05/11 16:29:03.778219 [INF]   Store Directory: "store_a_server/jetstream"
[52728] 2023/05/11 16:29:03.778220 [INF]   Domain:          store_a_leaf

As you can see, I set the domain to store_a_leaf.

When I publish a new message with nats pub orders.store_1.foo order1_03 --context leaf while connected, everything works as expected: the message appears on both NATS servers.

When I publish it while disconnected, the message is lost completely.

derekcollison commented 1 year ago

Could you share the following?

nats stream info for both accounts on leaf and hub for the streams in question?

simonhoss commented 1 year ago

Here we go

For the hub:

❯ nats stream info
? Select a Stream orders
Information for Stream orders created 2023-05-11 18:46:45

             Subjects: orders.>
             Replicas: 1
              Storage: File

Options:

            Retention: Limits
     Acknowledgements: true
       Discard Policy: Old
     Duplicate Window: 2m0s
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: false

Limits:

     Maximum Messages: unlimited
  Maximum Per Subject: unlimited
        Maximum Bytes: unlimited
          Maximum Age: unlimited
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited

State:

             Messages: 1
                Bytes: 57 B
             FirstSeq: 1 @ 2023-05-11T16:49:41 UTC
              LastSeq: 1 @ 2023-05-11T16:49:41 UTC
     Active Consumers: 0
   Number of Subjects: 1

For the leaf:

❯ nats stream info --context leaf
? Select a Stream orders_leaf
Information for Stream orders_leaf created 2023-05-11 18:49:35

             Replicas: 1
              Storage: File

Options:

            Retention: Limits
     Acknowledgements: true
       Discard Policy: Old
     Duplicate Window: 2m0s
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: false

Limits:

     Maximum Messages: unlimited
  Maximum Per Subject: unlimited
        Maximum Bytes: unlimited
          Maximum Age: unlimited
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited

Replication:

              Sources: orders, Subject: orders.store_1.foo, API Prefix: $JS.hub.API

Cluster Information:

                 Name: store_a
               Leader: store_a

Source Information:

          Stream Name: orders
                  Lag: 0
            Last Seen: 0.89s
      Ext. API Prefix: $JS.hub.API

State:

             Messages: 1
                Bytes: 112 B
             FirstSeq: 1 @ 2023-05-11T16:49:41 UTC
              LastSeq: 1 @ 2023-05-11T16:49:41 UTC
     Active Consumers: 0
   Number of Subjects: 1

derekcollison commented 1 year ago

The source on the leafnode does not listen to the subject on its own and only gets messages when they are present in the hub.

You can add the subject orders.> to the source but I don't think that is what you are trying to achieve.

Can you explain at a high level what you want to accomplish? What should happen when messages are published on the hub versus on the leafnode, and in both cases when the leafnode connection is down?
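
To illustrate the point about the source not listening on its own, here is a minimal Python sketch. It assumes simplified stream configurations written as dicts that mirror the stream info output above. The helper names subject_matches and captures_local_publish are hypothetical and not part of any NATS library. The sketch only models which locally published subjects a stream would store, using the NATS wildcard rules (* matches one token, > matches one or more trailing tokens).

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Token-wise NATS-style match: '*' is one token, '>' is one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and needs at least one subject token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def captures_local_publish(stream_cfg: dict, subject: str) -> bool:
    """A stream stores a local publish only if one of its own subjects matches.
    The filter on a source applies to messages pulled from the source stream,
    not to messages published locally."""
    return any(subject_matches(p, subject) for p in stream_cfg.get("subjects", []))


# Simplified view of the two streams from this thread (illustrative dicts only).
hub_stream = {"name": "orders", "subjects": ["orders.>"]}
leaf_stream = {
    "name": "orders_leaf",
    # No "subjects": the leaf stream only has a source pointing at the hub.
    "sources": [
        {
            "name": "orders",
            "filter_subject": "orders.store_1.foo",
            "external": {"api": "$JS.hub.API"},
        }
    ],
}

print(captures_local_publish(hub_stream, "orders.store_1.foo"))
print(captures_local_publish(leaf_stream, "orders.store_1.foo"))
```

In this model the hub stream would store a publish on orders.store_1.foo and the leaf stream would not, which matches the behavior reported while the hub is offline.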

simonhoss commented 1 year ago

Gotcha.

What I want to achieve is a scenario where I have a central hub and some leafs on edge devices. The hub can push information to the edge devices, and the edge devices can push back to the hub. This works perfectly when the --source is set and both systems are online. The problem is that I cannot guarantee a stable connection. So both sides should cache the unsent messages and, as soon as the connection is back, send all unsent messages in both directions. In any case, the messages should be persisted regardless of the connection state between the leaf and the hub.

derekcollison commented 1 year ago

Common pattern.

On the leaf, have the stream listen to subjects like EVENT.LEAF-1.>. On the hub, have it listen on EVENT.HUB.>.

Then source the leafnode stream from the hub using filter subject EVENT.HUB.>, so it only pulls hub messages. On the hub, have it source the leafnode stream with EVENT.LEAF-1.>, so it only pulls the leaf's local messages into the hub.

If you plan to have a lot of leafnodes, this may start to have scaling issues on the hub stream, but we have worked with customers to architect solutions for that as well.
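
The pattern described above can be sketched as two stream configurations. This is a rough Python sketch that only builds JSON. The stream names EVENTS_HUB and EVENTS_LEAF_1 and the leaf domain name leaf are assumptions for illustration, and the field names follow the JetStream stream configuration JSON as I understand it, so check them against your server version. A complete configuration would need further fields (retention, limits, and so on).

```python
import json


def api_prefix(domain: str) -> str:
    # JetStream API prefix for a domain, e.g. "$JS.hub.API" as shown in the stream info above.
    return f"$JS.{domain}.API"


def external_source(stream: str, filter_subject: str, domain: str) -> dict:
    # A source that pulls only filter_subject from a stream living in another domain.
    return {
        "name": stream,
        "filter_subject": filter_subject,
        "external": {"api": api_prefix(domain)},
    }


# Hypothetical names: streams EVENTS_HUB / EVENTS_LEAF_1, domains "hub" / "leaf".
hub_stream = {
    "name": "EVENTS_HUB",
    "subjects": ["EVENT.HUB.>"],  # stores hub publishes even when the leaf is offline
    "storage": "file",
    "sources": [external_source("EVENTS_LEAF_1", "EVENT.LEAF-1.>", "leaf")],
}

leaf_stream = {
    "name": "EVENTS_LEAF_1",
    "subjects": ["EVENT.LEAF-1.>"],  # stores leaf publishes even when the hub is offline
    "storage": "file",
    "sources": [external_source("EVENTS_HUB", "EVENT.HUB.>", "hub")],
}

print(json.dumps(hub_stream, indent=2))
print(json.dumps(leaf_stream, indent=2))
```

Because each side listens only on its own subject space and sources only the other side's subject space, a message should not be pulled back to where it originated. Each dict could be saved to a file and loaded with the CLI's --config option on nats stream add, assuming your CLI version accepts a partial configuration.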

simonhoss commented 1 year ago

I also started to think in that way. Is there a possibility that I can then push the messages into a common stream on each node? Maybe with RePublish?

See picture as example:

[image]

derekcollison commented 1 year ago

You could, but RePublish is NATS core, so message loss can happen. Again, we have designed systems for customers, since these can be complex at large scale.

simonhoss commented 1 year ago

With the republish I meant a way of collecting the hub main stream and the source streams from the leafs. I want all messages to be pushed again into a stream where I can subscribe and get all messages on the hub, including those from the leafs.

I have now solved it by creating another stream on the hub that sources from the other streams, which in turn are sourced from the dedicated leafs.

Thanks a lot for your help. I think I can now go forward with this approach!
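
A rough sketch of that aggregate stream, again as JSON built in Python. All stream names (EVENTS_ALL, EVENTS_HUB, FROM_LEAF_1, FROM_LEAF_2) are hypothetical, and the layout is my reading of the approach described above, not a confirmed configuration: the per-leaf streams are assumed to already exist on the hub and to source from their leafs, so the aggregate stream only needs local sources without an external API prefix.

```python
import json


def aggregate_stream(name: str, source_streams: list) -> dict:
    # A stream with no subjects of its own: it is filled only through its sources.
    # The sources are local to the hub, so no "external" block is set.
    return {
        "name": name,
        "storage": "file",
        "sources": [{"name": s} for s in source_streams],
    }


# Hypothetical hub-side streams: the hub's own stream plus one stream per leaf.
all_events = aggregate_stream(
    "EVENTS_ALL",
    ["EVENTS_HUB", "FROM_LEAF_1", "FROM_LEAF_2"],
)

print(json.dumps(all_events, indent=2))
```

A consumer on EVENTS_ALL would then see hub and leaf messages in one place, at the cost of storing each message a second time on the hub.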

derekcollison commented 1 year ago

I accounted for that when I said to have the hub source from all leafs for their filtered subjects.

derekcollison commented 1 year ago

See above comment.