Replicated stream stop working after a while

mtaobiz commented 9 months ago

Observed behavior

cli:

# nats -s nats://sandbox-nats-v1:4222 --js-domain=cluster-sandbox-domain --creds=/etc/nats/creds/leaf.creds consumer next tickers ticker
nats: error: no message received: nats: timeout
# nats -s nats://sandbox-nats-v1:4222 --js-domain=cluster-sandbox-domain --creds=/etc/nats/creds/leaf.creds consumer sub tickers ticker
nats: error: no message received: nats: timeout

nats-server log:

[530521] 2024/02/12 10:09:55.150413 [DBG] 10.1.5.170:58934 - cid:1304 - Authenticated JWT: Client "UDY4FL4HDXTXNGQ35JCP37Q6FWFZOGYYNQAS46CLEDRMCCZJEHNZ5CYC" (claim-name: "leaf", claim-tags: []) signed with "AAIN2HSHPLGMPFDRIFTSBX5VVJWTVOFLGEW6RYAD6ZH2FGNWQULJ4YFR" by Account "AD56GFOTE2TL6A7OYCOF6MW4INPOO44SJF3P3ZVTOJZLCV33J7ZP3UEQ" (claim-name: "SANDBOX", claim-tags: []) signed with "OB5R4VZCJKGO232M7WBL45INV2QSHWUODJ3UPGAVN2JT7PXIWKEE7HGZ" has mappings true accused 0xc000192000
[530521] 2024/02/12 10:09:55.151652 [DBG] 10.1.5.170:58934 - cid:1304 - Auto-unsubscribe limit of 1 reached for sid '1'
[530521] 2024/02/12 10:09:57.372254 [DBG] Apply stream entries for 'AD56GFOTE2TL6A7OYCOF6MW4INPOO44SJF3P3ZVTOJZLCV33J7ZP3UEQ > tickers' got error processing message: msgid is duplicate
[530521] 2024/02/12 10:09:57.448713 [DBG] 10.1.5.170:58934 - cid:1304 - Client Ping Timer
[530521] 2024/02/12 10:09:58.386346 [DBG] Apply stream entries for 'AD56GFOTE2TL6A7OYCOF6MW4INPOO44SJF3P3ZVTOJZLCV33J7ZP3UEQ > tickers' got error processing message: msgid is duplicate
[530521] 2024/02/12 10:10:00.155106 [DBG] 10.1.5.170:58934 - cid:1304 - Auto-unsubscribe limit of 1 reached for sid '2'
[530521] 2024/02/12 10:10:00.155204 [DBG] 10.1.5.170:58934 - cid:1304 - Auto-unsubscribe limit of 1 reached for sid '3'
[530521] 2024/02/12 10:10:00.159188 [DBG] 10.1.5.170:58934 - cid:1304 - Client connection closed: Client Closed
[530521] 2024/02/12 10:10:01.401973 [DBG] Apply stream entries for 'AD56GFOTE2TL6A7OYCOF6MW4INPOO44SJF3P3ZVTOJZLCV33J7ZP3UEQ > tickers' got error processing message: msgid is duplicate

"msgid is duplicate" not observed while stream not using replication

Expected behavior

Replicated stream works fine like not replicated

Server and client version

# nats-server -v
nats-server: v2.10.10
# nats --version
0.1.1

App with Go nats library

Host environment

Cluster. 3 nodes of: Debian GNU/Linux 11 (bullseye) AMD Ryzen 7 7700 8-Core Processor (family: 0x19, model: 0x61, stepping: 0x2) 64Gb RAM

Client: Debian GNU/Linux 11 (bullseye) AMD Ryzen 9 3900 12-Core Processor 128Gb RAM

Steps to reproduce

1 create stream:

Information for Stream tickers created 2024-02-01 15:53:23

              Subjects: *.ticker.*.*.ws
              Replicas: 3
               Storage: Memory
     Placement Cluster: sandbox-nats-cluster

Options:

             Retention: Limits
       Acknowledgments: true
        Discard Policy: Old
      Duplicate Window: 10.00s
     Allows Msg Delete: true
          Allows Purge: true
        Allows Rollups: false

Limits:

      Maximum Messages: 20,000
   Maximum Per Subject: 1
         Maximum Bytes: unlimited
           Maximum Age: unlimited
  Maximum Message Size: unlimited
     Maximum Consumers: unlimited

Cluster Information:

                  Name: sandbox-nats-cluster
                Leader: nats-node-1
               Replica: nats-node-2, current, seen 14ms ago
               Replica: nats-node-3, current, seen 14ms ago

State:

              Messages: 2,091
                 Bytes: 329 KiB
        First Sequence: 50,438,997 @ 2024-02-12 01:19:46 UTC
         Last Sequence: 91,392,022 @ 2024-02-12 11:55:31 UTC
      Deleted Messages: 40,950,935
      Active Consumers: 1
    Number of Subjects: 2,091

2 Create consumer:

Information for Consumer tickers > ticker created 2024-02-05T11:30:43+01:00

Configuration:

                    Name: ticker
             Description: Ticker update ticker consumer
               Pull Mode: true
          Deliver Policy: New
              Ack Policy: All
                Ack Wait: 2.00s
           Replay Policy: Instant
      Maximum Deliveries: 1
         Max Ack Pending: 4,000
       Max Waiting Pulls: 512

Cluster Information:

                    Name: sandbox-nats-cluster
                  Leader: nats-node-2
                 Replica: nats-node-1, current, seen 272ms ago
                 Replica: nats-node-3, current, seen 272ms ago

State:

  Last Delivered Message: Consumer sequence: 205,214,027 Stream sequence: 545,370,323
    Acknowledgment Floor: Consumer sequence: 205,213,939 Stream sequence: 545,370,234
        Outstanding Acks: 0 out of maximum 4,000
    Redelivered Messages: 0
    Unprocessed Messages: 0
           Waiting Pulls: 1 of maximum 512

3 Post new messages in one app and receive it over consumer in over app within 24-48 hours. 4 Try to get new messages over consumer: nats -s nats://sandbox-nats-v1:4222 --js-domain=cluster-sandbox-domain --creds=/etc/nats/creds/leaf.creds consumer next tickers ticker

neilalexander commented 9 months ago

Have any of the servers been restarted during this test?

Does it happen with the stream storage set to FileStorage instead of MemoryStorage?

mtaobiz commented 9 months ago

Servers not restarted I dont test with FileStorage. Is it important?

mtaobiz commented 8 months ago

With FileStorage continue to receive errors:

[293333] 2024/02/20 10:39:51.702404 [DBG] RAFT [qh7tjmNM - S-R3F-ErbH4zua] Installing snapshot of 216239 bytes
[293333] 2024/02/20 10:39:54.124782 [DBG] RAFT [qh7tjmNM - C-R3F-0J6szu52] Installing snapshot of 44 bytes
[293333] 2024/02/20 10:39:55.166339 [DBG] Apply stream entries for 'AD56GFOTE2TL6A7OYCOF6MW4INPOO44SJF3P3ZVTOJZLCV33J7ZP3UEQ > tickers' got error processing message: msgid is duplicate
[293333] 2024/02/20 10:39:57.178840 [DBG] Apply stream entries for 'AD56GFOTE2TL6A7OYCOF6MW4INPOO44SJF3P3ZVTOJZLCV33J7ZP3UEQ > tickers' got error processing message: msgid is duplicate
[293333] 2024/02/20 10:39:59.185842 [DBG] Apply stream entries for 'AD56GFOTE2TL6A7OYCOF6MW4INPOO44SJF3P3ZVTOJZLCV33J7ZP3UEQ > tickers' got error processing message: msgid is duplicate

And snapshot size is growing...

derekcollison commented 8 months ago

If you are exercising de-duplication via MsgId then that is ok.

But should not stop your consumer..

What does nats str subjects tickers show?

mtaobiz commented 8 months ago

Currently it shows one message per subject. And 2100 subjects. But currently consumer works fine...

derekcollison commented 8 months ago

And you use msgIds yes?

And you have max per subject of one I see.

There were some improvements that may help in 2.10.11 if you want to upgrade.

mtaobiz commented 7 months ago

Look to consumer sequence and stream sequence numbers in first message. Today problem persist with another stream. And its too has replicas= 3. In new case consumer sequence is much greater than stream last sequence number too. In other streams, what function normally consumer sequence is not greater than stream sequence. And again I can see a lot of "duplicate msgid" errors in nats server log. This new case was with nats version 2.10.10. Looks like replication bug. I will test new nats version later.

mtaobiz commented 6 months ago

New case. And now I have full details. Server version: 2.10.12

2024-05-07 18:10:01.524:

Information for Stream binanceSpotTickerQueue created 2024-03-27 15:32:09

              Subjects: binance.ticker.spot.check
              Replicas: 3
               Storage: Memory
     Placement Cluster: sandbox-nats-cluster

Options:

             Retention: WorkQueue
       Acknowledgments: true
        Discard Policy: Old
      Duplicate Window: 1.00s
     Allows Msg Delete: true
          Allows Purge: true                         
        Allows Rollups: false                        

Limits:                                              

      Maximum Messages: unlimited                    
   Maximum Per Subject: 1                            
         Maximum Bytes: unlimited                    
           Maximum Age: unlimited                    
  Maximum Message Size: unlimited                    
     Maximum Consumers: unlimited                    

Cluster Information:                                 

                  Name: sandbox-nats-cluster
                Leader: nats-node-2
               Replica: nats-node-1, current, seen 477ms ago
               Replica: nats-node-3, current, seen 477ms ago

State:

              Messages: 0
                 Bytes: 0 B
        First Sequence: 360,545
         Last Sequence: 360,544 @ 2024-05-07 18:10:01 UTC

Stream sequence is 360,544 Consumer:

Information for Consumer binanceSpotTickerQueue > ticker created 2024-03-27T15:32:11+01:00

Configuration:

                    Name: ticker
             Description: Binance spot ticker update check consumer
               Pull Mode: true
          Deliver Policy: All
              Ack Policy: Explicit
                Ack Wait: 1m0s
           Replay Policy: Instant
      Maximum Deliveries: 1,000
         Max Ack Pending: 4,000
       Max Waiting Pulls: 512

Cluster Information:

                    Name: sandbox-nats-cluster
                  Leader: nats-node-3
                 Replica: nats-node-1, current, seen 465ms ago
                 Replica: nats-node-2, current, seen 465ms ago

State:

  Last Delivered Message: Consumer sequence: 359,632 Stream sequence: 360,544 Last delivery: 478ms ago
    Acknowledgment Floor: Consumer sequence: 359,632 Stream sequence: 360,544 Last Ack: 465ms ago
        Outstanding Acks: 0 out of maximum 4,000
    Redelivered Messages: 0
    Unprocessed Messages: 0
           Waiting Pulls: 1 of maximum 512

Consumer has stream sequence 360,544

10 min later:

2024-05-07 18:20:01.214:

Information for Stream binanceSpotTickerQueue created 2024-03-27 15:32:09

              Subjects: binance.ticker.spot.check
              Replicas: 3
               Storage: Memory
     Placement Cluster: sandbox-nats-cluster

Options:

             Retention: WorkQueue
       Acknowledgments: true
        Discard Policy: Old
      Duplicate Window: 1.00s
     Allows Msg Delete: true
          Allows Purge: true
        Allows Rollups: false

Limits:

      Maximum Messages: unlimited
   Maximum Per Subject: 1
         Maximum Bytes: unlimited
           Maximum Age: unlimited
  Maximum Message Size: unlimited
     Maximum Consumers: unlimited

Cluster Information:

                  Name: sandbox-nats-cluster
                Leader: nats-node-2
               Replica: nats-node-1, current, seen 133ms ago
               Replica: nats-node-3, current, seen 133ms ago

State:

              Messages: 1
                 Bytes: 88 B
        First Sequence: 292 @ 2024-05-07 18:20:01 UTC
         Last Sequence: 292 @ 2024-05-07 18:20:01 UTC
      Active Consumers: 1
    Number of Subjects: 1

At this time stream sequence was 292 Consumer:

Information for Consumer binanceSpotTickerQueue > ticker created 2024-03-27T15:32:11+01:00

Configuration:

                    Name: ticker
             Description: Binance spot ticker update check consumer
               Pull Mode: true
          Deliver Policy: All
              Ack Policy: Explicit
                Ack Wait: 1m0s
           Replay Policy: Instant
      Maximum Deliveries: 1,000
         Max Ack Pending: 4,000
       Max Waiting Pulls: 512

Cluster Information:

                    Name: sandbox-nats-cluster
                  Leader: nats-node-2
                 Replica: nats-node-1, current, seen 804ms ago
                 Replica: nats-node-3, current, seen 804ms ago

State:

  Last Delivered Message: Consumer sequence: 359,927 Stream sequence: 360,839 Last delivery: 5m5s ago
    Acknowledgment Floor: Consumer sequence: 359,927 Stream sequence: 360,839 Last Ack: 5m5s ago
        Outstanding Acks: 0 out of maximum 4,000
    Redelivered Messages: 0
    Unprocessed Messages: 0

But consumer has stream sequence 360839 ! After this consumer stop receiving new messages

Between this time points cluster servers was restarted to new version 2.10.14:

[836983] 2024/05/03 14:04:04.034960 [INF] Starting nats-server
[836983] 2024/05/03 14:04:04.034984 [INF]   Version:  2.10.12

[1029273] 2024/05/07 18:15:10.122406 [INF] Starting nats-server
[1029273] 2024/05/07 18:15:10.122439 [INF]   Version:  2.10.14

log from one of cluster servers about broken stream:

[836983] 2024/05/07 18:14:13.513054 [INF] JetStream cluster new consumer leader for 'AD56GFOTE2TL6A7OYCOF6MW4INPOO44SJF3P3ZVTOJZLCV33J7ZP3UEQ > binanceSpotTickerQueue > ticker'
[836983] 2024/05/07 18:14:55.015563 [DBG] JetStream cluster, consumer 'AD56GFOTE2TL6A7OYCOF6MW4INPOO44SJF3P3ZVTOJZLCV33J7ZP3UEQ > binanceSpotTickerQueue > ticker' was already running
[836983] 2024/05/07 18:14:55.015574 [DBG] JetStream cluster, consumer 'AD56GFOTE2TL6A7OYCOF6MW4INPOO44SJF3P3ZVTOJZLCV33J7ZP3UEQ > binanceSpotTickerQueue > ticker' was already running
[836983] 2024/05/07 18:15:08.158077 [DBG] Exiting stream monitor for 'AD56GFOTE2TL6A7OYCOF6MW4INPOO44SJF3P3ZVTOJZLCV33J7ZP3UEQ > binanceSpotTickerQueue' [S-R3M-im3WIEsG]
[836983] 2024/05/07 18:15:08.195757 [DBG] Exiting consumer monitor for 'AD56GFOTE2TL6A7OYCOF6MW4INPOO44SJF3P3ZVTOJZLCV33J7ZP3UEQ > binanceSpotTickerQueue > ticker' [C-R3M-rrPUTr3c]
[1029273] 2024/05/07 18:15:10.148156 [DBG] Starting stream monitor for 'AD56GFOTE2TL6A7OYCOF6MW4INPOO44SJF3P3ZVTOJZLCV33J7ZP3UEQ > binanceSpotTickerQueue' [S-R3M-im3WIEsG]
[1029273] 2024/05/07 18:15:10.148160 [DBG] Waiting for consumers for interest based stream 'AD56GFOTE2TL6A7OYCOF6MW4INPOO44SJF3P3ZVTOJZLCV33J7ZP3UEQ > binanceSpotTickerQueue'
[1029273] 2024/05/07 18:15:10.148273 [DBG] Starting consumer monitor for 'AD56GFOTE2TL6A7OYCOF6MW4INPOO44SJF3P3ZVTOJZLCV33J7ZP3UEQ > binanceSpotTickerQueue > ticker' [C-R3M-rrPUTr3c]

mtaobiz commented 6 months ago

# nats -s nats://sandbox-nats-v1:4222 --js-domain=cluster-sandbox-domain --creds=/etc/nats/creds/leaf.creds consumer next  binanceSpotTickerQueue ticker
nats: error: no message received: nats: timeout

derekcollison commented 6 months ago

How do you update the servers? Do you check /healthz before moving on from one server to another?

mtaobiz commented 6 months ago

We upgrage nodes one by one, and check each node status in systemd after restart.

derekcollison commented 6 months ago

But do you check nats-server healthz to return 200/ok before moving to next server?

mtaobiz commented 6 months ago

no, we don`t call healthz

derekcollison commented 6 months ago

Be a good idea to turn on monitoring and check /healthz before moving onto the next server.

mtaobiz commented 6 months ago

But what if node crashed and restarted? Its possible when two nodes crashed in same time and restarted simultaneously... I think nats server must not perform any data manipulation before it fully started and stay healthy.

derekcollison commented 6 months ago

That is correct and that is what NATS does, but if you start a server and move on to the next before waiting for it to be healthy you can allow the system to elect a leader who has not properly sync'd its state since you are using memory storage.

mtaobiz commented 6 months ago

Using file storage resolves this issue?

derekcollison commented 6 months ago

We still run same process with healthz checks for all out managed systems, but file backed storage will allow a server to recover and sync quicker.

wallyqs commented 2 months ago

Fixed via #5179 as part of v2.10.12 release and memory streams rollouts were improved as part of #5506 on v2.10.17

nats-io / nats-server