nats-io / nats-streaming-server

NATS Streaming System Server
https://nats.io
Apache License 2.0

Clients for subscription with max_age and max_inactivity gets no new messages #1152

Open chrisSLDS opened 3 years ago

chrisSLDS commented 3 years ago

Hello,

in our current system setup we repeatedly observed the following behavior.

There are some topics, such as 'topic.message', to which only a few messages are published; sometimes there are no new updates for months. It seems that after some time of inactivity the channel's sequence number is "reset" to 0 (1), while the clients' last sent sequence remains high (93).

Now after we have published 2 new messages the first and last seq are 1 and 2.

All connected clients stay at the last consumed number (93). I also restarted one client (ClientA).

I hope you can help me.

Best regards,
Chris

System Setup:

  • NATS Version: 2.1.8
  • NATS Streaming Version: 0.18.0
  • Topics: ~30
  • Clients: ~20

Nats Streaming Config

store_limits: {
    # Override some global limits
    max_channels: 0
    max_msgs: 0
    max_bytes: 0
    max_age: "720h"
    max_subs: 200
    max_inactivity: "168h"

    channels: {
        "TestLimit": {
            max_subs: 1
            max_age: "60s"
            max_inactivity: "300s"
        }
    }
}
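
As a minimal sketch of what this configuration means for 'topic.message', the Python snippet below resolves the limits that apply to a channel and converts the durations to days. It assumes that a channel inherits every global limit it does not override; the helper names `to_seconds` and `effective_limits` are hypothetical and are not part of NATS Streaming.

```python
import re

# Values copied from the store_limits block above.
GLOBAL_LIMITS = {"max_age": "720h", "max_subs": 200, "max_inactivity": "168h"}
CHANNEL_OVERRIDES = {
    "TestLimit": {"max_subs": 1, "max_age": "60s", "max_inactivity": "300s"},
}

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(duration: str) -> int:
    """Convert a simple single-unit duration such as '720h' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def effective_limits(channel: str) -> dict:
    """Assumption: per-channel values override the global ones, the rest is inherited."""
    return {**GLOBAL_LIMITS, **CHANNEL_OVERRIDES.get(channel, {})}


limits = effective_limits("topic.message")
inactivity_days = to_seconds(limits["max_inactivity"]) / 86400
age_days = to_seconds(limits["max_age"]) / 86400
print(inactivity_days, age_days)
```

Under that assumption, 'topic.message' has no override of its own, so it falls under the global values: a max_inactivity of 7 days and a max_age of 30 days.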

Nats Streaming Output (url: http://localhost/streaming/channelsz?channel=topic.message&subs=1)

{
    "name": "topic.message",
    "msgs": 2,
    "bytes": 249714,
    "first_seq": 1,
    "last_seq": 2,
    "subscriptions": [
        {
            "client_id": "ClientB",
            "inbox": "_INBOX.bIR8B3ZHorN61QdX8nm8eS",
            "ack_inbox": "_INBOX.dUvpLIDgZu0GpLxyDFdAqE",
            "durable_name": "ClientB",
            "is_durable": true,
            "is_offline": false,
            "max_inflight": 1,
            "ack_wait": 60,
            "last_sent": 93,
            "pending_count": 0,
            "is_stalled": false
        },
        {
            "client_id": "ClientA",
            "inbox": "_INBOX.KArB4VIuy5J5oL2QThXlQH",
            "ack_inbox": "_INBOX.dUvpLIDgZu0GpLxyDFdM01",
            "durable_name": "ClientA",
            "is_durable": true,
            "is_offline": false,
            "max_inflight": 1,
            "ack_wait": 60,
            "last_sent": 93,
            "pending_count": 0,
            "is_stalled": false
        },
        {
            "client_id": "ClientC",
            "inbox": "_INBOX.93fAGvkBQEVlzwUGia0pwy",
            "ack_inbox": "_INBOX.dUvpLIDgZu0GpLxyDFdCJX",
            "queue_name": "q1:q1",
            "is_durable": true,
            "is_offline": true,
            "max_inflight": 1,
            "ack_wait": 30,
            "last_sent": 93,
            "pending_count": 0,
            "is_stalled": false
        },
        {
            "client_id": "ClientD",
            "inbox": "_INBOX.863363d08GrtACKFHa3fcM",
            "ack_inbox": "_INBOX.dUvpLIDgZu0GpLxyDFdDzB",
            "queue_name": "q2:q2",
            "is_durable": true,
            "is_offline": false,
            "max_inflight": 2,
            "ack_wait": 30,
            "last_sent": 93,
            "pending_count": 0,
            "is_stalled": false
        },
               ...
    ]
}
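
The mismatch in this output can be checked mechanically. The following Python sketch parses a trimmed copy of the JSON above and lists the subscriptions whose last_sent is greater than the channel's last_seq. The function name `subs_ahead_of_channel` is hypothetical; only the field names come from the output itself.

```python
import json

# Trimmed copy of the monitoring output above (two of the subscriptions).
CHANNELSZ = """
{
  "name": "topic.message",
  "msgs": 2,
  "first_seq": 1,
  "last_seq": 2,
  "subscriptions": [
    {"client_id": "ClientB", "is_offline": false, "last_sent": 93},
    {"client_id": "ClientC", "is_offline": true, "last_sent": 93}
  ]
}
"""


def subs_ahead_of_channel(channel: dict) -> list:
    """Return the client IDs whose last_sent is greater than the channel's last_seq."""
    last_seq = channel["last_seq"]
    return [
        sub["client_id"]
        for sub in channel.get("subscriptions", [])
        if sub["last_sent"] > last_seq
    ]


channel = json.loads(CHANNELSZ)
ahead = subs_ahead_of_channel(channel)
print(ahead)
```

Both subscriptions in the trimmed sample are reported, since 93 is greater than 2.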

kozlovic commented 3 years ago

With max_inactivity, it is not surprising that if a channel is deleted, and later recreated, the sequence of the first message will start at 1. What is surprising is that a channel should not be deleted if there are active consumers. I know that I fixed some issues with channels being deleted/recreated, in the context of clustering, but this was fixed in 0.18.0, which is what you are using. Maybe the state existed prior to you upgrading to 0.18.0?

Some questions:

  • Are you using clustering?
  • Are you running the server with debug (command line -SD or sd: true in the streaming{} configuration block)
  • Do you have any server log to share?

If you were not running with debug, could you re-run with debug so that maybe we capture the events that lead to this situation?

For now, if you want all your consumers to start receiving again (the 2 messages currently in the channel), you would have to unsubscribe them and restart them with the "deliver all" subscription option.

Also, I would recommend that you upgrade to the latest v0.20.0 release as time permits.

chrisSLDS commented 3 years ago

Hello,

With max_inactivity, it is not surprising that if a channel is deleted, and later recreated, the sequence of the first message will start at 1. What is surprising is that a channel should not be deleted if there are active consumers.

I totally agree with this assumption.

Some questions:

  • Are you using clustering?

Not for streaming, only for NATS

  • Are you running the server with debug (command line -SD or sd: true in the streaming{} configuration block)

YES: sd:true

  • Do you have any server log to share?

Our services run in Azure Kubernetes containers and the logs are captured by Elastic. Until now the retention range was 30 days; I just had it changed to 60 days. As soon as the 30 days (beginning on 4.2.2021) have passed, I will send you parts of the log.

I think the deliver all option is not expedient for me, because not all of the services are managed by me. Furthermore, I would have to change them back...

Is there a possibility to change the "last_sent" values to 0 or 1, in the database, via an API, or something like that?

We will update the system to 0.20.0 as soon as possible.

kozlovic commented 3 years ago

I think the deliver all option is not expedient for me, because not all of the services are managed by me. Furthermore, I would have to change them back...

What start option are they using? What I was saying was to stop the apps, have them unsubscribe (which will delete the durable subscription), and then start them again. I assumed that they are using deliver all; if not, let me know. For durables, though, the start sequence has meaning only the first time the durable is created, so I would be surprised if they use something other than deliver all.

Is there a possibility to change the "last_sent" values to 0 or 1, in the database, via an API, or something like that?

No, the only way is as I described above, unsubscribe and restart them.
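
To illustrate why a plain restart does not help while unsubscribe-and-restart does, here is a toy Python model. It is not the server's code: the `ToyChannel` class and its delivery rule (only sequences above a durable's last_sent are delivered) are assumptions inferred from the behavior described in this thread.

```python
class ToyChannel:
    """Toy model, NOT the server's implementation. Sequences start at 1."""

    def __init__(self):
        self.last_seq = 0
        self.last_sent = {}  # durable name -> last sequence sent to it

    def publish(self) -> int:
        self.last_seq += 1
        return self.last_seq

    def subscribe(self, durable: str) -> None:
        # The start position ("deliver all" here) only applies when the
        # durable does not exist yet; existing state is left untouched.
        self.last_sent.setdefault(durable, 0)

    def unsubscribe(self, durable: str) -> None:
        # Unsubscribing deletes the durable state.
        self.last_sent.pop(durable, None)

    def deliver(self, durable: str) -> list:
        # Assumed rule: only sequences above last_sent are delivered.
        start = self.last_sent[durable] + 1
        delivered = list(range(start, self.last_seq + 1))
        if delivered:
            self.last_sent[durable] = delivered[-1]
        return delivered


channel = ToyChannel()
channel.publish()
channel.publish()                  # the recreated channel holds seq 1 and 2
channel.last_sent["ClientA"] = 93  # stale durable state from before

channel.subscribe("ClientA")       # restart only: durable state is kept
stuck = channel.deliver("ClientA")

channel.unsubscribe("ClientA")     # the workaround: drop the durable ...
channel.subscribe("ClientA")       # ... and create it again with deliver all
recovered = channel.deliver("ClientA")
print(stuck, recovered)
```

In this model the restarted durable receives nothing because its position (93) is ahead of the channel (2), while the recreated durable receives both messages.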