nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

Messages stop being delivered to a consumer [v2.9.22, v2.10.14] #4736

Open hpdobrica opened 11 months ago

hpdobrica commented 11 months ago

Observed behavior

A couple of weeks ago, one of the consumers on one of our streams started misbehaving: after an influx of messages that the consumer should process, the messages end up in a pending state for this consumer (nats_consumer_num_pending), from where the consumer processes them. After processing some of the messages, all of a sudden the remaining messages disappear from pending and the consumer is unable to continue processing them.

The unprocessed messages still take up space on the stream and will not be processed unless we delete and recreate the affected consumer. When we recreate it, the consumer works normally for 2-30 minutes, after which the problem usually happens again. We can recreate the consumer multiple times and process all messages this way. While the old messages are "stuck", new messages for the same consumer end up in the pending queue and get processed normally.
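
For reference, a minimal sketch of this delete-and-recreate workaround using the nats.go jetstream API. The stream and consumer names match the info output further down; the connection URL, the error handling and the exact consumer configuration are assumptions, not taken from this report.

    package main

    import (
        "context"
        "log"
        "time"

        "github.com/nats-io/nats.go"
        "github.com/nats-io/nats.go/jetstream"
    )

    func main() {
        nc, err := nats.Connect("nats://nats:4222") // assumed cluster URL
        if err != nil {
            log.Fatal(err)
        }
        defer nc.Close()

        js, err := jetstream.New(nc)
        if err != nil {
            log.Fatal(err)
        }
        ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
        defer cancel()

        // Drop the stuck durable consumer; its delivered/ack state is discarded.
        if err := js.DeleteConsumer(ctx, "affected-stream", "affected-consumer"); err != nil {
            log.Fatal(err)
        }

        // Recreate it with the same (assumed) configuration so delivery starts
        // again from the messages still sitting in the work-queue stream.
        _, err = js.CreateOrUpdateConsumer(ctx, "affected-stream", jetstream.ConsumerConfig{
            Durable:       "affected-consumer",
            FilterSubject: "subject.processed.by.affected.consumer",
            AckPolicy:     jetstream.AckExplicitPolicy,
            AckWait:       30 * time.Second,
            MaxAckPending: 800,
        })
        if err != nil {
            log.Fatal(err)
        }
    }

With a WorkQueue stream the unconsumed messages remain in the stream, so the recreated consumer (DeliverPolicy All by default) picks them up again.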

The first time this happened, our stream had only one replica, and that NATS node was restarted (or it crashed); the problems started appearing shortly after that restart. Restarting the NATS node on which the stream lives fixes the issue temporarily (but loses the "stuck" messages). Scaling the stream up to 3 replicas also fixes the issue temporarily, but also loses the stuck messages.
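
Similarly, a rough sketch of the scale-up workaround, raising the stream from 1 to 3 replicas by re-submitting its configuration. The helper name is made up, and js is a jetstream.JetStream handle obtained as in the sketch above; the CLI equivalent should be something like nats stream edit affected-stream --replicas 3.

    package natsops

    import (
        "context"

        "github.com/nats-io/nats.go/jetstream"
    )

    // scaleUpStream raises the replica count of a stream to 3 by re-submitting
    // its current configuration with only Replicas changed.
    func scaleUpStream(ctx context.Context, js jetstream.JetStream, name string) error {
        s, err := js.Stream(ctx, name)
        if err != nil {
            return err
        }
        cfg := s.CachedInfo().Config // copy of the current stream configuration
        cfg.Replicas = 3
        _, err = js.UpdateStream(ctx, cfg)
        return err
    }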

Yesterday we tried recreating the stream, but the problem occurred again, even though there were no NATS node restarts like the first time the issue happened.

This issue happened in 4 of our production environments, in each one affecting different consumers (although the affected consumers stay consistent across environments). On 3 of these environments the problem has not recurred in the last week (the action taken was scaling up to 3 replicas and restarting the NATS nodes), but on one of them it keeps happening every 1-2 days around the same time (when this consumer gets a big influx of messages).

Note that while one consumer gets stuck with most of the messages, other consumers sometimes get stuck together with it, but with far fewer messages.

Edit: it's also important to note that this is happening only on a single stream, while all others are operating normally.

Some details about the stream and affected consumer:

> nats stream info affected-stream
Information for Stream affected-stream created 2023-11-01 15:34:56

             Subjects: affected.stream.>
             Replicas: 3
              Storage: File

Options:

            Retention: WorkQueue
     Acknowledgements: true
       Discard Policy: Old
     Duplicate Window: 2m0s
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: false

Limits:

     Maximum Messages: unlimited
  Maximum Per Subject: unlimited
        Maximum Bytes: 2.0 GiB
          Maximum Age: unlimited
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited

Cluster Information:

                 Name: nats
               Leader: nats-2
              Replica: nats-0, current, seen 0.23s ago
              Replica: nats-4, current, seen 0.23s ago

State:

             Messages: 18,072
                Bytes: 247 MiB
             FirstSeq: 128,046 @ 2023-11-02T11:21:29 UTC
              LastSeq: 154,415 @ 2023-11-02T15:31:44 UTC
     Deleted Messages: 8,298
     Active Consumers: 29
   Number of Subjects: 2
> nats consumer info affected-stream affected-consumer
Information for Consumer affected-stream > affected-consumer created 2023-11-01T15:34:59Z

Configuration:

                Name: affected-consumer
           Pull Mode: true
      Filter Subject: subject.processed.by.affected.consumer
      Deliver Policy: All
          Ack Policy: Explicit
            Ack Wait: 30s
       Replay Policy: Instant
     Max Ack Pending: 800
   Max Waiting Pulls: 512

Cluster Information:

                Name: nats
              Leader: nats-2
             Replica: nats-0, current, seen 0.22s ago
             Replica: nats-4, current, seen 0.22s ago

State:

   Last Delivered Message: Consumer sequence: 5,319 Stream sequence: 154,415 Last delivery: 1h26m0s ago
     Acknowledgment floor: Consumer sequence: 5,319 Stream sequence: 154,395 Last Ack: 1h25m59s ago
         Outstanding Acks: 0 out of maximum 800
     Redelivered Messages: 0
     Unprocessed Messages: 0
            Waiting Pulls: 2 of maximum 512
> nats stream subjects affected-stream
subject.processed.by.some.other.stuck.consumer: 4
subject.processed.by.affected.consumer: 18,068

Expected behavior

All messages intended for a consumer are delivered to it until the stream is empty; there should be no need to recreate the consumer in order to process messages.

Server and client version

Server: NATS JetStream 2.9.22; client: github.com/nats-io/nats.go v1.28.0

Host environment

Both client and server run in kOps Kubernetes on nodes with:

Steps to reproduce

The first time it happened was right after a NATS node restart while the stream had only 1 replica. We tried reproducing it on another environment by killing the node on which a 1-replica stream lives, but didn't manage to do so.

Jarema commented 10 months ago

Could you please check if the latest 2.10 resolves this issue?

electronick commented 9 months ago

We have a similar problem on the latest NATS v2.10.7 (we use Debian 12). It happens rarely, but the symptoms are quite similar: the consumer just stops delivering messages to our Go app. The NATS log has the following weird messages:

 [1654008] 2023/12/22 14:13:19.892668 [WRN] RAFT [XODqRg5S - C-R3M-1kwYOIEY] 15000 append entries pending
[1654008] 2023/12/22 14:13:20.643050 [WRN] RAFT [XODqRg5S - C-R3M-1kwYOIEY] 20000 append entries pending
[1654008] 2023/12/22 14:13:21.412309 [WRN] RAFT [XODqRg5S - C-R3M-1kwYOIEY] 25000 append entries pending
[1654008] 2023/12/22 14:13:22.170486 [WRN] RAFT [XODqRg5S - C-R3M-1kwYOIEY] 30000 append entries pending
[1654008] 2023/12/22 14:13:39.033362 [WRN] RAFT [XODqRg5S - C-R3M-1kwYOIEY] 15000 append entries pending
[1654008] 2023/12/22 14:13:54.193566 [INF] JetStream cluster new consumer leader for '$G > request > requests_parser'
[1654008] 2023/12/22 14:16:51.041239 [WRN] Consumer '$G > request > requests_parser' error on store update from snapshot entry: old update ignored
[1654008] 2023/12/22 14:17:02.518672 [WRN] Consumer '$G > request > requests_parser' error on store update from snapshot entry: old update ignored
[1654008] 2023/12/22 14:17:35.196894 [WRN] RAFT [XODqRg5S - C-R3M-1kwYOIEY] Resetting WAL state

On all 3 servers in the cluster we got a lot of messages like this:

[1694318] 2023/12/22 10:00:42.601269 [WRN] Consumer '$G > request > requests_parser' error on store update from snapshot entry: old update ignored

and around the time the consumer stopped processing:

[1694318] 2023/12/22 10:01:09.371145 [WRN] RAFT [SDc6tucJ - C-R3M-1kwYOIEY] Resetting WAL state

Before the consumer stopped, on the metadata leader we saw a huge number of messages like this:

[1654008] 2023/12/22 04:32:59.040890 [WRN] JetStream consumer '$G > filters > filter_durable_5113f4947801113764000004_7ff972fe838acb10d470a0fc4f290b23391b6736d109771b76c42e1edf6fb0d5' is not current
[1654008] 2023/12/22 04:33:17.206532 [WRN] JetStream cluster consumer '$G > filters > filter_durable_5113f4947801113764000004_7ff972fe838acb10d470a0fc4f290b23391b6736d109771b76c42e1edf6fb0d5' has NO quorum, stalled.

However, this consumer belongs to another stream, filters, not request, and this consumer (filter_durable_...) should have already been deleted, as by our design it should only live for 30 minutes.

After restarting all NATS servers (in sequence) we see the following messages on the leader:

[1659980] 2023/12/22 15:10:21.981599 [INF] JetStream cluster new metadata leader: NATS-C01/us-east-1
[1659980] 2023/12/22 15:10:23.358404 [INF] JetStream cluster new stream leader for '$G > filters'
[1659980] 2023/12/22 15:10:23.401515 [WRN] Catchup for stream '$G > filter_tickers' stalled
[1659980] 2023/12/22 15:10:23.401590 [WRN] Error applying entries to '$G > filter_tickers': first sequence mismatch
[1659980] 2023/12/22 15:10:23.404622 [WRN] Resetting stream cluster state for '$G > filter_tickers'
[1659980] 2023/12/22 15:10:23.593914 [INF] Catchup for stream '$G > filters' complete
[1659980] 2023/12/22 15:10:23.801614 [WRN] RAFT [XODqRg5S - C-R3M-1kwYOIEY] Resetting WAL state
[1659980] 2023/12/22 15:10:23.801640 [ERR] RAFT [XODqRg5S - C-R3M-1kwYOIEY] Error sending snapshot to follower [gVDiArbz]: raft: no snapshot available
[1659980] 2023/12/22 15:10:23.954231 [INF] JetStream cluster new consumer leader for '$G > filters > filter_durable_5113f4947801113764000004_198c06f0ed41b3cbae6d05fe94dbb01c923407f1788dc61a755ae2631c6b2f39'
[1659980] 2023/12/22 15:10:23.955082 [WRN] RAFT [XODqRg5S - C-R3M-iNvxvCBK] Resetting WAL state
[1659980] 2023/12/22 15:10:23.955101 [ERR] RAFT [XODqRg5S - C-R3M-iNvxvCBK] Error sending snapshot to follower [gVDiArbz]: raft: no snapshot available
[1659980] 2023/12/22 15:10:25.008382 [WRN] Catchup for stream '$G > request' stalled
[1659980] 2023/12/22 15:10:25.009105 [WRN] Error applying entries to '$G > request': first sequence mismatch
[1659980] 2023/12/22 15:10:30.645400 [INF] JetStream cluster new consumer leader for '$G > filters > filter_durable_5113f4947801113764000004_198c06f0ed41b3cbae6d05fe94dbb01c923407f1788dc61a755ae2631c6b2f39'
[1659980] 2023/12/22 15:11:19.353377 [WRN] JetStream cluster consumer '$G > request > requests_parser' has NO quorum, stalled.
[1659980] 2023/12/22 15:11:19.353569 [WRN] Consumer '$G > request > requests_parser' error on store update from snapshot entry: old update ignored
[1659980] 2023/12/22 15:11:19.372417 [WRN] Resetting stream cluster state for '$G > request'
[1659980] 2023/12/22 15:12:58.144712 [INF] JetStream cluster new consumer leader for '$G > request > requests_parser'
[1659980] 2023/12/22 15:13:08.287446 [INF] JetStream cluster new consumer leader for '$G > request > requests_parser'
[1659980] 2023/12/22 15:15:55.507108 [INF] JetStream cluster new consumer leader for '$G > request > requests_parser'
[1659980] 2023/12/22 15:17:20.360734 [WRN] Consumer '$G > request > requests_parser' error on store update from snapshot entry: old update ignored
[1659980] 2023/12/22 15:18:11.384037 [INF] JetStream cluster new consumer leader for '$G > request > requests_parser'
[1659980] 2023/12/22 15:18:38.263053 [INF] JetStream cluster new consumer leader for '$G > request > requests_parser'
[1659980] 2023/12/22 15:18:51.592069 [INF] JetStream cluster new consumer leader for '$G > request > requests_parser'
[1659980] 2023/12/22 15:23:21.971901 [INF] JetStream cluster new consumer leader for '$G > request > requests_parser'
[1659980] 2023/12/22 15:23:59.380268 [WRN] Consumer '$G > request > requests_parser' error on store update from snapshot entry: old update ignored
[1659980] 2023/12/22 15:24:32.491818 [WRN] Consumer '$G > request > requests_parser' error on store update from snapshot entry: old update ignored
[1659980] 2023/12/22 15:27:17.708637 [WRN] Consumer '$G > request > requests_parser' error on store update from snapshot entry: old update ignored

As additional info: the filters stream has memory storage, and the filter_durable consumer is created with options:

        jetstream.ConsumerConfig{
            AckPolicy:         jetstream.AckExplicitPolicy,
            FilterSubject:     "filter.filter_id.>",
            HeadersOnly:       true,
            InactiveThreshold: 30 * time.Minute,
        }

The request stream has file storage; however, requests_parser is created with options:

        jetstream.ConsumerConfig{
            Durable:       "requests_parser",
            MemoryStorage: true,
            AckPolicy:     jetstream.AckExplicitPolicy,
        }

We use jetstream.Consumer#Consume for retrieving messages. From time to time some apps just stop receiving messages, but there are no connection errors or anything similar.
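
For context, a stripped-down sketch of how such a pull consumer is typically drained with jetstream.Consumer#Consume and explicit acks. The stream and consumer names are the ones mentioned above; everything else (connection, handler body, shutdown) is assumed and not taken from this report.

    package main

    import (
        "context"
        "log"
        "time"

        "github.com/nats-io/nats.go"
        "github.com/nats-io/nats.go/jetstream"
    )

    func main() {
        nc, err := nats.Connect(nats.DefaultURL) // assumed URL
        if err != nil {
            log.Fatal(err)
        }
        defer nc.Close()

        js, err := jetstream.New(nc)
        if err != nil {
            log.Fatal(err)
        }

        ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
        defer cancel()

        // Look up the existing durable pull consumer on the request stream.
        cons, err := js.Consumer(ctx, "request", "requests_parser")
        if err != nil {
            log.Fatal(err)
        }

        // Consume delivers messages to the callback; each one must be acked
        // explicitly, otherwise it is redelivered after the AckWait interval.
        cc, err := cons.Consume(func(msg jetstream.Msg) {
            // ... process msg.Data() ...
            if err := msg.Ack(); err != nil {
                log.Printf("ack failed: %v", err)
            }
        })
        if err != nil {
            log.Fatal(err)
        }
        defer cc.Stop()

        select {} // block; a real app would tie this to its lifecycle
    }

If a message is never acked, the server redelivers it after AckWait expires.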

In the NATS error.log there are entries like:

goroutine 85342147 [sync.Cond.Wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:398 +0xce fp=0xc000385ec0 sp=0xc000385ea0 pc=0x439fae
runtime.goparkunlock(...)
    runtime/proc.go:404
sync.runtime_notifyListWait(0xc012682090, 0x3a8)
    runtime/sema.go:527 +0x159 fp=0xc000385f10 sp=0xc000385ec0 pc=0x468dd9
sync.(*Cond).Wait(0xc061e81980?)
    sync/cond.go:70 +0x85 fp=0xc000385f50 sp=0xc000385f10 pc=0x483ae5
github.com/nats-io/nats-server/v2/server.(*client).writeLoop(0xc061e81980)
    github.com/nats-io/nats-server/v2/server/client.go:1205 +0x1f1 fp=0xc000385f98 sp=0xc000385f50 pc=0x7b5391
github.com/nats-io/nats-server/v2/server.(*Server).createClientEx.func2()
    github.com/nats-io/nats-server/v2/server/server.go:3220 +0x17 fp=0xc000385fb0 sp=0xc000385f98 pc=0x9b9f37
github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine.func1()
    github.com/nats-io/nats-server/v2/server/server.go:3700 +0x32 fp=0xc000385fe0 sp=0xc000385fb0 pc=0x9bcdd2
runtime.goexit()
    runtime/asm_amd64.s:1650 +0x1 fp=0xc000385fe8 sp=0xc000385fe0 pc=0x46ca01
created by github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine in goroutine 85342130
    github.com/nats-io/nats-server/v2/server/server.go:3698 +0x145

goroutine 85349642 [runnable]:
runtime.gopark(0x1?, 0xb?, 0x0?, 0x0?, 0x48?)
    runtime/proc.go:398 +0xce fp=0xc00073bb50 sp=0xc00073bb30 pc=0x439fae
runtime.netpollblock(0x4a5198?, 0x404b06?, 0x0?)
    runtime/netpoll.go:564 +0xf7 fp=0xc00073bb88 sp=0xc00073bb50 pc=0x432a57
internal/poll.runtime_pollWait(0x7f10efcdc468, 0x72)
    runtime/netpoll.go:343 +0x85 fp=0xc00073bba8 sp=0xc00073bb88 pc=0x467185
internal/poll.(*pollDesc).wait(0xc01030b780?, 0xc0026c6c00?, 0x0)
    internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00073bbd0 sp=0xc00073bba8 pc=0x4bdc47
internal/poll.(*pollDesc).waitRead(...)
    internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc01030b780, {0xc0026c6c00, 0x400, 0x400})
    internal/poll/fd_unix.go:164 +0x27a fp=0xc00073bc68 sp=0xc00073bbd0 pc=0x4beb5a
net.(*netFD).Read(0xc01030b780, {0xc0026c6c00?, 0xc00073bd18?, 0x4b1fee?})
    net/fd_posix.go:55 +0x25 fp=0xc00073bcb0 sp=0xc00073bc68 pc=0x582285
net.(*conn).Read(0xc07d2005a0, {0xc0026c6c00?, 0x175bf0bfc3c2f?, 0x1053e20?})
    net/net.go:179 +0x45 fp=0xc00073bcf8 sp=0xc00073bcb0 pc=0x58d5a5
net.(*TCPConn).Read(0xc004c3e000?, {0xc0026c6c00?, 0x1053e20?, 0x800?})
    <autogenerated>:1 +0x25 fp=0xc00073bd28 sp=0xc00073bcf8 pc=0x59a785
github.com/nats-io/nats-server/v2/server.(*client).readLoop(0xc004c3e000, {0x0, 0x0, 0x0})
    github.com/nats-io/nats-server/v2/server/client.go:1336 +0x642 fp=0xc00073bf80 sp=0xc00073bd28 pc=0x7b5e02
github.com/nats-io/nats-server/v2/server.(*Server).createClientEx.func1()
    github.com/nats-io/nats-server/v2/server/server.go:3217 +0x25 fp=0xc00073bfb0 sp=0xc00073bf80 pc=0x9b9f85
github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine.func1()
    github.com/nats-io/nats-server/v2/server/server.go:3700 +0x32 fp=0xc00073bfe0 sp=0xc00073bfb0 pc=0x9bcdd2
runtime.goexit()
    runtime/asm_amd64.s:1650 +0x1 fp=0xc00073bfe8 sp=0xc00073bfe0 pc=0x46ca01
created by github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine in goroutine 85349641
    github.com/nats-io/nats-server/v2/server/server.go:3698 +0x145

Jarema commented 7 months ago

EDIT: Rephrasing, as you mentioned you're using push:

Do you have any custom config for the Consume method, or are you using the defaults?

AetheWu commented 7 months ago

We encountered the same problem on nats-v2.10.7

Jarema commented 7 months ago

@AetheWu can you share some info, i.e. consumer info and stream info? Any additional context would help too: did it happen after some of the servers restarted? Was there a network issue? Did it happen after an upgrade? Please share anything that could help us replicate this issue.

AetheWu commented 7 months ago

nats server and client version

Server: v2.10.7; client: github.com/nats-io/nats.go v1.32.0

nats config

listen: 0.0.0.0:4222
http_port: 8222

trace: false
debug: true
jetstream: enabled

jetstream {
    store_dir: /Users/lethe/.config/nats/data
    max_mem: 1G
    max_file: 100G
}

accounts: {
    SYS: {
        users: [
            {user: admin, password: public}
        ]
    }
    APP: {
        jetstream: {
            max_memory: 1G
            max_filestore: 10G
            max_streams: 100
            max_consumers: 100
        }
        jetstream: enabled
        users: [
            {user: "fogcloud", password: "xxxx"}
        ]
    }
}

consumer info

Information for Consumer server_dead_letter_1gr2o92rc0400 > server_dead_letter_1gr2o92rc0400-consumer created 2024-02-20T17:52:15+08:00

Configuration:

                    Name: server_dead_letter_1gr2o92rc0400-consumer
               Pull Mode: true
          Deliver Policy: All
              Ack Policy: Explicit
                Ack Wait: 5.00s
           Replay Policy: Instant
         Max Ack Pending: 100
       Max Waiting Pulls: 100
          Max Pull Batch: 1,000

State:

  Last Delivered Message: Consumer sequence: 408 Stream sequence: 1 Last delivery: 2.16s ago
    Acknowledgment Floor: Consumer sequence: 0 Stream sequence: 0
        Outstanding Acks: 1 out of maximum 100
    Redelivered Messages: 1
    Unprocessed Messages: 0
           Waiting Pulls: 1 of maximum 100

stream info

Information for Stream server_dead_letter_1gr2o92rc0400 created 2024-02-20 17:52:15

              Subjects: server.dead_letter.1gr2o92rc0400
              Replicas: 1
               Storage: File

Options:

             Retention: Limits
       Acknowledgments: true
        Discard Policy: Old
      Duplicate Window: 2m0s
     Allows Msg Delete: true
          Allows Purge: true
        Allows Rollups: false

Limits:

      Maximum Messages: 100,000
   Maximum Per Subject: 1,000,000
         Maximum Bytes: unlimited
           Maximum Age: unlimited
  Maximum Message Size: unlimited
     Maximum Consumers: 16

State:

              Messages: 1
                 Bytes: 412 B
        First Sequence: 1 @ 2024-02-20 17:53:34 UTC
         Last Sequence: 1 @ 2024-02-20 17:53:34 UTC
      Active Consumers: 1
    Number of Subjects: 1

Jarema commented 7 months ago

@AetheWu The consumer reports that it just tried to deliver the message and it was not acked. From the state it seems this happens all the time: there is only one message on the stream, and it looks like the consumer has already tried to deliver it over 400 times (judging by the consumer sequence), but the client never acks it.

Can you please explain what you think is not working here? Thanks!
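
As an aside, the state described above can also be checked from the client side. A small sketch, with assumed connection details (not from this thread), that prints the relevant consumer fields via the nats.go jetstream API:

    package main

    import (
        "context"
        "fmt"
        "log"
        "time"

        "github.com/nats-io/nats.go"
        "github.com/nats-io/nats.go/jetstream"
    )

    func main() {
        nc, err := nats.Connect(nats.DefaultURL) // assumed URL
        if err != nil {
            log.Fatal(err)
        }
        defer nc.Close()

        js, err := jetstream.New(nc)
        if err != nil {
            log.Fatal(err)
        }

        ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
        defer cancel()

        cons, err := js.Consumer(ctx, "server_dead_letter_1gr2o92rc0400",
            "server_dead_letter_1gr2o92rc0400-consumer")
        if err != nil {
            log.Fatal(err)
        }

        info, err := cons.Info(ctx) // fresh state from the server
        if err != nil {
            log.Fatal(err)
        }

        // Delivered vs. AckFloor shows how far delivery has run ahead of acks;
        // NumRedelivered > 0 with a single message on the stream means the same
        // message keeps being redelivered after every AckWait expiry.
        fmt.Printf("delivered: consumer seq %d, stream seq %d\n",
            info.Delivered.Consumer, info.Delivered.Stream)
        fmt.Printf("ack floor: consumer seq %d, stream seq %d\n",
            info.AckFloor.Consumer, info.AckFloor.Stream)
        fmt.Printf("ack pending %d, redelivered %d, unprocessed %d\n",
            info.NumAckPending, info.NumRedelivered, info.NumPending)
    }

These are the same fields shown by nats consumer info above.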

dynastymasra commented 6 months ago

Hi @Jarema, we also have a similar problem: we published the message to the stream successfully, but the NATS consumer didn't receive it.

NATS 2.10.9. We deployed NATS as a Kubernetes StatefulSet with a PVC.

Stream report:

│ Stream  │ Storage │ Placement │ Consumers │ Messages │ Bytes   │ Lost │ Deleted │ Replicas │
│ TARGETS │ File    │           │ 3         │ 360      │ 168 KiB │ 0    │ 0       │          │

 Information for Stream TARGETS created 2023-12-25 12:49:01

          Description: Target stream
             Subjects: TARGETS.>
             Replicas: 1
              Storage: File

Options:

            Retention: Interest
     Acknowledgements: true
       Discard Policy: Old
     Duplicate Window: 2m0s
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: false

Limits:

     Maximum Messages: unlimited
  Maximum Per Subject: unlimited
        Maximum Bytes: unlimited
          Maximum Age: 1d0h0m0s
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited

State:

             Messages: 360
                Bytes: 168 KiB
             FirstSeq: 79,407 @ 2024-03-21T11:44:44 UTC
              LastSeq: 79,716 @ 2024-03-22T19:36:33 UTC
     Active Consumers: 3
   Number of Subjects: 3

Consumer report:

│ Consumer                    │ Mode │ Ack Policy │ Ack Wait │ Ack Pending │ Redelivered │ Unprocessed │ Ack Floor │ Cluster │
│ webcx-target-chunk-uploaded │ Push │ Explicit   │ 3h0m0s   │ 0           │ 0           │ 0           │ 79,686    │         │
│ webcx-target-commit         │ Push │ Explicit   │ 1h0m0s   │ 0           │ 0           │ 0           │ 79,713    │         │
│ webcx-target-finalize       │ Push │ Explicit   │ 1h0m0s   │ 0           │ 0           │ 0           │ 79,713    │         │

Information for Consumer TARGETS > webcx-target-commit created 2023-12-25T12:49:02+09:00

Configuration:

                Name: webcx-target-commit
         Description: Register campaign to user
    Delivery Subject: webcx.target.commit
      Filter Subject: TARGETS.commit
      Deliver Policy: All
 Deliver Queue Group: webcx-targeting-worker
          Ack Policy: Explicit
            Ack Wait: 1h0m0s
       Replay Policy: Instant
  Maximum Deliveries: 5
     Max Ack Pending: 1,000
        Flow Control: false

State:

   Last Delivered Message: Consumer sequence: 10,915 Stream sequence: 79,716 Last delivery: 10h9m14s ago
     Acknowledgment floor: Consumer sequence: 10,915 Stream sequence: 79,713 Last Ack: 10h8m56s ago
         Outstanding Acks: 0 out of maximum 1,000
     Redelivered Messages: 0
     Unprocessed Messages: 0
          Active Interest: Active using Queue Group webcx-targeting-worker

Do we have some misconfiguration, or do we need another setup for a Kubernetes StatefulSet with a PVC?

AetheWu commented 4 months ago

We have this problem again. nats-server version: v2.10.14; nats-client version: github.com/nats-io/nats.go v1.34.1

nats config

listen: 0.0.0.0:4222
http_port: 8222

trace: false
debug: true
jetstream: enabled

jetstream {
    store_dir: /Users/lethe/.config/nats/data
    max_mem: 1G
    max_file: 100G
}

accounts: {
    SYS: {
        users: [
            {user: admin, password: public}
        ]
    }
    APP: {
        jetstream: {
            max_memory: 1G
            max_filestore: 10G
            max_streams: 100
            max_consumers: 100
        }
        jetstream: enabled
        users: [
            {user: "fogcloud", password: "xxxx"}
        ]
    }
}

stream info

Information for Stream mqtt_status created 2024-05-13 15:25:41

              Subjects: mqtt_status
              Replicas: 1
               Storage: File

Options:

             Retention: WorkQueue
       Acknowledgments: true
        Discard Policy: Old
      Duplicate Window: 2m0s
     Allows Msg Delete: true
          Allows Purge: true
        Allows Rollups: false

Limits:

      Maximum Messages: 1,000,000
   Maximum Per Subject: 1,000,000
         Maximum Bytes: unlimited
           Maximum Age: unlimited
  Maximum Message Size: unlimited
     Maximum Consumers: 10

State:

              Messages: 250,000
                 Bytes: 98 MiB
        First Sequence: 1 @ 2024-05-13 15:29:06 UTC
         Last Sequence: 250,000 @ 2024-05-13 15:29:08 UTC
      Active Consumers: 1
    Number of Subjects: 1

consumer info

Information for Consumer mqtt_status > mqtt_status-consumer created 2024-05-13T15:25:41+08:00

Configuration:

                    Name: mqtt_status-consumer
               Pull Mode: true
          Deliver Policy: All
              Ack Policy: Explicit
                Ack Wait: 10.00s
           Replay Policy: Instant
         Max Ack Pending: 2,000
       Max Waiting Pulls: 512
          Max Pull Batch: 500

State:

  Last Delivered Message: Consumer sequence: 500 Stream sequence: 500 Last delivery: 11m17s ago
    Acknowledgment Floor: Consumer sequence: 0 Stream sequence: 0
        Outstanding Acks: 500 out of maximum 2,000
    Redelivered Messages: 0
    Unprocessed Messages: 249,500
           Waiting Pulls: 0 of maximum 512

dimuska139 commented 3 months ago

Any news? This problem is really annoying, but there is no solution.

derekcollison commented 3 months ago

Might be best to start a new issue and describe it there, as this one started on a 2.9.22 server. It would also be good to make sure the issue is reproducible on the 2.10.17-RC4 prerelease.