nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

JetStream consumers got stuck only for one stream #4168

Closed: avalchev94 closed this issue 1 year ago

avalchev94 commented 1 year ago

Defect

Hi all! I am using JetStream with multiple consumers and streams. Recently, all consumers for one specific stream got stuck, while the consumers for every other stream kept working. They had stopped receiving new messages (the last one was received 20 days ago), even though new messages were still being published successfully. While debugging the issue, I tried to list some existing messages with the CLI tool (nats stream view / nats stream get), but even those commands failed, most of the time with context deadline exceeded.

In the end, I had to purge the stream (nats stream purge), after which everything started working as expected. Fortunately, I made a backup (attached below) before running the purge. I restored the backup to a clean NATS instance, but I am still unable to read any messages from that stream, either with the nats CLI or from my own code.

Backup: profile_nats_backup.zip

Versions of nats-server and affected client libraries used:

nats-server v2.9.6 nats.go v1.20.0

OS/Container environment:

The OS where the issue was originally reproduced is photon:4.0-20210507. The OS where I tested the restore is macOS 12.5.1.

Steps or code to reproduce the issue:

  1. Restore the backup on a clean NATS instance using the nats CLI: nats stream restore.
  2. Try to read any of the old messages, either with the nats CLI or from client code.

Jarema commented 1 year ago

Hi!

Thanks for the detailed report and backup!

Will take a look into it.

derekcollison commented 1 year ago

It might be worth updating your server to a more recent patch release: 2.9.16 is the latest, and 2.9.17 will be released this week.

avalchev94 commented 1 year ago

Hey guys, is there any update here? Have you been able to find what the problem is with that data?

derekcollison commented 1 year ago

Does this issue still occur on the latest server release, 2.9.17? If it does, we will of course take a look, but the 2.9.6 server has not been supported for a while.
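The thread turns on whether the problem persists on 2.9.17. A small check can confirm that the server a client talks to is at least that version, which avoids accidentally re-testing against the old build. This is a minimal stdlib sketch. It assumes version strings of the form `v2.9.6` or `2.9.17` and ignores any pre-release or build suffix, which is a simplification. `atLeast` is a hypothetical helper, not part of any NATS library. The version string itself could come, for example, from nats.go's `ConnectedServerVersion()`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "v2.9.6" or "2.9.17" into [major, minor, patch].
// Anything after '-' or '+' is dropped (a simplification).
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	s = strings.TrimPrefix(s, "v")
	if i := strings.IndexAny(s, "-+"); i >= 0 {
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("unexpected version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("bad component %q: %w", p, err)
		}
		v[i] = n
	}
	return v, nil
}

// atLeast reports whether have >= want, comparing major, minor, then patch
// numerically (so 2.9.17 sorts after 2.9.6, unlike a string comparison).
func atLeast(have, want string) (bool, error) {
	h, err := parseVersion(have)
	if err != nil {
		return false, err
	}
	w, err := parseVersion(want)
	if err != nil {
		return false, err
	}
	for i := 0; i < 3; i++ {
		if h[i] != w[i] {
			return h[i] > w[i], nil
		}
	}
	return true, nil
}

func main() {
	for _, v := range []string{"v2.9.6", "2.9.14", "2.9.17"} {
		ok, err := atLeast(v, "2.9.17")
		fmt.Println(v, ok, err)
	}
}
```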

EnneS commented 1 year ago

Hello, I am having the same issue on the 2.9.14 release. However, I am not able to purge the affected stream (the purge also fails with context deadline exceeded), so I cannot confirm that purging unsticks it. Other existing streams and newly created streams do not have this issue.

EnneS commented 1 year ago

I upgraded to 2.9.17 and no longer see the issue.