nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

Redelivery of already acknowledged messages during node reboot #6036

Open chazzbg opened 4 weeks ago

chazzbg commented 4 weeks ago

Observed behavior

Under high load in a 3-node cluster, messages from a work queue stream are redelivered after they have been acknowledged when one of the nodes is rebooted.

Expected behavior

Messages should be delivered exactly once.

Server and client version

server version: 2.10.19 and 2.10.22
cli version: 0.1.4

Host environment

Azure Kubernetes cluster, JetStream enabled, 3 nodes in stateful sets.

Steps to reproduce

No response

neilalexander commented 4 weeks ago

Can you please provide more information about exactly which node restarts? Is it the stream leader or one of the followers?

Is the consumer using Ack() or AckSync()? (In other words, are you sure the ack was processed?)

chazzbg commented 4 weeks ago

I haven't noted which node we are restarting. We are using the PHP library (https://github.com/basis-company/nats.php), which in its current implementation does not seem to wait for a response on ack, so I would guess it is not a sync ack.

neilalexander commented 4 weeks ago

Unfortunately it is difficult for us to provide support for third-party libraries. Can you please try a reproduction case in one of the tier-1 clients using AckSync()?

If you need stronger ack processing guarantees, then you need to wait for the ack-ack. Some more examples are here: https://natsbyexample.com/examples/jetstream/ack-ack/go