nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0
16k stars 1.41k forks source link

backoff non respected for some messages retries occuring during the long backoff of another message #6085

Closed JulienVdG closed 1 week ago

JulienVdG commented 2 weeks ago

Observed behavior

The re-deliver of a message is occuring only after the redeliver of another message even if it's backoff time is shorter.

This seam to occur if the message is first delivered during the long backoff of a previous message.

Expected behavior

The backoff time are always respected

Server and client version

nats-server version 2.10.18 (a collegue had the issue on latest stable, I didn't double check yet) nats cli version 0.1.4

Host environment

linux

Steps to reproduce

nats stream add Test --replicas=1 --retention=work --subjects="t.>" --storage=file --ack --max-age=7d --defaults
nats consumer add Test mycons --ack=explicit --wait=1m --max-deliver=4 --backoff=linear --backoff-steps=3 --backoff-min=10s --backoff-max=40s --filter='t.t.>' --pull --defaults
while true; do nats con next --no-ack Test mycons --timeout=40s; done

in another term:

nats pub t.t.0 A; sleep 30; nats pub t.t.0 C

Up to sleep 49, (ie 1 second before the end of the first message backoff) we can see the effect, here is the output of the first terminal:

[15:20:57] subj: t.t.0 / tries: 1 / cons seq: 57 / str seq: 15 / pending: 0

A
[15:21:07] subj: t.t.0 / tries: 2 / cons seq: 58 / str seq: 15 / pending: 0

A
[15:21:27] subj: t.t.0 / tries: 3 / cons seq: 59 / str seq: 15 / pending: 0

A
[15:21:27] subj: t.t.0 / tries: 1 / cons seq: 60 / str seq: 16 / pending: 0

C
[15:21:57] subj: t.t.0 / tries: 4 / cons seq: 61 / str seq: 15 / pending: 0

A
[15:21:57] subj: t.t.0 / tries: 2 / cons seq: 62 / str seq: 16 / pending: 0

C
[15:22:17] subj: t.t.0 / tries: 3 / cons seq: 63 / str seq: 16 / pending: 0

C
[15:22:47] subj: t.t.0 / tries: 4 / cons seq: 64 / str seq: 16 / pending: 0

C

here we can see that the 2nd try for C was at 15:21:57 while it was expected at 15:21:37.

Jarema commented 1 week ago

Thanks for the report. We are looking into this.

MauriceVanVeen commented 1 week ago

Can at least confirm the issue is also on latest main.