Closed dumbbell closed 8 years ago
The problem was discovered with `make standalone-tests FILTER=eager_sync:eager_sync`: the testcase hangs forever because it waits for confirms for messages in the range 1..2000 (its seqno was reset after the second `confirm.select`), whereas the broker sends ACKs for messages in the range 4001..6000.
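To make the mismatch concrete, here is a minimal toy model of the two counters. It is not RabbitMQ or rabbitmq-erlang-client code: the class and field names are hypothetical, and the batch size of 4000 messages published before the second `confirm.select` is an assumption inferred from the ranges quoted above.

```python
class BrokerChannel:
    """Toy broker-side channel: the seqno is set once and never reset."""

    def __init__(self):
        self.publish_seqno = 1        # set to 1 during channel init
        self.confirm_enabled = False

    def confirm_select(self):
        self.confirm_enabled = True   # only a flag is toggled

    def publish(self):
        if not self.confirm_enabled:
            return None
        seqno = self.publish_seqno
        self.publish_seqno += 1
        return seqno                  # seqno the broker will ACK


class BuggyClientChannel:
    """Toy client-side channel: resets its seqno on every confirm.select."""

    def __init__(self):
        self.next_pub_seqno = 0       # 0 means confirms are disabled

    def confirm_select(self):
        self.next_pub_seqno = 1       # the problematic unconditional reset

    def publish(self):
        if self.next_pub_seqno == 0:
            return None
        seqno = self.next_pub_seqno
        self.next_pub_seqno += 1
        return seqno                  # seqno the client expects to be ACKed


def run(first_batch, second_batch):
    """Return ((first, last) expected by client, (first, last) ACKed by broker)."""
    broker, client = BrokerChannel(), BuggyClientChannel()
    broker.confirm_select()
    client.confirm_select()
    for _ in range(first_batch):
        broker.publish()
        client.publish()
    # Second confirm.select on the same channel.
    broker.confirm_select()
    client.confirm_select()
    acked, expected = [], []
    for _ in range(second_batch):
        acked.append(broker.publish())
        expected.append(client.publish())
    return (expected[0], expected[-1]), (acked[0], acked[-1])


print(run(4000, 2000))
```

Under these assumptions the client waits for 1..2000 while the broker acknowledges 4001..6000, so the client never sees the confirms it is waiting for.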
What is weird is that Travis CI reproduces the hang 100% of the time. I can reproduce it 90% of the time. Jenkins never hits it. There must be a race in the client or the testcase which could hide the problem.
The Confirms extension specification states:
That's what RabbitMQ does in `rabbit_channel.erl`. The sequence number is set to 1 during channel init.
When a `confirm.select` is received, a single flag is toggled and the sequence number is left untouched.
Unfortunately, rabbitmq-erlang-client resets the seqno to 1 with every `confirm.select`.
The obvious consequence is that a second `confirm.select` from a client on a channel will lead to an inconsistency between the broker and the client counters. The client will receive ACKs for messages it has not published yet.
The code should only set the seqno to 1 if it's set to 0 (0 meaning that confirms are disabled).
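The proposed guard can be sketched as follows. This is a toy Python model rather than the actual Erlang client code, and `ClientChannel` and `next_pub_seqno` are hypothetical names used only for illustration.

```python
class ClientChannel:
    """Toy client-side channel with the guarded seqno initialisation."""

    def __init__(self):
        self.next_pub_seqno = 0        # 0 means confirms are disabled

    def confirm_select(self):
        # Only initialise the counter the first time confirms are enabled;
        # a repeated confirm.select leaves the running counter untouched.
        if self.next_pub_seqno == 0:
            self.next_pub_seqno = 1

    def publish(self):
        if self.next_pub_seqno == 0:
            return None                # confirms disabled: nothing to track
        seqno = self.next_pub_seqno
        self.next_pub_seqno += 1
        return seqno


ch = ClientChannel()
ch.confirm_select()
first = [ch.publish() for _ in range(3)]
ch.confirm_select()                    # second confirm.select: no reset
after = ch.publish()
print(first, after)
```

With the guard in place, the client counter keeps counting across a second `confirm.select`, which matches a broker that never resets its own sequence number.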