Open sljeff opened 3 years ago
I have had a very similar issue lately. There isn't anything special about the reproducing, there would just be a random deadlock which would prevent all the other Publish
to this channel. It would lock up for at least an hour (didn't check longer time periods, but I am pretty sure that the deadlock would still be there).
However, there was still one weird problem. At about the time when the deadlock appeared, I could see the following log in RabbitMQ Container: [error] <...> closing AMQP connection <...>
, so I am assuming that some connection problems could have caused this mixup in acks and lead to deadlock.
I completely agree with this pull request as I think that semantically Publish has to have another lock since it is just an increment which can be done more efficiently as well as more clearly with atomic operation.
@michaelklishin @streadway
same question here. @michaelklishin @streadway
I encountered a bug similar to this comment: https://github.com/streadway/amqp/issues/254#issuecomment-316401642
However, in a weak network environment, I get a deadlock after N+1 messages have been sent where N is the size of the buffered channel.
How to Reproduce:
Network Link Conditioner
withVery Bad Network
config)Then it will deadlock.
The sequence of events is:
One()
->resequence()
->confirm()
send ack to channelOne() get lock
->resequence()
->confirm()
send ack to channel -> block (channel buffer == 1) (and lockconfirms.m
is not released)Publish
-> Failed to get lockThis time it will keep blocking; simply because the buffer is 1 but two acks are received at the same time (although I sent messages serially, we can indeed reproduce this extreme case).
I don't think
Publish
needs to use the same lock as the other methods, it's just a self-incrementing. Even if the program is inresequence()
, there may not be a problem withPublish()
running at this point.