ruby-amqp / hutch

A system for processing messages from RabbitMQ.
https://gocardless.com/blog/hutch-inter-service-communication-with-rabbitmq/
MIT License

Delivery Acknowledgement Timeout makes hutch process stuck #364

Open godsent opened 3 years ago

godsent commented 3 years ago

There is a timeout (15 minutes by default) for a consumer to process and ack/nack a message: https://www.rabbitmq.com/consumers.html#acknowledgement-timeout

If the consumer still hasn't finished processing when this timeout expires, the consumer's channel is closed and all prefetched messages are returned to the queue. However, the hutch consumer keeps working, and if it finishes processing the message at this point, hutch will try to ack the message here: https://github.com/ruby-amqp/hutch/blob/master/lib/hutch/worker.rb#L74

This raises Bunny::ChannelAlreadyClosed, because you can only ack over an open channel. The error is caught on the next line: https://github.com/ruby-amqp/hutch/blob/master/lib/hutch/worker.rb#L75 and hutch then tries to execute the acknowledge_error(delivery_info, properties, @broker, ex) line. That raises Bunny::ChannelAlreadyClosed again, because by default this line tries to nack the message, and you can only nack over an open channel as well.

This second exception is never reported by the following line: https://github.com/ruby-amqp/hutch/blob/master/lib/hutch/worker.rb#L77 and is only caught in Bunny code here: https://github.com/ruby-amqp/bunny/blob/c99ffcb0876a46d71c822b8b96a6ac978fd0af14/lib/bunny/channel.rb#L1769 which prints a message to STDOUT, so the error is effectively ignored.
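
The exception flow described above can be reproduced in isolation. The following is a minimal sketch, not hutch's actual code: `ChannelAlreadyClosed` is a stand-in for Bunny::ChannelAlreadyClosed, and `ClosedChannelBroker`, `handle_message`, `handle_message_guarded` and `run_sketch` are hypothetical names made up for illustration. It assumes the handler acks, nacks on failure, and only then reports the error, as described in this issue. The guarded variant shows one possible way to keep the second failure from hiding the first.

```ruby
# Stand-in for Bunny::ChannelAlreadyClosed so the sketch runs without Bunny.
class ChannelAlreadyClosed < StandardError; end

# Hypothetical broker whose channel has already been closed by RabbitMQ.
class ClosedChannelBroker
  def ack(_delivery_tag)
    raise ChannelAlreadyClosed, "cannot ack on a closed channel"
  end

  def nack(_delivery_tag)
    raise ChannelAlreadyClosed, "cannot nack on a closed channel"
  end
end

# Same shape as the flow described in this issue: ack, nack on failure, report.
def handle_message(broker, delivery_tag, reported)
  broker.ack(delivery_tag)
rescue => ex
  broker.nack(delivery_tag) # raises again, so the next line never runs
  reported << ex
end

# Guarded variant (an assumption, not a hutch patch): a failing nack is
# recorded instead of escaping, so the original error is still reported.
def handle_message_guarded(broker, delivery_tag, reported)
  broker.ack(delivery_tag)
rescue => ex
  begin
    broker.nack(delivery_tag)
  rescue ChannelAlreadyClosed => nack_error
    reported << nack_error
  end
  reported << ex
end

# Runs both variants against a closed channel and collects what happened.
def run_sketch
  broker = ClosedChannelBroker.new

  unguarded = []
  escaped = begin
    handle_message(broker, 1, unguarded)
    nil
  rescue ChannelAlreadyClosed => e
    e # in the real setup this surfaces inside Bunny, not in hutch
  end

  guarded = []
  handle_message_guarded(broker, 1, guarded)

  { escaped: escaped, unguarded: unguarded, guarded: guarded }
end
```

In the unguarded handler the second exception escapes and nothing is reported. In the guarded one both errors reach the reporting list, which would give the process a chance to log the failure and exit or reconnect.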

The main problem is that the hutch process stays alive but does not receive any new messages until it is restarted.

Hammam94 commented 1 year ago

Is there any news on this issue, or a way to solve it? I ran into it when using a quorum queue and calling requeue!

michaelklishin commented 1 year ago

Delivery acknowledgement timeout can be effectively disabled by bumping it to something like an hour or a few hours. It's not super safe but would work equally well for any tool.
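
As a rough illustration of the above, the timeout can be raised node-wide through the `consumer_timeout` setting in rabbitmq.conf, which takes a value in milliseconds. The one-hour value below is only an example, not a recommendation.

```ini
# rabbitmq.conf
# Delivery acknowledgement timeout in milliseconds (3600000 ms = 1 hour).
consumer_timeout = 3600000
```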

Those who would like to see this addressed can see the detailed description of the exception flow above and contribute an improvement. This is open source software after all.