jcabotc / hare

Some AMQP abstractions
8 stars 5 forks source link

When doing graceful shutdown, we need to trap exit to call terminate. #8

Closed archseer closed 7 years ago

archseer commented 7 years ago

On :init.stop, the code would crash, since a gen_server skips terminate in those cases, unless you trap exits (in which case it gets called). Terminate then properly closes the channel.

Fixes #7

jcabotc commented 7 years ago

I don't think trapping exits is a good idea.

I have seen this error in the past playing with the :amqp library, but don't remember how I solved it. Unfortunately at the moment I am very busy and I cannot spend time on this library.

Maybe it is a problem with the actor implementation, the adapter, or the :amqp library. Trapping exits instead of solving the problem will hide it under a catch-all solution.

archseer commented 7 years ago

@jcabotc the problem is definitely with terminate not getting called on shutdown, and the only way to currently get these is by trapping: http://blog.differentpla.net/blog/2014/11/13/erlang-terminate

jcabotc commented 7 years ago

@archSeer We should not need to trap exits, the other two solutions you are suggesting at #7 remove the need of trapping exits and solve the root problem by not allowing any process to crash.

archseer commented 7 years ago

The process doesn't crash. It's just the shutdown signal propagating through the process tree, from what I understand it's stopped in reverse. First the consumers are halted, then the Hare.Conn. As I've previously stated and referenced, when you trap exits, gen_server has a built in implementation to convert that and call terminate/2 for you ("But terminate didn’t get called, because we’re not trapping exits."), allowing you to close the channel and gracefully shut down.

Without trapping, terminate/2 never gets called, which means this is never called, and then the channel is never properly closed.

Here's a similar fix in an erlang project, that does a much better explanation: http://git.mmlx.us/?p=erlbot.git;a=commitdiff;h=bbe3789804b100542b1df0b248bcf16b111e71d5

This fixes a bug wherein application:stop(irc) would leave a dangling Channel
that was consuming messages on a queue -- long after the process to which it
would deliver is dead.  That would cause the AMQP channel to exit with reason
unexpected_delivery_and_no_default_consumer, killing all of the rest of the
AMQP client and, by the transitive property of death, the irc application as
well.

The irc_amqp_listener must trap exits in order to receive a termination notice
from its supervisor.  I'm not actually sure if this has any implications for it
monitoring its AMQP Channel or not.

This is the correct and currently the only solution for gracefully shutting down gen_servers.

archseer commented 7 years ago

@jcabotc Any thoughts on ^^^

We're currently using a custom fork to work around this

take-five commented 7 years ago

I believe this line is the culprit. When the consumer process exits, amqp tries to cancel consumption gracefully and sends cancel to RabbitMQ. RabbitMQ then sends back cancel_ok but at that moment there is already no process that could've processed that message. I'm really not sure how to solve this without modifying amqp or trapping exits.