Closed sbonebrake closed 6 years ago
If you run with `threaded: false`, you will get exceptions on the main thread. If you also use `automatically_recover: false`, there will be no attempts to recover from I/O exceptions. So with certain timing you can get a low-level I/O exception. I am not sure what would be expected here, but users who use `threaded: false` have explicitly opted in to getting exceptions on the main thread.
I thought I had misinterpreted the issue and that an I/O exception is thrown even when a connection is closed cleanly. That doesn't seem to be the case:
```
2.5.1 :001 > c = Bunny.new(threaded: false, automatically_recover: false); c.start
 => #<Bunny::Session:0x7fa19d9f3a58 guest@127.0.0.1:5672, vhost=/, addresses=[127.0.0.1:5672]>
2.5.1 :002 > c.close
 => true
```
Because of that, and the fact that the example uses Toxiproxy, I suspect that the exception only occurs when RabbitMQ's heartbeat mechanism detects the timeout and closes the TCP connection, as it should. In that case it is expected that Bunny might get an I/O exception on a heartbeat (or any other) frame write, and it is expected that with the settings used by the OP there will be an exception on the main thread (or rather, the thread that instantiates a `Bunny::Session`). That's what the person who contributed `threaded: false` support intended to happen.
The `threaded` option doesn't affect which thread the heartbeat thread raises its exception on. If this example were run with `threaded: true`, the reader loop would error first and also raise an exception on the main session thread. It seems that when automatic recovery is off, the reader loop or the heartbeat sender will raise exceptions on the session thread when I/O errors occur.
It looks like `HeartbeatSender#run` will never be able to rescue IOError/Exception, because `Transport#write_without_timeout` either rescues and handles them itself (when automatic recovery is enabled) or raises them to the session thread.
If automatic recovery is disabled, what else can Bunny do but throw an exception? It was explicitly told not to recover.
At most, HeartbeatSender should mark the session as closed when there is an issue. Then the next attempt to use the connection would fail with a ConnectionClosedError. How can it be by design to have unreachable code in HeartbeatSender?
It seems the Java client HeartbeatSender doesn't even close the connection. It just swallows all IOExceptions in HeartBeatSender.java. I feel like that's what it should do in Ruby. HeartbeatSender should make a best effort to keep the connection active at a regular interval and swallow any exceptions that occur along the way. It looks like that's exactly what HeartbeatSender initially did https://github.com/ruby-amqp/bunny/commit/55d57e38a31ae787d8693fe6b6acbe0ccd515653
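That best-effort behavior can be sketched in a few lines. This is a simplified illustration, not Bunny's actual `HeartbeatSender`; the class name and the `write_heartbeat` transport method are hypothetical:

```ruby
# Hypothetical best-effort heartbeat loop: write failures are logged
# and swallowed instead of being re-raised on the session thread.
class BestEffortHeartbeatSender
  def initialize(transport, interval)
    @transport = transport
    @interval  = interval
    @running   = false
  end

  def start
    @running = true
    @thread = Thread.new do
      while @running
        begin
          @transport.write_heartbeat
        rescue IOError, SystemCallError => e
          # Best effort: note the failure but keep the loop alive.
          warn "heartbeat write failed: #{e.class}"
        end
        sleep @interval
      end
    end
  end

  def stop
    @running = false
    @thread&.join
  end
end
```

With this shape a dead peer is still detected, just elsewhere: by the server's missed-heartbeat check, or by the next user-initiated write on the caller's own thread.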
The purpose of heartbeats is not to keep the connection alive, it is to detect unavailable peers quicker.
If automatic recovery is disabled, throwing exceptions is the best option I could think of, and I haven't changed my mind in quite a few years. If recovery is enabled, attempting to recover is the best course of action.
Heartbeat write failures could first close the connection. With automatic recovery enabled that would still kick off a recovery attempt. When it is disabled, I don't know if it would be meaningfully different whether the user sees an I/O exception or a `ConnectionClosedError` on the next frame write. The user kind of opted into handling any exceptions that may arise.
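The trade-off being discussed could be sketched like this (all class and method names here are a simplified illustration, not Bunny's real internals): a failed heartbeat write only marks the session closed, and the failure surfaces synchronously on the caller's next operation.

```ruby
# Sketch: a failed heartbeat write marks the session closed, and the
# *next* user-initiated write raises ConnectionClosedError on the
# caller's thread instead of an asynchronous I/O exception.
class SketchSession
  class ConnectionClosedError < StandardError; end

  def initialize(transport)
    @transport = transport
    @closed = false
  end

  def heartbeat_tick
    @transport.write_heartbeat
  rescue IOError
    @closed = true # remember the failure; don't raise here
  end

  def publish(payload)
    raise ConnectionClosedError, "connection is closed" if @closed
    @transport.write(payload)
  end
end
```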
@acogoluegnes @lukebakken WDYT?
I was referring to this section, http://www.rabbitmq.com/heartbeats.html#tcp-proxies, where it states that heartbeats can be used to generate network traffic to help with proxies like HAProxy, but you're right, there are several uses.
I'm unsure how the user "opted in" to handling exceptions here. Because `automatically_recover` is false? As implemented right now, if `automatically_recover: false` is used, then Bunny can interrupt your application with an exception at any time after the connection is started. AFAIK there is no way to handle these exceptions on the main thread, so the only safe way to do this would be to put Bunny in its own thread.
```ruby
conn = Bunny.new("amqp://guest:guest@localhost:56722",
                 heartbeat_timeout: 10,
                 automatically_recover: false,
                 threaded: false)
conn.start

# Now any heartbeat I/O issue can raise an exception and interrupt your application
MyApp.do_non_rabbit_important_tasks()
```
`automatically_recover: true` isn't always desirable. Let's say I have an application that publishes messages to RabbitMQ, and there is a backup RabbitMQ server it can publish to in case of failure. In this instance it would be ideal to have the publish to the downed server fail as quickly as possible so the application can publish to the backup server instead. If `automatically_recover` is enabled, the first publish will `sleep @network_recovery_interval` and then attempt to reopen the connection. It seems reasonable that we should be able to accomplish this with `automatically_recover: false` without "opting in" to getting exceptions in my main application from the heartbeat or reader-loop threads.
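The failover pattern described above could be sketched as follows. This is a hypothetical helper, not part of Bunny; in real code each endpoint would wrap a `Bunny::Session` opened with `automatically_recover: false` so that the failure surfaces immediately instead of sleeping through a recovery interval first:

```ruby
# Hypothetical failover publish: try endpoints in order and fall back
# to the next one as soon as the current one raises an I/O error.
def publish_with_failover(endpoints, message)
  last_error = nil
  endpoints.each do |endpoint|
    begin
      return endpoint.publish(message)
    rescue IOError, SystemCallError => e
      last_error = e # this endpoint is down; try the next one
    end
  end
  raise last_error # every endpoint failed; surface the last error
end
```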
If this issue is accepted, this would be a simple approach to fixing it IMO: https://github.com/ruby-amqp/bunny/compare/master...sbonebrake:safe_heartbeats
This would mean that heartbeats just make a best effort to send, and starting and stopping would only be controlled from the session. I believe this brings it closer to the Java implementation. The reader loop could also be modified to check `@last_activity_time`, similar to how Java's NioLoop checks. This change would ensure the HeartbeatSender never throws an exception on the session thread, which would make `{automatically_recover: false, threaded: false}` safe options to use. This discussion could be moved to the mailing list if that's more appropriate.
What I don't understand is how it would be different from any other exception thrown in the "session thread" (the thread the session was instantiated in?). If you don't want recovery, exceptions will be thrown on the caller thread (IIRC that's the term we used in the docs). Is an I/O exception not indicative enough?
Java client carries I/O exceptions as shutdown reason, so in that sense they are not hidden from the user.
If I'm running `threaded: false`, the only way to get an exception on the session thread should be an explicit call to a session method, which would allow me to set up basic exception handling.
Let's say this is a Rails app that publishes messages when it gets requests.
```ruby
class MessageController < ActionController::Base
  def bunny
    return @bunny if @bunny
    @bunny = Bunny.new(host: "something",
                       automatically_recover: false,
                       threaded: false,
                       heartbeat_interval: 30)
    @bunny.start
  end

  def create
    begin
      bunny.queue("messages").publish(params[:message_content])
    rescue Bunny::NetworkFailure => e
      logger.log(e)
      render :nothing => true, :status => :service_unavailable
      @bunny = nil
    end
  end

  def index
    # Nothing Bunny related
    Messages.last(10)
  end
end
```
If I get a request to `MessageController#create`, it will kick off the Bunny session. If the heartbeat fails at any time after the create method, it will interrupt my Rails app, either in the running Unicorn process loop or possibly during the `MessageController#index` method. Using `threaded: true` has the same issue, but I can avoid using that, so I'm not as concerned. This is not a theoretical issue; it caused really big problems when we tested bringing RabbitMQ nodes down in our staging environment:
```
E, [2018-08-07T05:08:04.211313 #15112] ERROR -- : listen loop error: detected a network failure: closed stream (Bunny::NetworkFailure)
E, [2018-08-07T05:08:04.211480 #15112] ERROR -- : /lib/unicorn/http_server.rb:678:in `select'
E, [2018-08-07T05:08:04.211524 #15112] ERROR -- : /lib/unicorn/http_server.rb:678:in `worker_loop'
E, [2018-08-07T05:08:04.211559 #15112] ERROR -- : /lib/unicorn/http_server.rb:508:in `spawn_missing_workers'
E, [2018-08-07T05:08:04.211592 #15112] ERROR -- : /lib/unicorn/http_server.rb:132:in `start'
E, [2018-08-07T05:08:04.211622 #15112] ERROR -- : /bin/unicorn:126:in `<top (required)>'
E, [2018-08-07T05:08:04.211654 #15112] ERROR -- : /bin/unicorn:22:in `load'
E, [2018-08-07T05:08:04.211683 #15112] ERROR -- : /bin/unicorn:22:in `<top (required)>'
```
Even if I wrapped the whole Rails application in a `rescue Bunny::NetworkFailure`, the exception could come up in the non-Bunny `MessageController#index` and I would be forced to return a 403 or 500 there as well.
As is, my only option is to close the connection after every publish or to disable heartbeats. Am I missing some other option combination that makes it safe to run like this?
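The "close after every publish" workaround looks roughly like this. The `connection_factory` parameter is a hypothetical injection point added here so the sketch is self-contained; in real use it would be something like `-> { Bunny.new(automatically_recover: false, threaded: false).start }`:

```ruby
# Workaround sketch: open a fresh connection per publish and close it
# in an ensure block, so no heartbeat thread outlives the call and
# nothing is left running that could raise asynchronously later.
def publish_once(payload, connection_factory)
  conn = connection_factory.call
  channel = conn.create_channel
  channel.queue("messages").publish(payload)
ensure
  conn&.close
end
```

The obvious cost is a full connection handshake per message, which is exactly why a long-lived connection (and therefore a safe heartbeat thread) is preferable.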
Have you had any time to think about this further? Maybe @acogoluegnes @lukebakken have thoughts on this?
I have been working on higher-priority items. When I get a couple hours free to really think about this issue, I'll check it out.
@sbonebrake OK, I see what you are unhappy about: potentially unexpected exceptions. I think we can make Bunny's heartbeat sender do what its Java client counterpart does. I cannot guess how many people would prefer the current behavior, but we already double-checked that the connection is marked as closed, so there's a reasonable degree of failure observability in place.
Thanks @michaelklishin . The suggested diff above is a "least effort" fix and isn't the cleanest, but if you'd like me to put it into a PR with tests I can. Unsure if you had other ideas about how to address it.
I confirm that the Java client heartbeat implementation on the sending side is quite passive: it sends heartbeat frames in the background and indeed ignores write errors. Missed heartbeats from the server are counted and checked only when a reading operation times out. I think this implementation is consistent with the heartbeat spec. It could be more aggressive in the way it detects failures, but failure detection is tricky in practice.
Not sure that heartbeat sending errors should bubble up to the application code though.
@sbonebrake please submit a PR and we'll discuss a specific solution there. I don't know if there can be a neat way to silence some exceptions ;)
This can be closed now. You can also get a similar problem if you run the spec "when the the connection detects missed heartbeats without automatic recovery" with `threaded: true`, but the exception comes from the reader loop in that case. I'm not sure if that's intended, for the same reasons listed above. It can be worked around with `threaded: false`, so I'm happy.
`threaded: true` doesn't make much sense to the DIY recovery crowd. The logic is not very different: if you don't want recovery, Bunny probably shouldn't swallow all exceptions, at least not in the I/O loop.
Reproduced with rabbit 3.7.6 and bunny 2.11.0.
This happens when Bunny is created with `{ automatically_recover: false, threaded: false }`. After the connection is open, if at any point it disconnects during a heartbeat attempt, it will raise the exception on the thread that initialized Bunny.
Code for repro here: https://github.com/sbonebrake/bunny_repro/blob/master/heartbeat_failure_app.rb
I'm not 100% certain what is intended to happen when a heartbeat fails. It seems the heartbeat thread should probably not use `Transport#write_without_timeout`, a method that raises exceptions to the session thread.