Closed mperham closed 9 years ago
I can't figure out a way to handle the expected case of the non-local block return above, while also handling the unexpected non-local, uncatchable Timeout::Error.
:question: :confounded:
I'm not sure if Thread#handle_interrupt can help us here. http://ruby-doc.org/core-2.2.1/Thread.html#method-c-handle_interrupt
@tamird Any brilliant ideas?
@mperham yeah, Thread#handle_interrupt
can help us by replacing the existing solution with uninterruptible ConnectionPool#with
. I'm not really sure that's a sane thing to do, but the implementation should be a straight port of https://github.com/brianmario/mysql2/pull/592
Does that patch makes mysql2 impervious to Timeout.timeout, so the Timeout functionality simply doesn't work anymore if the mysql connection hangs?
That's my understanding. :see_no_evil:
At this point I'm ready to rollback this entire set of changes and call this a documentation problem: don't use Timeout.timeout if you want your code to work reliably.
Seems reasonable to me. Heads up for the homies @ggilder @zanker
Ok, please review the code and let me know. I use handle_interrupt to avoid leaking connections in the case where a timeout occurs right as the connection is being checked out.
In thinking about this more, it seems like it is up to each connection to ensure its own safety in the face of Timeout, just as mysql2 did. Perhaps a fix to the redis gem is in order.
Ruby :(
Yea there's not much choice. Only way I know of to get around that is to wrap the calls in an ensure in another method, but you lose the results of return
which makes it useless.
I'll leave @tamird to comment on the specifics, but JRuby 1.7.x does not support Thread.handle_interrupt
. Whatever you do will have to check for the existence of the method.
@tamird How's that test look?
@mperham doesn't look like it tests that checkout
(or checkin
, for that matter) is protected from timeouts
Better?
Better, yeah. What's the point of the pool size assertion in that test? the "real" checkout happens before the sleep, so the assertion isn't validating anything not validated in the previous test
Also 1.9.3 and JRuby failed in CI as expected, that test probably needs to be marked pending if Thread doesn't respond_to handle_interrupt
Final call, this is ready to ship IMO. Any other comments?
Is there anything dangerous about blocking all exceptions instead of just Timeout::Error?
Edit: And Timeout::ExitException apparently.
@ryansch I wish I knew. The semantics of handle_interrupt are not well documented at all. What happens if a Timeout occurs during a :never block? Is it silently discarded or delayed and raised once allowed? (Answer: it looks like the latter)
Should I just handle Timeout::ExitException?
I'm unclear on when ExitException is raised as opposed to Error. The docs (http://ruby-doc.org/core-2.2.1/Thread.html#method-c-handle_interrupt-label-Guarding+from+Timeout-3A-3AError) go a different route and I don't understand ruby internals well enough to grok the :on_blocking
option.
The timeout.rb implementation is confusing at best. It appears that ExitException is meant to be a wrapper around the exception raising that ensures it's on the right thread. It uses throw/catch to do the dirty work. @tamird: Did you find that handle_interrupt
caused that process to break down? I don't have enough time to attach a debugger and completely understand this code.
Mike: I was concerned about other exceptions but a more careful read of the docs suggest that handle_interrupt
only applies to async errors from Thread.kill
or Thread.raise
. Would this be a problem when sidekiq wants to shutdown?
Break down? No. In the end, Thread#raise does the dirty work and
this, when I last looked a few hours ago.
I don't know what sidekiq does at shutdown that is relevant to any of this. On Apr 10, 2015 6:18 PM, "Ryan Schlesinger" notifications@github.com wrote:
The timeout.rb implementation is confusing at best. It appears that ExitException is meant to be a wrapper around the exception raising that ensures it's on the right thread. It uses throw/catch to do the dirty work. @tamird https://github.com/tamird: Did you find that handle_interrupt caused that process to break down? I don't have enough time to attach a debugger and completely understand this code.
Mike: I was concerned about other exceptions but a more careful read of the docs suggest that handle_interrupt' only applies to async errors from Thread.killorThread.raise`. Would this be a problem when sidekiq wants to shutdown?
— Reply to this email directly or view it on GitHub https://github.com/mperham/connection_pool/pull/75#issuecomment-91707053 .
@ryansch For the most part, all bets are off when a Sidekiq Thread is shutting down "hard" via raise/kill. The process is going away so it doesn't really matter.
To be clear @tamird, I was trying to figure out why we couldn't just ignore Timeout::Error
instead of seeing Timeout::ExitException
.
I'd be interested in using the following:
Thread.handle_interrupt(Timeout::Error: :never, Timeout::ExitException: :never) do
...
Thread.handle_interrupt(Timeout::Error: :immediate, Timeout::ExitException: :immediate) do
...
end
end
:on_blocking
is interesting as it sounds like it won't raise in client code unless the thread is asleep on IO (ex: select
) but I don't believe that's the normal execution model for timeout.rb and might be surprising in client code.
Ok I think I understand why we can just ignore Timeout::ExitException
. It prevents the other thread's raise
from working: https://github.com/ruby/ruby/blob/v2_2_1/lib/timeout.rb#L86
The end result is that the timeout
method always returns on line 89 (as it's a Proc) and never has a chance to raise Timeout::Error
.
LGTM (might want to squash). I'd probably do something about that failing rubinius test, annoying to always have a failure.
Thanks. I never squash, rebase or rewrite history. Saves me from ever having git issues.
Unsure what to do about Rubinius. I don't see how that failure's not a bug on their part.
For what it's worth, this is what I was trying to debug in https://github.com/mperham/connection_pool/pull/67#discussion_r27772882
The changes in #67 have resulted in unnecessary connection discards when using an explicit return:
The explicit return means that the
success = true
line withinwith
is not executed.