We make heavy use of process-forking environments (unicorn and resque), along with multiple parallel redis server connections (sharded clusters). In production usage we were running into kernel mutex deadlocks (we're on ruby 1.9). Research (1) indicates that there are some subtle pthread mutex handling bugs in ruby, especially during process termination. In our case this was primarily triggered by reconnect() calls, which cause zookeeper thread mutex deadlocking. ZK and redis libs can detect InheritedErrors after forking, so there's really no need to preemptively reconnect via reconnect() anymore.
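Purely to illustrate the lazy pattern, here's a minimal sketch (not the actual redis_failover code; the helper name and client setup are hypothetical):

```ruby
require 'redis'

# Hypothetical helper: run the command as-is, and only rebuild the
# connection when redis-rb reports it was inherited across a fork.
def call_with_lazy_reconnect(redis)
  yield redis
rescue Redis::InheritedError
  # The socket belongs to the parent process; reconnect in this child
  # and retry the command once instead of reconnecting preemptively.
  redis.client.reconnect
  yield redis
end

redis = Redis.new(:url => 'redis://localhost:6379')
call_with_lazy_reconnect(redis) { |r| r.ping }
```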
As a nice side effect of this new 'lazy' connection handling, we reduce zookeeper and redis server load, since we're cycling client connections and lookups far far less.
We've also done rather extensive zookeeper cluster failure load testing, which led to more robust handling of zookeeper connectivity failures. ZK failures are still pretty much stop-the-world events for redis command processing, but using cached redis cluster state is on the todo list.
(1)
https://github.com/ryanlecompte/redis_failover/issues/38
https://github.com/zk-ruby/zk/issues/50
https://github.com/resque/resque/issues/578
https://bugs.ruby-lang.org/issues/6438
vendor/bundle/ruby/1.9.1/gems/zookeeper-1.4.7/ext/c_zookeeper.rb:237:in 'join': deadlock detected (fatal)
This merge will resolve both https://github.com/ryanlecompte/redis_failover/pull/56 and https://github.com/ryanlecompte/redis_failover/issues/47 / https://github.com/ryanlecompte/redis_failover/pull/48
Also, the previous PR https://github.com/ryanlecompte/redis_failover/pull/48 should resolve https://github.com/ryanlecompte/redis_failover/pull/44
Also, the previous PR https://github.com/ryanlecompte/redis_failover/pull/57 should resolve https://github.com/ryanlecompte/redis_failover/issues/42
Will look into test failures....
Thanks for putting this together! These fixes look good. Just a minor comment about leaving in commented-out code. Also, any thoughts on the build failures?
@arohter - I just made you a committer for the project! Thanks for helping maintain redis_failover. Feel free to merge this once you think it's ready.
Fixed and added some tests. I have some in-progress tests for zk error handling, but will do those in a separate PR.
The lingering jruby test failure should disappear in the read-only slave healthchecking fix PR coming next.
We're running the latest released versions of zk (1.9.2) and zookeeper (1.4.7).
Have you made a lot of fixes to redis_failover, or to the zk/zookeeper code, in the past few months? I'd be very interested in seeing these :)
Interestingly enough, we found that using the "recommended" ZK.install_fork_hook actually made things far worse - we were getting a ton of deadlocks - which is why we stopped using it (but I added the NOTE text for future ref).
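For reference, enabling the hook is a one-liner from the zk gem; shown here only to illustrate what we experimented with (and then backed out), since redis_failover itself does not call this:

```ruby
require 'zk'

# As I understand the zk gem's behavior: installs hooks around Kernel#fork
# so ZK connections are paused before the fork, then resumed in the parent
# and reopened in the child.
ZK.install_fork_hook
```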
After some research, as you mentioned, the issues stem from the underlying zookeeper threading model, combined with ruby's (at least 1.9.3; have yet to try 2.0 or 2.1) buggy kernel mutex handling. In our case, due to our heavy use of resque's process-forking jobs, the preemptive reconnect() calls (which trigger zk.reopen) combined with process termination trigger the bug. In the end, removing the preemptive reopen, combined with lazy retry handling, significantly reduced mutex contention, virtually eliminating deadlocks in our production env.
The changes were made to zk-ruby/zk and zk-ruby/zookeeper.
Do you have a ruby stack trace of where the processes were hung?
I just noticed it looks like there are a couple fixes that haven't made it into official releases yet, so I'll work with @slyphon to make that happen.
Some traces [pthread_mutex_destroy: Device or resource busy EBUSY]:
https://gist.github.com/arohter/5daa1262ba0921b7a831
https://gist.github.com/arohter/eae44f2a9513f46c4202
https://gist.github.com/arohter/c4622eaf177172925bea
Here's a bit with fork hook enabled:
https://gist.github.com/arohter/a32b38649d6d2a249faa
https://gist.github.com/arohter/716a26274b71d032c391
strace of the hung processes shows them spinning on sched_yield (sometimes using 100% cpu), similar to https://github.com/zk-ruby/zk/issues/50
Are you saying those first ones with the pthread_mutex_destroy error were happening with the fork hook enabled or disabled? Those would definitely be the sorts of problems that could happen without the hook enabled.
Without the stack trace of all threads in the ones with fork hooks enabled, it's hard to know exactly what else was going on to cause the deadlocks.
I am working with @slyphon to create releases for the latest changes to zk and zookeeper. I'd be interested to see if they fix the issues you see with using the fork hook.
Okay. zookeeper 1.4.8 and zk 1.9.3 have been released.
Let me know if those fix your problem when you have the fork hooks enabled.
The pthread_mutex_destroy traces are from runs without the fork hook. I don't think I have any gist logs of the traces with the fork hook enabled.
I'm definitely concerned with master not having the fork hooks enabled. They're there to prevent those pthread errors you saw.
I would suggest this pull request be reverted in its entirety and revisited.
There's a bunch of stuff here that is troubling.
Wow. Go offline for a few hours and things turn crazy... ;-)
Let's be perfectly clear here, folks: ZK.install_fork_hook was not and has never been enabled in redis_failover. This PR changed nothing in that regard. It simply added a (commented-out) reference to the existence of this hook, since I've spent a great deal of time playing (and deadlocking) with it.
Walking through the logic and production experience backing this PR:
Ruby apparently has a bug related to kernel mutex handling, particularly during process shutdown.
The underlying zookeeper c library unfortunately exposes this bug, due to its heavy use of thread mutexes.
Combine this with forking applications like Resque (lots of processes starting and stopping), and the exposure to this mutex handling bug becomes much greater.
This is exacerbated by the fact that Resque preemptively calls reconnect() after fork, and thus zk.reopen(), since redis-rb can't share connections between different processes (it'll throw an InheritedError when you try); see the sketch after this list.
Mutex holding during process shutdown makes things go :boom:
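Concretely, the preemptive pattern looks something like this in an app's Resque setup. This is the pattern the lazy handling described next makes unnecessary, not code from this PR; the global client variable is a placeholder:

```ruby
require 'resque'

# The preemptive pattern: every forked worker rebuilds its connections up
# front. With a redis_failover client this means calling reconnect(), which
# in turn calls zk.reopen() -- the path that tickles the mutex bug on exit.
Resque.after_fork do |job|
  $failover_client.reconnect  # $failover_client is a hypothetical app-level RedisFailover::Client
end
```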
redis_failover already detected zookeeper InheritedConnectionErrors, automatically (and correctly) calling zk.reopen as needed. This PR simply adds auto detection and handling of the equivalent redis InheritedErrors, removing the need for explicit and preemptive reconnect calls by forking applications. To be clear, this doesn't fix (or change) the underlying mutex handling bug; it merely reduces the exposure to running into it.
Reverting this PR is unnecessary, and without these code changes redis_failover was unusable for us in the real world. In our heavy production workload, moving to this "only as needed" reconnect/reopen handling virtually eliminated the large amount of pthread_mutex_destroy errors we were seeing. It also noticeably reduced the connection load on our zookeeper and redis clusters.
Now, on to the root topic of the objections: ZK.install_fork_hook
We were initially very excited about discovering the existence of ZK.install_fork_hook, as it seemed to offer a solution to the very problem we were hitting. Unfortunately, after enabling it, we started running into deadlocks. [The good news is it appeared to reduce/eliminate the pthread_mutex_destroy errors!] Deadlocks and 100% cpu spins are far worse than ungraceful process exits. ZK.install_fork_hook + zk 1.9.2 + zookeeper 1.4.7 was a completely unusable combo for us.
Good news is apparently we're not alone in running into ZK.install_fork_hook problems. And @eric has kindly worked on fixing the underlying issue! I'm looking forward to trying it again with zookeeper 1.4.8 and zk 1.9.3. I'd consider that a separate PR however, as this PR does not change behavior wrt use (or not) of ZK.install_fork_hook.
@arohter, could you please do the following:
This will give us more context and help us re-review the changes that were made. Sorry for the confusion.
All the changes in this PR are still valid, as they have nothing to do with the use of ZK.install_fork_hook. I've removed my comments about the hook however, since it was clearly confusing folks. I've re-issued the PR here https://github.com/ryanlecompte/redis_failover/pull/71/ , although it's a bit unhappy atm due to your forced revert.
I will test and issue a new PR for the use of ZK.install_fork_hook with the latest zk/zookeeper deps, as this would technically be a new feature for redis_failover.
For the record, the use of the zk fork hook ( https://github.com/fanhattan/redis_failover/compare/conn_handling_v2_nofork...conn_handling_v2_forkhook ) still causes almost immediate deadlocks in our env, despite running latest versions of zk/zookeeper.
https://gist.github.com/arohter/a2c8f4171fcd19d7e738
That is definitely a strange one. I wonder what is causing this to happen when it doesn't happen in the tests.
Would you be able to set the environment variable ZKRB_DEBUG=true and run it again to get some debug output from the logger?
It appears that something about the shutdown process isn't working as it should.
The reason why it's so important to be running the fork hooks is that without them, you're going to get random "flakiness" that could include random hangs, deadlocks, and garbled communication.
We've put a lot of effort into the zookeeper forking behavior to accommodate the specifics of how the C zookeeper library works and how the ruby runtime works, so it would be good to get to the bottom of this so it can be fixed for everyone.
+1 for trying to debug the ZK shutdown issue!
Yes, I will try and grab some debug logs of this in action.
More robust zk client error handling: support progressively more severe retry logic.
Automatic fork and InheritedError handling: lazy reconnects reduce zookeeper and redis server load.
Reduces/prevents kernel mutex deadlocks in forking env.
Trace connection debug logging.
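As a rough illustration of the "progressively more severe" retry idea (a sketch only; the rescued exception classes, thresholds, and helper name are illustrative, not the exact code in this change):

```ruby
require 'zk'

# Escalating recovery: plain retries with a growing pause first, then a
# more drastic zk.reopen, and finally give up and surface the error.
def with_escalating_retries(zk, max_retries = 3)
  attempts = 0
  begin
    yield
  rescue ZK::Exceptions::ConnectionLoss, ZK::Exceptions::InterruptedSession => e
    attempts += 1
    if attempts <= max_retries
      sleep(0.5 * attempts)  # back off a bit more on each attempt
      retry
    elsif attempts == max_retries + 1
      zk.reopen              # more severe: cycle the ZK session, then try once more
      retry
    else
      raise e                # still failing: stop-the-world for the caller to handle
    end
  end
end

# Usage (assuming `zk` is an existing ZK client; the znode path is a placeholder):
# with_escalating_retries(zk) { zk.children('/some/znode') }
```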