seomoz / qless

Queue / Pipeline Management
MIT License

Forking worker unable to gracefully exit #202

Open tpickett66 opened 10 years ago

tpickett66 commented 10 years ago

I'm getting the following backtrace from a Qless::Worker::ForkingWorker when I send it SIGQUIT under Ruby 2.1.

<gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:159:in 'synchronize': can't be called from trap context (ThreadError)
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:159:in 'shutdown_sandboxes'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:131:in 'stop!'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:72:in 'block in register_signal_handlers'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:94:in 'call'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:94:in 'wait2'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:94:in 'block in run'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:88:in 'loop'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:88:in 'run'
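
The same error can be reproduced outside qless with a few lines of plain Ruby. This is only a minimal sketch: the method name `mutex_error_in_trap` and the choice of SIGUSR1 are made up for illustration and are not part of qless.

```ruby
# Hypothetical reproduction, not qless code: locking a Mutex inside a
# signal handler raises ThreadError on Ruby 2.0 and later.
def mutex_error_in_trap
  lock = Mutex.new
  error = nil

  previous = trap('USR1') do
    begin
      lock.synchronize {}      # the kind of call that fails in the backtrace
    rescue ThreadError => e
      error = e                # record it instead of crashing the process
    end
  end

  Process.kill('USR1', Process.pid)

  # Give the handler a bounded amount of time to run.
  20.times do
    break if error
    sleep 0.05
  end

  error
ensure
  trap('USR1', previous || 'DEFAULT')  # restore the earlier handler
end

err = mutex_error_in_trap
puts "#{err.class}: #{err.message}" if err
```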

Before Ruby 2.0 the call that causes this was allowed, but it is considered unsafe because of a potential deadlock (see: Can't write to a Logger in a signal handler). A common way to handle this is to have a signal queue that the run loop reads from. That requires the run loop not to block when checking the status of children, so it can both flush the signal queue and do child-process housekeeping. There are several good examples of this in Ruby (Unicorn and Foreman come to mind). I intend to get started on reworking the forking worker's signal handlers, and will likely end up fixing #161 as well.
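
For anyone following along, here is a rough sketch of the signal-queue idea. It is not the qless implementation and not a proposed patch: the class `SignalQueueLoop`, its methods, and the `poll_interval` parameter are all hypothetical names. The trap handlers only record the signal and write a byte to a pipe, while the run loop drains the queue and reaps children with `Process::WNOHANG` so it never blocks on a child.

```ruby
# Hypothetical sketch of a signal queue plus self-pipe; not qless code.
class SignalQueueLoop
  SIGNALS = %w[QUIT TERM INT].freeze

  attr_reader :handled, :children

  def initialize
    @queue = []                  # signals recorded by the trap handlers
    @reader, @writer = IO.pipe   # self-pipe used to wake the run loop
    @children = {}               # pid => true for each live child
    @handled = []                # signals the run loop has processed
    @shutdown = false
  end

  # Handlers do no locking and no logging: they enqueue and wake the loop.
  def register_signal_handlers
    SIGNALS.each do |sig|
      trap(sig) do
        @queue << sig
        begin
          @writer.write_nonblock('.')
        rescue IO::WaitWritable, Errno::EINTR
          # Pipe is full, so the loop is already due to wake up.
        end
      end
    end
  end

  def spawn_child
    pid = fork do
      SIGNALS.each { |sig| trap(sig, 'DEFAULT') }  # drop inherited handlers
      sleep                                        # stand-in for real work
    end
    @children[pid] = true
    pid
  end

  def run(poll_interval: 1)
    until @shutdown && @children.empty?
      drain_signals
      reap_children
      # Sleep until a handler writes to the pipe or the interval elapses.
      IO.select([@reader], nil, nil, poll_interval)
    end
  end

  private

  def drain_signals
    begin
      @reader.read_nonblock(64)
    rescue IO::WaitReadable, EOFError
      # Nothing to read.
    end
    while (sig = @queue.shift)
      @handled << sig
      stop!
    end
  end

  # Runs in the main loop, so mutexes and loggers would be safe here.
  def stop!
    @shutdown = true
    @children.each_key do |pid|
      begin
        Process.kill('TERM', pid)
      rescue Errno::ESRCH
        # Child already exited.
      end
    end
  end

  # Non-blocking child housekeeping.
  def reap_children
    @children.keys.each do |pid|
      begin
        @children.delete(pid) if Process.wait(pid, Process::WNOHANG)
      rescue Errno::ECHILD
        @children.delete(pid)
      end
    end
  end
end
```

In this sketch the shutdown work happens in `stop!`, which is called from the run loop rather than from the trap block, so anything it does is outside trap context. The self-pipe is there so that a signal arriving just before `IO.select` still wakes the loop immediately.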

As part of this I'll be using IO.pipe and a couple of other pieces of functionality that have changed in the last couple of versions of Ruby. Which versions should I target with these changes? Also, are there any pitfalls in the code I should know about before embarking on this journey?

esfourteen-zz commented 9 years ago

We're experiencing the same issue when running qless through upstart. Stopping the parent process leaves orphaned workers lingering.

evanbattaglia commented 9 years ago

Hey @tpickett66 any update on this? I'm running into the same issue.