igrigorik / em-synchrony

Fiber aware EventMachine clients and convenience classes
http://www.igvita.com/2010/03/22/untangling-evented-code-with-ruby-fibers
MIT License
1.04k stars 151 forks source link

AMQP auto_recovery callback causes "can't yield from root fiber" #141

Closed jjrussell closed 9 years ago

jjrussell commented 12 years ago

Using the synchrony AMQP connection with a channel auto_recovery set to true. I can bring up the connection and receive message successfully. When I bring down the message bus the connection.on_connection_interrupted event handler is called correctly.

However, when I bring the broker back up and the queue attempts to auto_recover it calls queue.rebind which is monkey patched in em-synchrony/amqp.rb to call the superclass implementation inside EM::Synchrony::AMQP.sync.

Since the callback happens as a result of an EM recieve_data call which is not run on a Fiber, the rebind call happens on the root fiber, AMQP::sync attempts to yield its fiber and you get the "can't yield from root fiber" error. Below is the stack trace of the error that happens after I bring the broker back up and auto_recovery happens.

My question is where is it possible to wrap this call in a Fiber? None of my code is actually called from the EM.recieve_data method so I'm not sure what to wrap.

Thanks a lot.

can't yield from root fiber
/Users/jjrussell/.rvm/gems/ruby-1.9.3-p194/gems/em-synchrony-1.0.2/lib/em-synchrony/amqp.rb:17:in `yield'
/Users/jjrussell/.rvm/gems/ruby-1.9.3-p194/gems/em-synchrony-1.0.2/lib/em-synchrony/amqp.rb:17:in `sync'
/Users/jjrussell/.rvm/gems/ruby-1.9.3-p194/gems/em-synchrony-1.0.2/lib/em-synchrony/amqp.rb:158:in `rebind'
/Users/jjrussell/.rvm/gems/ruby-1.9.3-p194/gems/amqp-0.9.7/lib/amqp/queue.rb:325:in `block in auto_recover'
/Users/jjrussell/.rvm/gems/ruby-1.9.3-p194/gems/amq-client-0.9.4/lib/amq/client/async/callbacks.rb:63:in `call'
/Users/jjrussell/.rvm/gems/ruby-1.9.3-p194/gems/amq-client-0.9.4/lib/amq/client/async/callbacks.rb:63:in `block in exec_callback_once_yielding_self'
/Users/jjrussell/.rvm/gems/ruby-1.9.3-p194/gems/amq-client-0.9.4/lib/amq/client/async/callbacks.rb:63:in `each'
/Users/jjrussell/.rvm/gems/ruby-1.9.3-p194/gems/amq-client-0.9.4/lib/amq/client/async/callbacks.rb:63:in `exec_callback_once_yielding_self'
/Users/jjrussell/.rvm/gems/ruby-1.9.3-p194/gems/amq-client-0.9.4/lib/amq/client/async/queue.rb:464:in `handle_declare_ok'
/Users/jjrussell/.rvm/gems/ruby-1.9.3-p194/gems/amq-client-0.9.4/lib/amq/client/async/queue.rb:503:in `block in <class:Queue>'
/Users/jjrussell/.rvm/gems/ruby-1.9.3-p194/gems/amq-client-0.9.4/lib/amq/client/async/adapter.rb:551:in `call'
/Users/jjrussell/.rvm/gems/ruby-1.9.3-p194/gems/amq-client-0.9.4/lib/amq/client/async/adapter.rb:551:in `receive_frameset'
/Users/jjrussell/.rvm/gems/ruby-1.9.3-p194/gems/amq-client-0.9.4/lib/amq/client/async/adapter.rb:525:in `receive_frame'
/Users/jjrussell/.rvm/gems/ruby-1.9.3-p194/gems/amq-client-0.9.4/lib/amq/client/async/adapters/event_machine.rb:325:in `receive_data'
/Users/jjrussell/.rvm/gems/ruby-1.9.3-p194/gems/eventmachine-1.0.0/lib/eventmachine.rb:187:in `run_machine'
/Users/jjrussell/.rvm/gems/ruby-1.9.3-p194/gems/eventmachine-1.0.0/lib/eventmachine.rb:187:in `run'
/Users/jjrussell/.rvm/gems/ruby-1.9.3-p194/gems/em-synchrony-1.0.2/lib/em-synchrony.rb:28:in `synchrony'
igrigorik commented 12 years ago

Hmm, thanks for reporting this.

/cc @calj - any thoughts?

jjrussell commented 12 years ago

Working around this by setting auto_recovery on the channel to false, recovering the connection and just creating new channel, queues and consumers in the Connection.on_recovery callback.

ghost commented 11 years ago

Running into this as well.

tsenart commented 11 years ago

Me too :/

igrigorik commented 11 years ago

So, reading through the code...

https://github.com/ruby-amqp/amqp/blob/d1978ceba61e5be3b83629db1a3e67fb0a04788f/lib/amqp/channel.rb#L313 https://github.com/ruby-amqp/amqp/blob/d1978ceba61e5be3b83629db1a3e67fb0a04788f/lib/amqp/queue.rb#L374

And corresponding patch in synchrony: https://github.com/igrigorik/em-synchrony/blob/master/lib/em-synchrony/amqp.rb#L165

Do we need to patch rebind at all?

tsenart commented 11 years ago

The problem is that in the context of auto recovery, these methods get called in the root Fiber, hence the error. I tried it without rebind and the without rebind nor bind and it's still the same. I am gonna try to wrap my head around this call stack but I would appreciate some help :)

jjrussell commented 11 years ago

@igrigorik when you say "Do we need to patch rebind at all" do you mean we can take it out of the list of methods that the em-synchrony amqp code overrides and just let the base amqp code handle it? If so that seems reasonable. I'd be ok if recovery happened without any fibers but I'm not sure what the side effects of that would be. Is rebind used elsewhere where it would need to be wrapped for synchrony?

calj commented 11 years ago

The rebind method is not really asynchronous (the callback is ignored): https://github.com/ruby-amqp/amqp/blob/d1978ceba61e5be3b83629db1a3e67fb0a04788f/lib/amqp/queue.rb#L366

You should try to just remove "rebind" from the list here: https://github.com/igrigorik/em-synchrony/blob/master/lib/em-synchrony/amqp.rb#L165

igrigorik commented 11 years ago

Yes, what @calj suggested.. Just try removing rebind from the list of patched methods, and see if that "solves it".

tsenart commented 11 years ago

I am sorry if I wasn't clear in my last comment but I did try already to remove rebind and it didn't help. The same error is thrown by bind which I then also removed but to no avail either.

ulfurinn commented 11 years ago

A working option is to keep the sync wrapper but force a secondary fiber:

module EventMachine::Synchrony::AMQP
  class Queue
    def rebind
      Fiber.new do
        arebind &EM::Synchrony::AMQP.sync_cb(Fiber.current)
      end.resume
    end
  end
end