livingsocial / rearview

Timeseries data monitoring framework

Rearview::MonitorService crashing on startup #43

Closed: tarcieri closed this issue 10 years ago

tarcieri commented 10 years ago

Hello there! I seem to be facing a problem of my own design as Rearview::MonitorService is crashing on startup with Celluloid::DeadTaskError as the only clue:

2014-05-16_03:52:52.47370 [JRubyWorker-1] INFO / - #<Rearview::MonitorService:0x2d858dff> starting up service...
2014-05-16_03:52:52.47371
2014-05-16_03:52:52.48289 [RubyThread-13: /data/app/rearview/installs/rearview_bba883c9d4335407b321a9420f047a008aa1d173/vendor/bundle/jruby/1.9/gems/celluloid-0.14.1/lib/celluloid/internal_pool.rb:56] INFO / - Rearview::MonitorService crashed!
2014-05-16_03:52:52.48291 Celluloid::DeadTaskError: cannot resume a dead task (dead fiber called)
2014-05-16_03:52:52.48292       /data/app/rearview/installs/rearview_bba883c9d4335407b321a9420f047a008aa1d173/vendor/bundle/jruby/1.9/gems/celluloid-0.14.1/lib/celluloid/tasks/task_fiber.rb:25:in `deliver'
2014-05-16_03:52:52.48293       /data/app/rearview/installs/rearview_bba883c9d4335407b321a9420f047a008aa1d173/vendor/bundle/jruby/1.9/gems/celluloid-0.14.1/lib/celluloid/tasks.rb:69:in `resume'
2014-05-16_03:52:52.48294       /data/app/rearview/installs/rearview_bba883c9d4335407b321a9420f047a008aa1d173/vendor/bundle/jruby/1.9/gems/celluloid-0.14.1/lib/celluloid/responses.rb:11:in `dispatch'
2014-05-16_03:52:52.48294       /data/app/rearview/installs/rearview_bba883c9d4335407b321a9420f047a008aa1d173/vendor/bundle/jruby/1.9/gems/celluloid-0.14.1/lib/celluloid/actor.rb:331:in `handle_message'
2014-05-16_03:52:52.48295       /data/app/rearview/installs/rearview_bba883c9d4335407b321a9420f047a008aa1d173/vendor/bundle/jruby/1.9/gems/celluloid-0.14.1/lib/celluloid/actor.rb:174:in `run'
2014-05-16_03:52:52.48297       /data/app/rearview/installs/rearview_bba883c9d4335407b321a9420f047a008aa1d173/vendor/bundle/jruby/1.9/gems/celluloid-0.14.1/lib/celluloid/actor.rb:157:in `initialize'
2014-05-16_03:52:52.48299       /data/app/rearview/installs/rearview_bba883c9d4335407b321a9420f047a008aa1d173/vendor/bundle/jruby/1.9/gems/celluloid-0.14.1/lib/celluloid/thread_handle.rb:13:in `initialize'
2014-05-16_03:52:52.48300       org/jruby/RubyProc.java:249:in `call'
2014-05-16_03:52:52.48300       /data/app/rearview/installs/rearview_bba883c9d4335407b321a9420f047a008aa1d173/vendor/bundle/jruby/1.9/gems/celluloid-0.14.1/lib/celluloid/internal_pool.rb:59:in `create'
2014-05-16_03:52:52.48301
2014-05-16_03:52:52.56617 [main] INFO / - An exception happened during JRuby-Rack startup

I'm pretty surprised Celluloid isn't logging some other error here. Also if I manually try to run the same code from the Rails console, it works, which is rather perplexing.

Anyway, as Celluloid's author I really hate seeing hard-to-debug scenarios like this and sure would love to make it easier to figure out what's wrong, particularly since this problem is a roadblock for me personally! :smile:
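For context on the error text: the parenthetical "(dead fiber called)" in the trace looks like the message Ruby attaches to a FiberError when a fiber that has already finished is resumed again, which Celluloid's fiber-backed tasks presumably surface as DeadTaskError. The sketch below reproduces only that underlying Ruby behavior and does not use Celluloid or rearview; the method name `resume_dead_fiber` is made up for illustration, and the exact message wording varies between Ruby implementations and versions.

```ruby
# Plain-Ruby illustration (no Celluloid required): resuming a fiber that
# has already run to completion raises FiberError.
# `resume_dead_fiber` is a hypothetical name used only for this sketch.
def resume_dead_fiber
  fiber = Fiber.new { :finished }
  fiber.resume # runs the block to completion; the fiber is now dead
  fiber.resume # a second resume raises FiberError
rescue FiberError => e
  e            # return the error so the caller can inspect it
end

error = resume_dead_fiber
puts "#{error.class}: #{error.message}"
```

This only shows how the low-level error arises; it does not explain why the task was already dead when MonitorService tried to resume it.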

danmayer commented 10 years ago

Hmmm interesting. I feel like I recently saw one of those in our system as well. I will see if I can get someone from the rearview team to take a look. Thanks for reporting and good to hear from you @tarcieri ;)

talbright commented 10 years ago

Hey there @tarcieri! I didn't see this before; I guess my notification settings are wrong. Are you still having this problem?

tarcieri commented 10 years ago

I am indeed! Still trying to figure it out too o_O

I'll be taking another look at it today and can get you more info then. I was hoping there would be some other associated exception, but I'm not seeing it in the logs.

talbright commented 10 years ago

Ok. Double-check your config/initializers/rearview.rb as well. I think I saw something like this a long time ago and it ended up being configuration related.

tarcieri commented 10 years ago

It quite likely is, I'm just not sure what exactly is misconfigured.

talbright commented 10 years ago

This might catch the problem:

$ rake RAILS_ENV=production rearview:config:verify

tarcieri commented 10 years ago

Will give that a try, thanks!

tarcieri commented 10 years ago

Well, that found a few things wrong, but now I get:

validating...PASSED

And it's still crashing with the same error.

talbright commented 10 years ago

Are your JDK and JRuby versions on the build matrix?

https://travis-ci.org/livingsocial/rearview-engine

tarcieri commented 10 years ago

We're using this:

jruby 1.7.3 (1.9.3p385) 2013-02-21 dac429b on Java HotSpot(TM) 64-Bit Server VM 1.7.0_17-b02 [linux-amd64]

I can try upgrading JRuby. In the meantime I can provide another pointer... things seem to be going amiss here:

https://github.com/livingsocial/rearview-engine/blob/master/lib/rearview/monitor_service.rb#L25

Here, @jobs.values is []

It seems that even trying to reference (i.e. print out or call) @supervisor crashes the program. If I comment out L25 and try to start the supervisor from the console, it starts without error:

irb(main):001:0> Rearview::MonitorSupervisor.run!
=> #<Celluloid::ActorProxy(Rearview::MonitorSupervisor:0x2cc0) @registry=#<Celluloid::Registry:0x7d26297b @registry_lock=#<Mutex:0x546d8a6d>, @registry={}> @members=[]>

Very strange...

talbright commented 10 years ago

Very strange! I seem to be guarding against nil and an empty [] correctly in the MonitorSupervisor. Which JDK (Oracle, OpenJDK, etc.)?
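As a rough illustration of what guarding against nil and an empty list can look like in Ruby, here is a minimal sketch. It is not taken from rearview's MonitorSupervisor; the helper name `schedulable_jobs` is hypothetical and the real code may be structured quite differently.

```ruby
# Hypothetical helper, NOT rearview's actual code: normalize a possibly-nil
# collection so callers can always iterate over an Array.
def schedulable_jobs(jobs)
  # Array(nil) => [], Array([]) => []; compact drops any nil entries.
  Array(jobs).compact
end

# Iterating is then safe for nil, [], or a populated list.
schedulable_jobs(nil).each { |job| puts job }
```

With a guard like this, an empty `@jobs.values` would simply result in no work being scheduled rather than an error.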

talbright commented 10 years ago

@tarcieri it does look like the build fails with jruby-1.7.2 (with openjdk{6,7}). It could be coincidence -- I'll have to dig more into it later

https://travis-ci.org/livingsocial/rearview-engine/builds/27228811

tarcieri commented 10 years ago

I seem to recall numerous bugs in earlier versions of JRuby and Celluloid ;). I'll try updating JRuby.

talbright commented 10 years ago

Cool...let me know how it goes. If it's still busted I'll keep working with you on it until we figure it out.

tarcieri commented 10 years ago

Quick update: looks like upgrading JRuby fixed the problem.

We're still having some issues but they're unrelated to this.
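Since the fix here was a runtime upgrade, one way to catch this earlier might be a startup check that warns when the running JRuby is older than a version known to work. The sketch below is an assumption-laden example: the method name `jruby_older_than?` and the `MIN_JRUBY_VERSION` environment variable are hypothetical, and the thread does not say which JRuby release resolved the crash, so the minimum is deliberately left for the operator to supply.

```ruby
require "rubygems" # provides Gem::Version for version comparison

# Hypothetical helper: true when `current` sorts before `minimum`.
# Gem::Version compares segment by segment, so "1.7.3" < "1.7.12".
def jruby_older_than?(current, minimum)
  Gem::Version.new(current) < Gem::Version.new(minimum)
end

# JRUBY_VERSION is only defined when running under JRuby.
# MIN_JRUBY_VERSION is an assumed, operator-supplied setting; no minimum
# is hard-coded because the thread does not state the fixed version.
if defined?(JRUBY_VERSION) && ENV["MIN_JRUBY_VERSION"]
  if jruby_older_than?(JRUBY_VERSION, ENV["MIN_JRUBY_VERSION"])
    warn "JRuby #{JRUBY_VERSION} is older than #{ENV['MIN_JRUBY_VERSION']}"
  end
end
```

Something like this could live in an initializer so the warning appears in the log before any services start.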

talbright commented 10 years ago

:thumbsup:

I'm working on a Vagrant setup, which should at least make it easier for people to test-drive rearview.