kostya / eye

Process monitoring tool. Inspired from Bluepill and God.
MIT License
1.19k stars 89 forks source link

Exception: undefined method `groups' for nil:NilClass (lib/eye/controller/helpers.rb) #230

Open rgaufman opened 4 years ago

rgaufman commented 4 years ago

One of my servers was misbehaving, so I sshed into it, checked the logs and found the following:

2019-11-07 19:38:30.161668 E [21863:70056329331000 logger.rb:53] eye -- [recorder_5ca06cfa4caa415504800808:5ca06cfa4caa415504800808:live_sub] trigger(wait_dependency) Exception: undefined method `groups' for nil:NilClass ["/data/deployer/timeagent/vendor/cache/eye-ff98cd440391/lib/eye/controller/helpers.rb:37:in `find_nearest_process'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/calls.rb:28:in `public_send'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/calls.rb:28:in `dispatch'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/call/sync.rb:16:in `dispatch'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/cell.rb:50:in `block in dispatch'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/cell.rb:76:in `block in task'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:339:in `block in task'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task.rb:44:in `block in initialize'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:14:in `block in create'", "(celluloid):0:in `remote procedure call'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/call/sync.rb:45:in `value'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/proxy/sync.rb:22:in `method_missing'", "/data/deployer/timeagent/vendor/cache/eye-ff98cd440391/lib/eye/trigger/wait_dependency.rb:16:in `block in wait_dependency'", "/data/deployer/timeagent/vendor/cache/eye-ff98cd440391/lib/eye/trigger/wait_dependency.rb:15:in `map'", "/data/deployer/timeagent/vendor/cache/eye-ff98cd440391/lib/eye/trigger/wait_dependency.rb:15:in `wait_dependency'", "/data/deployer/timeagent/vendor/cache/eye-ff98cd440391/lib/eye/trigger/wait_dependency.rb:9:in `check'", "/data/deployer/timeagent/vendor/cache/eye-ff98cd440391/lib/eye/trigger.rb:73:in `notify'", "/data/deployer/timeagent/vendor/cache/eye-ff98cd440391/lib/eye/process/trigger.rb:12:in `block in check_triggers'", "/data/deployer/timeagent/vendor/cache/eye-ff98cd440391/lib/eye/process/trigger.rb:12:in `each'", "/data/deployer/timeagent/vendor/cache/eye-ff98cd440391/lib/eye/process/trigger.rb:12:in `check_triggers'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/eval_helpers.rb:58:in `evaluate_method'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/callback.rb:192:in `block in run_methods'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/callback.rb:191:in `each'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/callback.rb:191:in `run_methods'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/callback.rb:159:in `call'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/transition.rb:394:in `block (2 levels) in after'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/transition.rb:394:in `each'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/transition.rb:394:in `block in after'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/transition.rb:392:in `catch'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/transition.rb:392:in `after'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/transition.rb:195:in `run_callbacks'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/transition_collection.rb:127:in `run_callbacks'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/transition_collection.rb:63:in `block (2 levels) in perform'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/transition_collection.rb:63:in `catch'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/transition_collection.rb:63:in `block in perform'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/transition_collection.rb:182:in `block in within_transaction'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/transition.rb:171:in `block in within_transaction'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/machine.rb:1868:in `block in within_transaction'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/machine.rb:2164:in `transaction'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/machine.rb:1868:in `within_transaction'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/transition.rb:170:in `within_transaction'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/transition_collection.rb:181:in `within_transaction'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/transition_collection.rb:62:in `perform'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/transition.rb:163:in `perform'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/event.rb:155:in `fire'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/event.rb:219:in `block in add_actions'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/machine.rb:729:in `block (2 levels) in define_helper'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/event.rb:224:in `block in add_actions'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/state_machines-0.5.0/lib/state_machines/machine.rb:729:in `block (2 levels) in define_helper'", "/data/deployer/timeagent/vendor/cache/eye-ff98cd440391/lib/eye/process/states.rb:11:in `switch'", "/data/deployer/timeagent/vendor/cache/eye-ff98cd440391/lib/eye/process/commands.rb:6:in `start_process'", "/data/deployer/timeagent/vendor/cache/eye-ff98cd440391/lib/eye/process/commands.rb:81:in `restart_process'", "/data/deployer/timeagent/vendor/cache/eye-ff98cd440391/lib/eye/process/controller.rb:22:in `restart'", "/data/deployer/timeagent/vendor/cache/eye-ff98cd440391/lib/eye/process/scheduler.rb:74:in `scheduler_consume'", "/data/deployer/timeagent/vendor/cache/eye-ff98cd440391/lib/eye/process/scheduler.rb:126:in `scheduler_run'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/calls.rb:28:in `public_send'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/calls.rb:28:in `dispatch'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/call/async.rb:7:in `dispatch'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/cell.rb:50:in `block in dispatch'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/cell.rb:76:in `block in task'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:339:in `block in task'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task.rb:44:in `block in initialize'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:14:in `block in create'"]

Any ideas what could be causing this and how to prevent in the future?

kostya commented 4 years ago

Looks like really rare concurrent situation, when in parallel application was deleted and in that time triggered depend_on on another. I would think how to avoid this.

rgaufman commented 4 years ago

Thank you, indeed we recently switched to having a separate application per camera, before we used to have a single application for all cameras but it meant needing to restart everything when we made changes.

rgaufman commented 4 years ago

Any update? :)

kostya commented 4 years ago

i not see how to reproduce it and how this can happen, i can only imagine this is celluloid bug. Is it happen again?

rgaufman commented 4 years ago

Yes, it happens every few days on one of the 900 or so servers we have :( - So much so, I added a separate process that just tails the eye logs every minute looking for "Exception: undefined method `groups' for nil:NilClass" and restarts eye if it finds this line. It only happens when stopping and starting eye "applications".

kostya commented 4 years ago

Can you show how config look like schematically?

kostya commented 4 years ago

i pushed 0.10.1.pre with simple not nil check, maybe it helps, still not understand how this can happen. can you try it?

rgaufman commented 4 years ago

The config is a big complex, we use eye to start/stop all the processes required to record an IP camera, this includes proxy process, live view, detector, etc.

Before, we used to just stop eye completely, then start eye again. This worked, but meant if someone edited the settings of 1 camera, all cameras had to be restarted.

We changed the implementation to report a list of camera IDs that changed and then we run

system("bundle exec eye delete camera_#{id}")

Once all the camera IDs were stopped and deleted, we run:

bundle exec eye load ../recorder.eye

Which ensures all cameras are started, internally it gets a list of IDs using a simple get request to the rails app running locally.

Maybe this is the wrong way of doing it?

I'll try with 0.10.1.pre, thank you!

kostya commented 4 years ago

Your usage is quite strange, usually not need to call delete, only load command with updated config. It should delete and stop deleted from configs processes (if process have stop_on_delete true). And then you call restart for required processes.

rgaufman commented 4 years ago

Reason we need to delete is sometimes the camera configuration changes, so it's not just ensuring all cameras are loaded, but sometimes we need to delete the camera and then load again to make sure that camera is loaded back. We have stop_on_delete true already :)