kostya / eye

Process monitoring tool. Inspired from Bluepill and God.
MIT License
1.19k stars 86 forks source link

eye being launched as root once a day #123

Closed sentience closed 9 years ago

sentience commented 9 years ago

Apologies for posting here an issue that is almost certainly caused by user error.

We normally run eye daemon as the ‘deploy’ user on our Ubuntu Server EC2 instances; however, mysteriously, once a day eye daemon is also being launched as ‘root’ using the same config. When this happens, the two eye daemons run in parallel and interfere with each other (e.g. when the ‘deploy’ daemon restarts Unicorn, the ‘root’ daemon thinks it has crashed, and attempts to restart it, causing problems).

I have no idea what is causing the ‘root’ copy of eye daemon to launch. I’ve checked all the automated tasks on our instances (cron, etc.), and can find nothing that triggers this behaviour; and yet, like clockwork, once a day (seemingly around the same time, although I have yet to pin this down), a copy of eye running as ‘root’ is spawned.

Most mysteriously of all, this is only occurring on the servers for one of our applications. Another application, which uses the exact same Chef configuration code, is not experiencing this issue!

Anyone have any experience in seeing something like this happening? Any tips on isolating the cause?

Running eye 0.6.4 on Ruby 2.2.0.

For what it’s worth, this is my init script (/etc/init.d/myapp):

#!/bin/sh
#
# This file was generated by chef for ip-123-123-123-123.us-west-1.compute.internal
#
### BEGIN INIT INFO
# Provides: myapp
# Required-Start: $network $remote_fs $syslog $local_fs
# Required-Stop: $network $remote_fs $syslog $local_fs
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: starts myapp by eye
### END INIT INFO

EYE=/opt/rbenv/shims/eye
CONFIG_FILE=/etc/eye/deploy/myapp.eye
SERVICE_NAME=myapp
RUNAS=deploy

execute() {
  CMD="$EYE $1 $2"
  if [ "$USER" = "$RUNAS" ]; then
    $CMD
  else
    su $RUNAS -c "$CMD"
  fi
}

case "$1" in
  start)
    echo -n "Loading eye configuration for $SERVICE_NAME"
    execute load $CONFIG_FILE
    execute start $SERVICE_NAME
    ;;
  stop)
    execute stop $SERVICE_NAME
    ;;
  restart)
    execute restart $SERVICE_NAME
    ;;
  status)
    execute info $SERVICE_NAME
    ;;
  *)
    echo "Usage: $0 {start|stop|restart|status}"
    exit 1
    ;;
esac

And this is the format of my eye config (/etc/eye/deploy/myapp.eye):

require File.join(File.dirname(__FILE__), "config.rb")

Eye.application "myapp" do

  env_vars = {
    PATH: "/opt/rbenv/shims:/opt/rbenv/bin:#{ENV['PATH']}",
    RAILS_ENV: "production",
    RBENV_ROOT: "/opt/rbenv",
  }
  env_vars["example"] = "xxx"

  env(env_vars)

  working_dir "/srv/myapp/current"
  uid "deploy"
  gid "deploy"

  process :unicorn do
    start_command "/usr/bin/env bundle exec unicorn /srv/myapp/current/config.ru -Dc /srv/myapp/shared/config/unicorn.rb"
    stop_signals [:TERM, 10.seconds]
    restart_command "kill -USR2 {PID}"
    stdall "/srv/myapp/shared/log/unicorn.log"
    pid_file "/srv/myapp/shared/pids/unicorn.pid"

    check :memory, every: 1.minute, below: 200.megabytes, times: [3, 5]

    start_timeout 1.minute
    restart_grace 1.minute

    trigger :flapping, times: 2, within: 1.minute, retry_in: 7.seconds

    monitor_children do
      stop_command "kill -QUIT {PID}"
      check :memory, every: 1.minute, below: 250.megabytes, times: [3, 5]
    end
  end
  process :sidekiq do
    start_command "/usr/bin/env bundle exec sidekiq --daemon -C /srv/myapp/current/config/sidekiq.yml"
    stop_signals [:TERM, 10.seconds, :KILL]
    pid_file "/srv/myapp/shared/pids/sidekiq.pid"

    check :memory, every: 5.minutes, below: 750.megabytes, times: [6, 8]

    start_timeout 1.minute
    restart_grace 1.minute

    trigger :flapping, times: 2, within: 30.seconds, retry_in: 7.seconds
  end
  process :allora do
    start_command "/usr/bin/env bundle exec ruby /srv/myapp/current/config/schedule.rb"
    pid_file "/srv/myapp/shared/pids/allora.pid"
    daemonize true
    stdout "/srv/myapp/shared/log/allora.log"
    stderr "/srv/myapp/shared/log/allora_error.log"
    stop_signals [:QUIT, 5.seconds, :TERM, 5.seconds, :KILL]

    check :memory, every: 1.minute, below: 200.megabytes, times: [3, 5]

    start_timeout 1.minute
    restart_grace 1.minute

    trigger :flapping, times: 2, within: 30.seconds, retry_in: 7.seconds
  end
end
kostya commented 9 years ago

to start eye as root, there should be someone to call load. Enable eye log, and you would see time, when new eye started, maybe some cron? Or may be this happens after deploy, which do load under root. Also add to you init script some log, when it call load, to log user name. Also not sure about su, we are usually using sudo -u deploy -H eye load ..., that properly set all env variables.

sentience commented 9 years ago

Thanks for your thoughts!

to start eye as root, there should be someone to call load.

Good to confirm that’s the only way it can end up running.

Enable eye log, and you would see time, when new eye started,

Thanks, I’ll give this a try! It may be slightly tricky to get the two copies of eye logging to different log files, but I’ll see what I can do.

maybe some cron?

I’m pretty sure I’ve eliminated actual cron tasks; however, I do have chef-client daemon running (as root) periodically, so that seems like the most likely source of the problem to me. It runs every 15 minutes, though, and the root eye process only seems to be spawning once a day, so I need to figure out why that is.

Or may be this happens after deploy, which do load under root.

That’s what I thought at first. I’ve triggered deploys and watched the results, and this doesn’t seem to spawn the extra eye process.

Also add to you init script some log, when it call load, to log user name.

Great idea; I’ll give this a shot.

Also not sure about su, we are usually using sudo -u deploy -H eye load ..., that properly set all env variables.

Hmm the init script I’m using comes from https://github.com/hamann/eye-chef-cookbook, but I’ll give it a try. If it makes a difference, I’ll submit a pull request to @hamann.

sentience commented 9 years ago

As suggested, I put some logging in the init script and I’ve now proven that the eye daemon running as root is not being launched via that script. Now looking into my chef-client logs so see if it’s launching eye daemon itself under some circumstances.

sentience commented 9 years ago

As I suspected, I traced this to user error. :stuck_out_tongue_closed_eyes:

My logrotate lastaction for rotating eye’s log file was incorrectly stopping (eye q) and restarting (eye l …) the eye daemon. Since this action ran as root, it was causing an instance of eye daemon to start as root.

I had already implemented proper log rotation by sending USR1 to the eye daemon (according to #113); however, I had neglected to deploy this configuration change to the this particular production environment.

Thanks for your help!