kostya / eye

Process monitoring tool. Inspired by Bluepill and God.
MIT License

depend_on's ordering in stopping case #227

Closed by respire 5 years ago

respire commented 5 years ago

Hello! Thanks for this great gem! I want to monitor Kafka processes with eye. A standalone setup consists of one ZooKeeper and one broker, and ZooKeeper must be started before the broker. I use the depend_on DSL to achieve this, and startup works as expected. But when I call stop all, the broker cannot be stopped properly, because ZooKeeper has already been stopped before the broker is fully down. I checked eye's code and found that it does schedule a stop command for the broker, but it does not wait until the broker has fully stopped. I'm not sure how to solve this problem. Does anyone have an idea?
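The ordering problem described above can be pictured as a small dependency graph: the start order puts dependencies first, and a clean shutdown wants the exact reverse. Below is a minimal sketch of that idea using Ruby's stdlib TSort. The DependencyGraph class and the start_order / stop_order helpers are hypothetical names for illustration only; they are not part of eye and do not describe how eye schedules commands.

```ruby
require 'tsort'

# Hypothetical helper, not part of eye: maps each process name to the
# names of the processes it depends on.
class DependencyGraph
  include TSort

  def initialize(deps)
    @deps = deps
  end

  # TSort hook: enumerate every node in the graph.
  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  # TSort hook: enumerate the dependencies of one node.
  def tsort_each_child(node, &block)
    @deps.fetch(node, []).each(&block)
  end
end

# Dependencies come before the processes that need them.
def start_order(deps)
  DependencyGraph.new(deps).tsort
end

# Dependents must be fully down before their dependencies are stopped.
def stop_order(deps)
  start_order(deps).reverse
end

deps = { broker: [:zookeeper], zookeeper: [] }
puts "start: #{start_order(deps).inspect}"
puts "stop:  #{stop_order(deps).inspect}"
```

The sketch only computes an order. The issue here is that each stop step also has to wait for the previous one to finish, not just be scheduled after it.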

Here's my config, BTW:

require 'fileutils'

CWD = File.dirname(__FILE__)
EYE_LOG_PATH = File.expand_path('logs/eye.log', CWD)
FileUtils.touch EYE_LOG_PATH

Eye.config { logger EYE_LOG_PATH, 'daily', 134_217_728 }

Eye.application('kafka') do
  working_dir CWD

  group('standalone') do
    chain grace: 5.seconds
    process(:zookeeper) do
      start_command 'bin/zookeeper-server-start.sh config/zookeeper.properties'
      pid_file 'tmp/pids/zookeeper.pid'
      stdall File.expand_path('logs/zookeeper.log', CWD)
      daemonize true
      stop_signals [:TERM, 1.minute, :KILL]
    end
    process(:broker) do
      start_command 'bin/kafka-server-start.sh config/server.properties'
      pid_file 'tmp/pids/broker.pid'
      stdall File.expand_path('logs/broker.log', CWD)
      daemonize true
      depend_on [:zookeeper]
      stop_signals [:TERM, 15.minutes, :KILL]
    end
  end
end

kostya commented 5 years ago

depend_on is convenience syntax for a set of triggers that are added to both processes with default behaviour (by default, the parent does not wait for the child to stop). You can add your own trigger. For example (this is just a draft and still needs to be checked and debugged):

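# Custom triggers subclass Eye::Trigger::Custom so that they get registered
# under their underscored name (:wait_child_to_stop).
class Eye::Trigger::WaitChildToStop < Eye::Trigger::Custom
  # Accept a Symbol as well, since the config below passes :broker.
  param :child_name, [String, Symbol]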
  param :wait_timeout, [Numeric], nil, 15.seconds

  def check(transition)
    wait_until_stop if transition.event == :stopping
  end

private

  def wait_until_stop
    child = Eye::Control.find_nearest_process(child_name, process.group_name_pure, process.app_name)
    return unless child

    process.wait_for_condition(wait_timeout, 0.5) do
      info "wait for #{child_name} until it :down"
      (child.state_name == :unmonitored) || (child.state_name == :down)
    end
  end
end

CWD = File.dirname(__FILE__)
EYE_LOG_PATH = File.expand_path('logs/eye.log', CWD)
FileUtils.touch EYE_LOG_PATH

Eye.config { logger EYE_LOG_PATH, 'daily', 134_217_728 }

Eye.application('kafka') do
  working_dir CWD

  group('standalone') do
    chain grace: 5.seconds
    process(:zookeeper) do
      start_command 'bin/zookeeper-server-start.sh config/zookeeper.properties'
      pid_file 'tmp/pids/zookeeper.pid'
      stdall 'logs/zookeeper.log'
      daemonize true
      trigger :wait_child_to_stop, :child_name => :broker
      stop_signals [:TERM, 1.minute, :KILL]
    end
    process(:broker) do
      start_command 'bin/kafka-server-start.sh config/server.properties'
      pid_file 'tmp/pids/broker.pid'
      stdall 'logs/broker.log'
      daemonize true
      depend_on [:zookeeper]
      stop_signals [:TERM, 15.minutes, :KILL]
    end
  end
end

BTW, you don't need to write stdall File.expand_path('logs/broker.log', CWD). Relative paths are already expanded against working_dir, so stdall 'logs/broker.log' is enough.
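The trigger draft above relies on polling: check a condition every half second until it holds or a timeout expires. The following is a minimal, self-contained sketch of that pattern. poll_until and FakeProcess are hypothetical stand-ins written for this illustration; they are not eye's wait_for_condition or its process class, and the real helper may behave differently.

```ruby
# Hypothetical stand-in for the polling helper used by the draft trigger.
# Returns true as soon as the block is truthy, false once the timeout passes.
def poll_until(timeout, step = 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep step
  end
end

# Hypothetical stand-in for a monitored process that exposes its state.
FakeProcess = Struct.new(:state_name)

child = FakeProcess.new(:up)

# Simulate the child finishing its shutdown a little later.
Thread.new do
  sleep 0.05
  child.state_name = :down
end

stopped = poll_until(1, 0.01) { %i[down unmonitored].include?(child.state_name) }
puts "child stopped in time: #{stopped}"
```

A monotonic clock is used for the deadline so that wall-clock adjustments cannot shorten or extend the wait.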

respire commented 5 years ago

Fast response! As you advised, I wrote a custom trigger and it works now. Thanks! I also changed stdall's argument to a relative path. Here's the implementation:

class Eye::Trigger::RequiredBy < Eye::Trigger::Custom
  param :names, [Array], true
  param :wait_timeout, [Numeric], nil, 15.seconds

  def check(transition)
    wait_required_by_process if transition.to_name == :stopping
  end

  private

  def wait_required_by_process
    processes = names.map do |name|
      Eye::Control.find_nearest_process(name, process.group_name_pure, process.app_name)
    end.compact.reject { |p| p.state_name == :unmonitored || p.state_name == :down }

    return if processes.empty?

    processes = Eye::Utils::AliveArray.new(processes)

    res = true

    processes.pmap do |p|
      name = p.name

      res &= process.wait_for_condition(wait_timeout, 0.5) do
        info "wait for #{name} in #{wait_timeout}s until it :down or :unmonitored. now it's #{p.state_name}"
        p.state_name == :down || p.state_name == :unmonitored
      end
    end

    unless res
      warn "#{names} did not transition to :down or :unmonitored within #{wait_timeout}s"
    end
  end
end
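The trigger above folds every wait result into one shared flag with res &= from inside the pmap block. Assuming pmap runs its block concurrently, updating a shared local this way is a read-modify-write that could in principle race. One alternative, sketched here with plain threads rather than eye's utilities, is to let each worker return its own result and combine them afterwards. The all_succeeded? helper is a hypothetical name for this illustration.

```ruby
# Hypothetical helper: runs each waiter in its own thread and reports
# whether every one of them returned a truthy result.
def all_succeeded?(waiters)
  threads = waiters.map { |waiter| Thread.new { waiter.call } }
  # Thread#value joins the thread and returns the block's result,
  # so no shared variable is mutated from several threads.
  threads.map(&:value).all?
end

waiters = [
  -> { sleep 0.01; true },  # e.g. a dependent process that stopped in time
  -> { sleep 0.02; false }  # e.g. one that hit the timeout
]
puts "all stopped: #{all_succeeded?(waiters)}"
```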

For convenience, I also patched depend_on:

module ProcessDSLRequiredBySupport
  def depend_on(names, opts = {})
    super(names, opts)

    nm = @config[:name]
    names.each do |name|
      parent.process(name) do
        trigger("required_by_#{unique_num}", opts.merge(names: [nm]))
      end
    end
  end
end

Eye::Dsl::ProcessOpts.send(:prepend, ProcessDSLRequiredBySupport)
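The patch above uses Module#prepend so that the new depend_on runs first and reaches the original through super. Here is a toy illustration of that mechanism, independent of eye. ToyProcessOpts and RequiredBySupport are hypothetical names and only record calls; they do not mirror eye's DSL internals.

```ruby
# Hypothetical toy DSL class that only records which calls were made.
class ToyProcessOpts
  attr_reader :calls

  def initialize
    @calls = []
  end

  def depend_on(names, opts = {})
    @calls << [:depend_on, Array(names).map(&:to_s), opts]
  end
end

# Prepended module: its depend_on is found first in the method lookup,
# and super reaches the original definition in ToyProcessOpts.
module RequiredBySupport
  def depend_on(names, opts = {})
    super
    Array(names).each { |name| @calls << [:required_by, name.to_s] }
  end
end

ToyProcessOpts.prepend(RequiredBySupport)

opts = ToyProcessOpts.new
opts.depend_on(:zookeeper)
p opts.calls
```

On recent Ruby versions prepend is a public method and can be called directly as shown. The send(:prepend, ...) form in the patch also works on older versions where it was private.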