kostya / eye

Process monitoring tool. Inspired from Bluepill and God.
MIT License

Eye monitors wrong processes PIDs after config reload #228

Open migalenkom opened 5 years ago

migalenkom commented 5 years ago

I have an eye process group running:

bundle exec eye i
recorder                   
  5bfc56a23d1f922e4a7a8046         
    detector ...................... up  (14:14, 6%, 64Mb, <1567>)
    live .......................... up  (14:14, 1%, 15Mb, <1347>)
    live_sub ...................... up  (14:14, 0%, 13Mb, <1359>)
    proxy ......................... up  (14:14, 0%, 3Mb, <1280>)
    proxy2 ........................ up  (14:14, 1%, 3Mb, <1294>)
    recorder ...................... up  (14:14, 1%, 16Mb, <1566>)

Then I want to add a process group and reload the config:

bundle exec eye load recorder.eye
Config loaded!
bundle exec eye i
recorder                   
  5ba80f973d1f9235ef940bc4         
    detector ...................... up  (14:14, 7%, 64Mb, <1567>)
    live .......................... up  (14:14, 1%, 15Mb, <1347>)
    live_sub ...................... up  (14:14, 0%, 13Mb, <1359>)
    proxy ......................... up  (14:14, 0%, 3Mb, <1280>)
    proxy2 ........................ up  (14:14, 1%, 3Mb, <1294>)
    recorder ...................... up  (14:14, 1%, 17Mb, <1566>)
  5bfc56a23d1f922e4a7a8046         
    detector ...................... up  (14:14, 7%, 64Mb, <1567>)
    live .......................... up  (14:14, 1%, 15Mb, <1347>)
    live_sub ...................... up  (14:14, 0%, 13Mb, <1359>)
    proxy ......................... up  (14:14, 0%, 3Mb, <1280>)
    proxy2 ........................ up  (14:14, 1%, 3Mb, <1294>)
    recorder ...................... up  (14:14, 1%, 17Mb, <1566>)

It adds a new process group with the same PIDs, same start time, etc. On the next check, it detects that both processes have crashed and restarts both. After that restart, the PIDs are correct.

bundle exec eye i
recorder                   
  5ba80f973d1f9235ef940bc4         
    detector ...................... up  (14:18, 5%, 62Mb, <12620>)
    live .......................... up  (14:18, 1%, 16Mb, <12553>)
    live_sub ...................... up  (14:18, 0%, 13Mb, <12559>)
    proxy ......................... up  (14:18, 0%, 3Mb, <12537>)
    proxy2 ........................ up  (14:18, 2%, 4Mb, <12539>)
    recorder ...................... up  (14:18, 1%, 19Mb, <12629>)
  5bfc56a23d1f922e4a7a8046         
    detector ...................... up  (14:18, 7%, 64Mb, <12647>)
    live .......................... up  (14:18, 1%, 15Mb, <12515>)
    live_sub ...................... up  (14:18, 0%, 13Mb, <12560>)
    proxy ......................... up  (14:18, 0%, 3Mb, <12387>)
    proxy2 ........................ up  (14:18, 1%, 3Mb, <12398>)
    recorder ...................... up  (14:18, 1%, 17Mb, <12644>)

If you have more than two processes, it restarts them in a chain, in groups of two.

kostya commented 5 years ago

After adding, you have two processes with the same PID (1567):

  5ba80f973d1f9235ef940bc4         
    detector ...................... up  (14:14, 7%, 64Mb, <1567>)
  5bfc56a23d1f922e4a7a8046         
    detector ...................... up  (14:14, 7%, 64Mb, <1567>)

So maybe you set the same pid_file for both processes?
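A quick way to test this guess is to read every pid file and see whether any PID shows up in more than one of them. The following is only a rough sketch, not part of eye: the helper name `duplicate_pids` and the default directory are made up for illustration, and it assumes each pid file contains nothing but the PID.

```ruby
# Group pid files by the PID they contain and return only the PIDs
# that appear in more than one file.
# NOTE: `duplicate_pids` is a hypothetical helper, not an eye API.
def duplicate_pids(pid_dir)
  files = Dir.glob(File.join(pid_dir, "*.pid")).sort
  by_pid = files.group_by { |path| File.read(path).strip }
  by_pid.select { |pid, paths| !pid.empty? && paths.size > 1 }
end

if __FILE__ == $PROGRAM_NAME
  # The default directory is an assumption; pass the real one as an argument.
  dir = ARGV[0] || "tmp/pids/recorder"
  dups = duplicate_pids(dir)
  if dups.empty?
    puts "every pid file holds a distinct PID"
  else
    dups.each { |pid, paths| puts "PID #{pid} shared by: #{paths.join(', ')}" }
  end
end
```

If this reports a shared PID, two process definitions are reading the same value even though the file names differ.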

migalenkom commented 5 years ago

I checked again, but the PID files are separate for each process, e.g.:

detector-5ba80f973d1f9235ef940bc4.pid

detector-5bfc56a23d1f922e4a7a8046.pid

kostya commented 5 years ago

Are you using daemonize: true?

migalenkom commented 5 years ago

Yes, I am using it.

kostya commented 5 years ago

That looks impossible. By the way, you can see everything that happens in the eye log.

migalenkom commented 5 years ago

https://www.dropbox.com/s/nqmjh8vtlepoayb/screencast_00007.mp4?dl=0

2019-07-13 11:51:40.804166 W [24206:70315359510820] eye -- [recorder:5ba80f973d1f9235ef940bc4:live] check_alive: pid_file (/data/deployer/timeagent/tmp/pids/recorder/live-5ba80f973d1f9235ef940bc4.pid) changed by itself (<24221> => <24146>), reverting to <24221> (the pid_file is controlled by eye)

kostya commented 5 years ago

This line is OK: eye found that the file was changed from outside. Also try restarting eye.
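To see how often this happens and for which processes, the warnings can be pulled out of the log with a small script. This is a minimal sketch, not an eye feature. The regular expression is modelled only on the single log line quoted above, so the real format may vary between eye versions. `pid_reverts` and the log file name in the usage line are hypothetical.

```ruby
# Matches the "pid_file ... changed by itself" warning.
# ASSUMPTION: the pattern is derived from one quoted log line only.
PID_REVERT = /\[(?<process>[^\]]+)\] check_alive: pid_file \((?<file>[^)]+)\) changed by itself \(<(?<expected>\d+)> => <(?<found>\d+)>\)/

# Returns one hash per matching line: which process, which pid file,
# the PID eye expected, and the PID it found in the file instead.
# NOTE: `pid_reverts` is a hypothetical helper, not an eye API.
def pid_reverts(lines)
  lines.map do |line|
    m = PID_REVERT.match(line)
    next unless m
    {
      process: m[:process],
      file: m[:file],
      expected: m[:expected].to_i,
      found: m[:found].to_i
    }
  end.compact
end

# Usage (the log path is an assumption):
#   pid_reverts(File.foreach("eye.log")).each { |r| p r }
```

If the `found` PID in one group's entry matches the PID that another group reports, something outside eye is writing one group's PID into the other group's pid file.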

migalenkom commented 5 years ago

Loading the config again starts a completely independent group of new processes while keeping the old processes alive -- but for some reason, the new processes show the same PIDs as the previous group, which is not correct.

The new processes, once daemonized, do not actually have the same PIDs, so something is going wrong internally with the monitoring threads.