Open brandondrew opened 4 years ago
My config file (with a change to simplify testing):

```ruby
def sidekiq_process(proxy, name)
  proxy.process(name) do
    # start_command "bundle exec sidekiq -C ./config/sidekiq.yml -e #{ENV['RAILS_ENV']}"
    start_command "bundle exec sidekiq -C ./config/sidekiq.yml -e production"
    pid_file "tmp/pids/#{name}.pid"
    stdall "log/#{name}.log"
    daemonize true
    # trigger :flapping, times: 10, within: 1.minute, retry_in: 10.minutes
    stop_signals [:TSTP, 0, :TERM, 10.seconds, :KILL]
    start_grace 10.seconds
    start_timeout 90.seconds
    stop_timeout 30.seconds
    # check :cpu, every: 30, below: 100, times: 5
    # check :memory, every: 30, below: 300.megabytes, times: 5
  end
end

Eye.application "sidekiq" do
  working_dir File.expand_path('../..', File.join(File.dirname(__FILE__)))
  sidekiq_process self, :sidekiq
end

Eye.config do
  current = File.expand_path('../..', File.join(File.dirname(__FILE__)))
  logger "#{current}/log/eye.log"
end
```
Looks like your `~/.eye` folder may not be writable, or there may be an old pid file in it; try deleting it.
The original problem went away after rebooting the server. (I never thought I would see that happen on Linux.) But it came back several weeks later, so I tried troubleshooting again. I found that using `nohup` to put `eye` in the background, instead of letting `eye` daemonize itself, is one workaround (which unfortunately makes things more complicated, and by default logs to `nohup.out`, so it is not an ideal solution):

```shell
nohup bundle exec eye load -f /var/www/rsp/current/lib/eye &
```
I then returned here and read your reply, and found that you were exactly right about the old pid file needing to be deleted.
Why would an old pid be left around? Do you have any ideas?
Besides a bug, the only theory I could think of is that someone has not been cleanly rebooting the server. I do have confirmation that the system administrator sometimes reboots from the AWS console rather than the shell, although I don't know if that would be a problem or not.
Is it possible to make `eye`, when starting up, not simply assume it is already running based on the presence of the pid file, but also verify that a process with that PID actually exists? That would avoid the situation I found myself in.
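The check I have in mind could be sketched in Ruby roughly like this (a minimal sketch, not `eye`'s actual internals; the pid file path and the helper name `stale_pid_file?` are my own assumptions):

```ruby
# Sketch of a startup check a daemon could perform: trust the pid file
# only if a process with that PID is actually alive. Signal 0 asks the
# kernel whether the process exists without sending anything to it.
# The default path is an assumption based on eye's ~/.eye/pid.
def stale_pid_file?(path = File.expand_path('~/.eye/pid'))
  return false unless File.exist?(path)  # no pid file at all: nothing stale
  pid = File.read(path).to_i
  return true if pid <= 0                # unparseable contents: treat as stale
  Process.kill(0, pid)                   # existence check only; kills nothing
  false                                  # process is alive, pid file is valid
rescue Errno::ESRCH
  true                                   # no such process: stale pid file
rescue Errno::EPERM
  false                                  # process exists but belongs to another user
end

# A daemon could then clean up instead of refusing to start:
#   File.delete(path) if stale_pid_file?(path)
```

The same `kill -0` trick is what my shell workaround below relies on; doing it inside `eye` at startup would make the cron script unnecessary.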
In the meantime, I have written a script that I trigger with `cron`:
```shell
#!/usr/bin/env bash
# This addresses the intermittent problem of eye not cleaning up its old PIDs.
# An old PID prevents eye from starting up normally in the background.

APP_DIR="/var/www/rsp/current"
EYE_LOG="${APP_DIR}/log/eye.log"
EYE_PID=$(cat "$HOME/.eye/pid" 2> /dev/null)  # get the PID
kill -0 "${EYE_PID}" 2> /dev/null             # fails if the PID is missing or dead; this won't kill anything
ERROR=$?                                      # save the exit code

if [[ ${ERROR} != "0" ]]; then                # eye was not running, so start it up
  echo "ERROR: eye is not running... attempting to start..." >> "${EYE_LOG}"
  rm -f "$HOME/.eye/pid"                      # remove the stale pid file itself
  export PATH="$HOME/.rbenv/bin:$PATH"
  eval "$(rbenv init -)"
  { date; cd "${APP_DIR}"; bundle exec eye load "${APP_DIR}/lib/eye"; date; } >> "${EYE_LOG}" 2>&1
else
  echo "SUCCESSFULLY VERIFIED THAT eye IS RUNNING." >> "${EYE_LOG}"
fi
```
These are my crontab entries:

```
@reboot /var/www/rsp/current/scripts/ensure_eye_is_running.sh
* * * * * /var/www/rsp/current/scripts/ensure_eye_is_running.sh
```
I hope my solution is helpful to someone, but ideally `eye` itself would perform this very check when it starts up.
This works (note the `-f`):

Without `-f` it FAILS:

To test whether this was because it somehow lost the environment variable when run in the background, I hardcoded the value 'production' in `lib/eye/sidekiq.eye`:

and it was still instantly killed.
When I watch the sidekiq log, I don't see anything appear at all unless I use the `-f` flag with `eye load`. And when I tail the eye log, likewise nothing appears. So there's no log output that gives me a clue as to what's going on.
Checking the configuration, `eye` tells me it's fine: