mojombo / god

Ruby process monitor
http://godrb.com
MIT License
2.21k stars 534 forks

The server is not available (or you do not have permissions to access it) #1

Open gdi opened 15 years ago

gdi commented 15 years ago

We see this error about a third of the time when trying to restart god (after, say, disabling it via god terminate); all god commands except 'god check' report it. What are some of the causes of this error?

We also see instances where a watch with both a :process_running and a :cpu_usage condition will fail to restart a process which is no longer running. Checking the god log for that watch, we see a large list of successful status checks for cpu usage -- 0%, naturally.

Are we missing some basic permissions issue here? God is running as root, so it seems unlikely, but...

mojombo commented 15 years ago

What command are you using to restart god? That error occurs when the god daemon isn't running, or when the user running the god query does not have permission to access the socket file, which is owned by the user that started god.
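To tell those two causes apart, a small check on the socket file can help. This is only a sketch: the function name check_god_socket is made up for illustration, and the default path /tmp/god.17165.sock is an assumption about where god puts its socket on your system, so pass the real path if yours differs.

```shell
#!/bin/sh
# Hypothetical helper (not part of god): report why a god CLI call might fail.
# ASSUMPTION: the socket lives at /tmp/god.17165.sock unless a path is given as $1.
check_god_socket() {
    sock="${1:-/tmp/god.17165.sock}"

    # No file at all: the daemon is not running, or the socket file was deleted.
    if [ ! -e "$sock" ]; then
        echo "missing: $sock does not exist"
        return 1
    fi

    # Something is there, but it is not a Unix socket.
    if [ ! -S "$sock" ]; then
        echo "not a socket: $sock"
        return 1
    fi

    # The socket exists, but the current user cannot read and write it.
    if [ ! -r "$sock" ] || [ ! -w "$sock" ]; then
        echo "no access: current user cannot read and write $sock"
        return 1
    fi

    echo "ok: $sock exists and is accessible"
}
```

Running check_god_socket as the same user that runs the failing god command should show whether the socket is missing or merely inaccessible. An "ok" result only says the file looks usable, not that the daemon behind it is alive.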

Can you split the other question into a separate ticket? Doubling up tickets like this makes things difficult to track and address.

dylanz commented 14 years ago

I'm also running into the intermittent failures. I'm running Version 0.7.18. The error reported is: "The server is not available (or you do not have permissions to access it)".

I have God set up as a service on CentOS. God is run as root. Here is the script:

case $1 in
start)  
        /opt/local/bin/god -l /home/xxxxxx/current/log/god.log -P /home/xxxxxx/current/tmp/pids/god.pid || echo -en "\n already running"
        for file in /home/xxxxxx/current/config/*.god; do /opt/local/bin/god load "$file"; done
        echo "started"
        ;;
stop)   
        kill -QUIT `cat /home/xxxxx/current/tmp/pids/god.pid` || echo -en "\n not running"
        echo "stopped"
        ;;
restart|reload)
        kill -HUP `cat /home/xxxxx/current/tmp/pids/god.pid` || echo -en "\n can't reload"
        echo "restarted"
        ;;
*)
        echo >&2 "Usage: /var/spool/ec2/rs_cache/attachments/script_159507 "
        exit 1
        ;;
esac
mikewadhera commented 14 years ago

Hey Tom, do you know the current state of this issue? We have been seeing this behavior on Debian boxes after several days of normal operation with 0.7.18. I looked at the release history file but didn't see anything obvious. I'm wondering if 0.8 fixed this, but thought I'd check before upgrading. Thanks

mikewadhera commented 14 years ago

Ah, found out why we were seeing this behavior...

God stores a socket (.sock) file in /tmp, which the CLI uses to communicate with the server. A few weeks ago, we added a crontab entry to clean up stale files in /tmp, wiping anything with a modification time more than 2 days old:

50 22 * * * find /tmp -mtime +2 -print | xargs rm -rf

This started sweeping the socket file 48 hours after startup, causing the CLI to report the "server is unavailable" message at seemingly random intervals. The solution was to target specific stubborn files, like open-uri handles, when sweeping /tmp. HTH others.
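Another way to keep a sweep like the one above away from the socket is to have find skip Unix sockets altogether. The following is a rough sketch rather than the fix we actually deployed; the function name list_stale and its arguments are made up for illustration, and -mindepth is a GNU/BSD find extension.

```shell
#!/bin/sh
# Hypothetical helper: list entries under a directory whose modification
# time is more than N days old, skipping Unix sockets so that a
# long-running daemon's socket file is never handed to rm.
# $1 = directory (default /tmp), $2 = age in days (default 2)
list_stale() {
    dir="${1:-/tmp}"
    days="${2:-2}"
    # -mindepth 1 keeps the directory itself out of the results;
    # "! -type s" excludes socket files.
    find "$dir" -mindepth 1 -mtime "+$days" ! -type s -print
}
```

It is worth running list_stale /tmp 2 by hand and reading the output before piping anything into rm. Assuming GNU find and xargs, the crontab entry could then become something like: 50 22 * * * find /tmp -mindepth 1 -mtime +2 ! -type s -print0 | xargs -0 rm -rf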

eric commented 14 years ago

Is anyone else still seeing these problems (where the file still exists)?

mojombo commented 14 years ago

I'm guessing a number of people probably run into this problem due to cleaning out the /tmp directory, and in retrospect, /tmp seems like a horrible place for the communication socket to live. Upon looking at the Filesystem Hierarchy Standard (http://www.pathname.com/fhs/pub/fhs-2.3.html), the socket file should really live in /var/run/god. I'll slate this change to go out in a future release after announcing it on the mailing list.

eric commented 14 years ago

That makes a lot of sense.

epipheus commented 13 years ago

Was this ever fixed?