Closed: innovia closed this issue 6 years ago
Hi Konstantin,
I'm running eye 0.9.2, and from time to time some processes and groups crash without recovering.
I then have to manually reboot the server or load eye where possible.
As you can see in my config, I have hard limits on CPU and memory on all processes.
I even turned off Eye's logging, since I had reason to believe it was causing the OOM killer to kill Eye.
I've also added OOM killer "nice" commands to cron:
*/1 * * * * root /usr/bin/pgrep -f "eye monitoring" | while read PID; do echo -17 > /proc/$PID/oom_adj; done
*/1 * * * * root /usr/bin/pgrep -f "redis-server" | while read PID; do echo -17 > /proc/$PID/oom_adj; done
I need to stabilize this server. Please help me out :)
Here's my config file:
#!/usr/bin/env ruby

Eye.load '/etc/eye/mailer.rb' # mailer set params (like variables)
Eye.load '/etc/eye/cloudwatch.rb'
Eye.load '/etc/eye/config.rb' # config assign params values

Eye.application :reDash do
  working_dir "/opt/redash/current"
  load_env "/root/.env"
  trigger :flapping, times: 3, within: 1.minute, retry_in: 5.minutes
  notify :by_email, :info
  notify :cloudwatch, :info

  process(:nginx) do
    depend_on :gunicorn
    pid_file "/var/run/nginx.pid"
    stdall "/opt/redash/logs/nginx.log"
    start_command "/usr/sbin/nginx"
    stop_signals [:QUIT, 30.seconds, :TERM, 15.seconds, :KILL]
    restart_command "kill -HUP {PID}"
    monitor_children do
      restart_command 'kill -TERM {PID}'
      check :memory, below: 200.megabytes, times: [3,5]
    end
    daemonize true
  end

  process(:redis) do
    pid_file "/var/run/redis.pid"
    stdall "/opt/redash/logs/redis.log"
    start_command "/usr/local/bin/redis-server /etc/redis/6379.conf"
    stop_signals [:TERM, 30.seconds, :QUIT]
    restart_command "kill -HUP {{PID}}"
    daemonize true
  end

  process(:gunicorn) do
    uid 'redash'
    gid 'nogroup'
    depend_on :redis
    pid_file "/var/run/gunicorn/gunicorn.pid"
    stdall "/opt/redash/logs/gunicorn.log"
    start_command "gunicorn -b unix:///var/run/gunicorn/gunicorn.sock --name redash -w 4 --max-requests 1000 redash.wsgi:app"
    stop_signals [:TERM, 30.seconds, :QUIT]
    restart_command "kill -HUP {{PID}}"
    daemonize true
    monitor_children do
      stop_command "kill -TERM {PID}"
      check :cpu, :every => 30, :below => 80, :times => 3
      check :memory, :every => 30, :below => 250.megabytes, :times => [3,5]
    end
  end

  process(:flower) do
    uid 'redash'
    gid 'nogroup'
    pid_file "/var/run/celery/flower.pid"
    stdall "/opt/redash/logs/flower.log"
    start_command "celery flower -A redash.worker --address=0.0.0.0 --persistent"
    stop_signals [:TERM, 30.seconds, :QUIT]
    restart_command "kill -HUP {{PID}}"
    check :cpu, :every => 30, :below => 80, :times => 3
    check :memory, :every => 30, :below => 250.megabytes, :times => [3,5]
    daemonize true
  end

  process(:celery_worker) do
    uid 'redash'
    gid 'nogroup'
    pid_file "/var/run/celery/celery_worker.pid"
    stdall "/opt/redash/logs/celery_worker.log"
    start_command "celery worker --app=redash.worker --beat -c2 -Qqueries,celery --maxtasksperchild=10 -Ofair --autoscale=6,3 -n redash_celery_worker@%h"
    stop_signals [:TERM, 30.seconds, :QUIT]
    restart_command "kill -HUP {{PID}}"
    daemonize true
    monitor_children do
      stop_command "kill -TERM {PID}"
      check :cpu, :every => 30, :below => 80, :times => 3
      check :memory, :every => 30, :below => 400.megabytes, :times => [3,5]
    end
  end

  process(:celery_schedule_worker) do
    uid 'redash'
    gid 'nogroup'
    pid_file "/var/run/celery/celery_schedule_worker.pid"
    stdall "/opt/redash/logs/celery_schedule_worker.log"
    start_command "celery worker --app=redash.worker -c2 -Qscheduled_queries --maxtasksperchild=10 -Ofair --autoscale=2,4 -n redash_celery_scheduled@%h"
    stop_signals [:TERM, 30.seconds, :QUIT]
    restart_command "kill -HUP {{PID}}"
    daemonize true
    monitor_children do
      stop_command "kill -TERM {PID}"
      check :cpu, :every => 30, :below => 80, :times => 3
      check :memory, :every => 30, :below => 400.megabytes, :times => [3,5]
    end
  end
end
Have you enabled the Eye logger? Eye writes everything that happens to it, so you can probably find the problem there.
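For reference, a minimal sketch of turning the logger back on in the config file, using Eye's self-configuration block. The log path here is only an assumption for illustration; any location writable by the user running eye should do.

```ruby
# Eye self-configuration section; place it near the top of the config,
# before the Eye.application block.
Eye.config do
  # Hypothetical path, chosen for illustration only.
  logger '/var/log/eye.log'
end
```

After reloading the config, the entries written around the time a process or group goes down are the ones worth checking first.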
I'll try that and update you.
I'll re-open this if the issue persists.