Hey there,
Unfortunately we don't support this version anymore. Please check our support policy to see which versions we currently support. Can you please try upgrading to the latest version and see if your problem persists? We'll help you out and re-open this issue if so.
Thanks!
@driesvints I can confirm this issue exists on Laravel v5.8.18 and Horizon v3.2.1 (on a fresh installation):
Peter-MacBook-Pro:~ szokeptr$ ps | grep horizon
9488 ttys004 0:00.00 grep horizon
9403 ttys005 0:00.18 php artisan horizon
9407 ttys005 0:00.17 /usr/local/Cellar/php/7.3.4/bin/php artisan horizon:supervisor peter-macbook-prolocal-4bIg:supervisor-1 redis --delay=0 --memory=128 --queue=default --sleep=3 --timeout=60 --tries=3 --balance=simple --max-processes=3 --min-processes=1 --nice=0
9424 ttys005 0:00.16 /usr/local/Cellar/php/7.3.4/bin/php artisan horizon:work redis --delay=0 --memory=128 --queue=default --sleep=3 --timeout=60 --tries=3 --supervisor=peter-macbook-prolocal-4bIg:supervisor-1
9425 ttys005 0:00.16 /usr/local/Cellar/php/7.3.4/bin/php artisan horizon:work redis --delay=0 --memory=128 --queue=default --sleep=3 --timeout=60 --tries=3 --supervisor=peter-macbook-prolocal-4bIg:supervisor-1
9426 ttys005 0:00.15 /usr/local/Cellar/php/7.3.4/bin/php artisan horizon:work redis --delay=0 --memory=128 --queue=default --sleep=3 --timeout=60 --tries=3 --supervisor=peter-macbook-prolocal-4bIg:supervisor-1
Peter-MacBook-Pro:~ szokeptr$ kill 9407
Peter-MacBook-Pro:~ szokeptr$ ps | grep horizon
9550 ttys004 0:00.00 grep horizon
9403 ttys005 0:00.19 php artisan horizon
Peter-MacBook-Pro:~ szokeptr$
AFAIK Horizon can't do this, so I'll mark this as a feature request.
If I kill a worker process in the same way, a new one is started, so I think Horizon already has the ability to do this. If I figure it out I'll be happy to send a PR.
It seems that we don't restart on purpose for the exit codes 0, 2 and 13: https://github.com/laravel/horizon/blob/a9204280f72a1c6d3874e0f98b68e354b212311a/src/SupervisorProcess.php#L38-L42
But I don't know why tbh.
It seems that 0 and 2 are exit codes that a Supervisor on Linux doesn't restart for, so ignoring them on our side is intentional. Which exit code is returned when you kill the process?
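For reference, the logic at the linked lines boils down to roughly the following (a paraphrased sketch, not the actual Horizon source; the class shape and method name here are illustrative):

<?php

// Paraphrased sketch of the restart decision discussed above -- not the
// actual Horizon source; names are illustrative only.
class SupervisorProcessSketch
{
    // Exit codes treated as intentional shutdowns (per the linked lines):
    // a supervisor process exiting with one of these is not restarted.
    public $dontRestartOn = [0, 2, 13];

    // Decide whether a supervisor process that stopped should be restarted.
    public function shouldRestart(?int $exitCode): bool
    {
        // A process killed from the shell normally does not report 0, 2 or 13,
        // so in principle it should fall through to a restart.
        return ! in_array($exitCode, $this->dontRestartOn, true);
    }
}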
It’s hard to reproduce, but I will try to put together a minimal setup to demonstrate and check the exit code.
@szokeptr Are you using Supervisord per the docs? (https://laravel.com/docs/5.8/horizon#deploying-horizon)
@lonnylot yes, of course. The problem is that Horizon ignores the death of the supervisor process (that it launched), and because of this no worker processes are active for that supervisor anymore. This is an extremely rare edge case, as the supervisor process has no reason to crash; in my case the root cause was that the system hit the max open files limit and PHP was not able to start properly.
I still think this is a valid issue and something should be done about it, so I will definitely investigate a bit more as soon as I have time. I will close this until then.
+1 here. I get the message "A supervisor with this name is already running", then "INFO reaped unknown pid [$pid]", and then the Horizon supervisors are no longer shown in the Horizon dashboard.
Deploying to a Kubernetes cluster, I am facing this issue. I have not yet determined what exactly is killing the supervisor, but it eventually dies and doesn't come back. We have other supervisors up and running in the container.
It may be related to resource limits, but we are still trying to pinpoint that.
After debugging, researching and suffering, I gave up on Horizon in this version of Laravel (mainly because the maintainers' response to this and other bugs is (sic) "we no longer support this version of Laravel") and used a combination of supervisord processes and php artisan queue:listen. This solved all my problems with memory leaks and allocation.
Here is my horizon.conf supervisord configuration file:
[supervisord]
nodaemon=true
[program:horizon]
process_name=%(program_name)s_%(process_num)02d
command=php /srv/app/artisan queue:listen --queue=<queues name>
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/dev/fd/1
stdout_logfile_maxbytes=0
killasgroup=true
stopasgroup=true
numprocs=4
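Worth noting about this workaround: queue:listen restarts the framework for every job, which is less efficient than the long-running queue:work processes that Horizon manages, but it prevents memory from accumulating between jobs; that trade-off is likely why it resolved the memory problems described above.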
I identified that when the PHP Horizon supervisor process was killed due to PHP's memory limit, right afterwards Horizon would say that the supervisor is already running. That is likely a race condition in how it determines whether the supervisor is actually running or not.
@fabriciojs indeed. A related problem is that Horizon workers may sometimes become zombie processes that consume a lot of memory, which can make your machine run out of memory.
It is also common for containers to be killed by the OOM killer when they consume too much memory. On Kubernetes, only the container is restarted in that case while the pod remains the same, which means the hostname remains the same. This leads to the race condition mentioned above.
Description:
It looks like Horizon is not detecting if supervisor processes are killed and this results in no worker processes running.
I am not sure if this is intended behavior or a bug, but it is a real issue for me. For example, supervisor processes can die when the OS has a low max open file descriptor limit, which is obviously a separate issue, but it would be great if Horizon could handle this case too.
I am now considering switching back to regular Laravel workers, as this is not an issue when you are using supervisord to supervise the workers.
Steps To Reproduce:
1. Run php artisan horizon
2. Find the supervisor PID: ps aux | grep horizon:supervisor
3. kill PID
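After step 3, the supervisor and its worker processes are gone and are never restarted; only the php artisan horizon master process keeps running, as shown in the ps output earlier in the thread.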