laravel / horizon

Dashboard and code-driven configuration for Laravel queues.
https://laravel.com/docs/horizon
MIT License

Horizon Orphan Process #178

Closed: dbpolito closed this issue 6 years ago

dbpolito commented 7 years ago

I'm running Horizon on an up-to-date Forge machine as documented: the daemon runs php artisan horizon, and my deployments run php artisan horizon:terminate. But from time to time I need to manually run php artisan horizon:purge.

This is the output I just got, a few hours after the last release:

$ php artisan horizon:purge
Observed Orphan: 17043
Observed Orphan: 30634
Observed Orphan: 31084
Observed Orphan: 31085

I can confirm these are orphans by running htop in tree mode (press F5): these processes show up as top-level processes, not under the master php artisan horizon process.
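The htop tree check can be approximated in code: a horizon:work process counts as orphaned when no php artisan horizon master appears anywhere among its ancestors. This is a minimal sketch, assuming the tree is master → horizon:supervisor → horizon:work and that the process list comes from something like `ps -eo pid=,ppid=,args=`. `Proc`, `is_master` and `find_orphaned_workers` are hypothetical names, not part of Horizon, and the sample command lines are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Proc:
    pid: int
    ppid: int
    args: str  # full command line, e.g. from `ps -eo pid=,ppid=,args=`


def is_master(proc: Proc) -> bool:
    # Assumption: the master is the plain `php artisan horizon` command (no subcommand).
    return proc.args.rstrip().endswith("artisan horizon")


def find_orphaned_workers(procs: List[Proc]) -> List[int]:
    """Return PIDs of horizon:work processes with no Horizon master among their ancestors."""
    by_pid: Dict[int, Proc] = {p.pid: p for p in procs}

    def has_master_ancestor(proc: Proc) -> bool:
        seen = set()
        parent: Optional[Proc] = by_pid.get(proc.ppid)
        # Walk up the parent chain; there may be intermediate processes (a supervisor, a shell).
        while parent is not None and parent.pid not in seen:
            if is_master(parent):
                return True
            seen.add(parent.pid)
            parent = by_pid.get(parent.ppid)
        return False

    return sorted(p.pid for p in procs if "horizon:work" in p.args and not has_master_ancestor(p))


procs = [
    Proc(100, 1, "php artisan horizon"),
    Proc(200, 100, "php artisan horizon:supervisor web-1:supervisor-1 redis"),
    Proc(300, 200, "php artisan horizon:work redis --queue=default"),
    Proc(17043, 1, "php artisan horizon:work redis --queue=default"),  # parent is init: orphan
]
print(find_orphaned_workers(procs))  # [17043]
```

Walking the whole ancestor chain, rather than checking only the direct parent, keeps the check working if a wrapper process sits between the supervisor and the worker.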


Also, every time I run purge, it always wrongly reports one process as an orphan:

$ php artisan horizon:purge
Observed Orphan: 17059

I haven't found a pattern for why this is happening yet.

tomschlick commented 6 years ago

Just deployed 1.2.2 to our production environment at work. Will report back what we find over the next few days.

marianvlad commented 6 years ago

@taylorotwell After 1.2.2, everything works perfectly: no more orphan jobs or old code.

taylorotwell commented 6 years ago

@tomschlick you see anything? @marianvlad thanks!

tomschlick commented 6 years ago

@taylorotwell I just checked our logs from last week, and we saw two instances of orphans:

Sending TERM Signal To Process: 9750
Observed Orphan: 8781
Observed Orphan: 8862
Observed Orphan: 10459
Sending TERM Signal To Process: 10498
Observed Orphan: 8781
Observed Orphan: 8862
Observed Orphan: 12571

Weirdly enough, two of the process IDs (8781 and 8862) showed up both times, even though the deploys took place 15 minutes apart. Horizon appeared to terminate them and restart correctly, so I'm not sure how that's possible 🤷‍♂️

themsaid commented 6 years ago

Just a note: the horizon:purge command doesn't work as expected, so if you run it and it reports a single rogue process, ignore it; that's not an indicator. Only multiple reported processes mean there are actual orphans.

So in this case, 8781 and 8862 are the actual orphans. However, it seems that even the purge command didn't kill them, which could mean they're stuck on a long-running job and the next iteration of the worker loop hasn't run yet.
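The rule of thumb above (a single reported process is noise; PIDs that keep showing up are real orphans) can be expressed as a grace-period tracker: report every orphan, but only flag a PID for termination once it has stayed orphaned for a while. This is a minimal sketch; `OrphanTracker` and the 60-second grace period are hypothetical, and it is not Horizon's own purge implementation.

```python
import time
from typing import Callable, Dict, Iterable, List


class OrphanTracker:
    """Hypothetical helper: flag a PID only if it stays orphaned for `grace_seconds`."""

    def __init__(self, grace_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.grace_seconds = grace_seconds
        self.clock = clock
        self.first_seen: Dict[int, float] = {}

    def observe(self, orphan_pids: Iterable[int]) -> List[int]:
        """Record the current orphans; return those orphaned for at least the grace period."""
        now = self.clock()
        current = set(orphan_pids)
        # Forget PIDs that are no longer reported (e.g. they finished their job and exited).
        self.first_seen = {pid: t for pid, t in self.first_seen.items() if pid in current}
        for pid in current:
            self.first_seen.setdefault(pid, now)
        return sorted(pid for pid, t in self.first_seen.items() if now - t >= self.grace_seconds)


# Replaying the two observations from the log with a fake clock (seconds).
fake_now = [0.0]
tracker = OrphanTracker(grace_seconds=60, clock=lambda: fake_now[0])
first = tracker.observe([8781, 8862, 10459])   # nothing old enough yet
fake_now[0] = 900.0                            # about 15 minutes later
second = tracker.observe([8781, 8862, 12571])  # 8781 and 8862 are still there
print(first, second)  # [] [8781, 8862]
```

A process stuck on a long job would also look persistent to this check, so a flagged PID is a candidate to inspect rather than proof of a leak.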

What's your timeout value?

tomschlick commented 6 years ago

Timeout value is 1800 for most of our workers.
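A back-of-the-envelope check that might explain the repeated PIDs, assuming a worker finishes its current job (up to the configured timeout) before exiting on terminate: with a 1800 s timeout, a job picked up just before the first deploy could still be running 15 minutes later at the second one. This is one possible reading under that assumption, not a diagnosis confirmed in the thread; the names are illustrative.

```python
# Assumption: on terminate, a worker finishes its current job (bounded by the timeout) first.
TIMEOUT_SECONDS = 1800          # timeout reported for most workers
DEPLOY_GAP_SECONDS = 15 * 60    # the two deploys were about 15 minutes apart


def may_outlive_next_deploy(timeout_s: int, gap_s: int) -> bool:
    """True if a job picked up right before one deploy could still be running at the next."""
    return timeout_s > gap_s


# Worst case: the job starts just before the first deploy and runs until the timeout.
worst_case_remaining = TIMEOUT_SECONDS - DEPLOY_GAP_SECONDS

print(may_outlive_next_deploy(TIMEOUT_SECONDS, DEPLOY_GAP_SECONDS))  # True
print(worst_case_remaining)  # 900 (seconds still left at the second deploy)
```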

dbpolito commented 6 years ago

I'm not able to reproduce this issue anymore, so it looks fixed to me ❤️

dbpolito commented 6 years ago

Since I created the ticket and it seems to be fixed, I'm closing this one. We can open new tickets and reference this one if necessary.

vyuldashev commented 6 years ago

We are encountering strange issues. Sometimes one queue gets stuck and does not process any jobs. Even when supervisor is stopped, horizon:work processes remain in the process list. horizon:purge doesn't help either, because it doesn't find any orphans.

Here is our config; the problem occurs only with the default queue:

>>> config('horizon')
=> [
     "use" => "queue",
     "prefix" => "horizon:",
     "waits" => [
       "redis:default" => 300,
     ],
     "trim" => [
       "recent" => 60,
       "failed" => 10080,
     ],
     "environments" => [
       "production" => [
         "supervisor-1" => [
           "connection" => "redis",
           "queue" => [
             "default",
           ],
           "balance" => "simple",
           "processes" => 10,
           "tries" => 0,
         ],
         "supervisor-2" => [
           "connection" => "redis",
           "queue" => [
             "sms",
             "phone_data",
           ],
           "balance" => "auto",
           "processes" => 10,
           "tries" => 0,
         ],
       ],
       "local" => [
         "supervisor-1" => [
           "connection" => "redis",
           "queue" => [
             "default",
           ],
           "balance" => "simple",
           "processes" => 3,
           "tries" => 3,
         ],
       ],
     ],
   ]
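A rough sanity check against this config: when Horizon is fully stopped, any horizon:work process still listed is a leftover, and while it runs, the number of workers should not exceed the configured totals. This sketch assumes each supervisor runs at most its `processes` value in total, whatever its balancing strategy (worth verifying for the Horizon version in use); `max_workers` is a hypothetical helper and the running count of 23 is made up for illustration.

```python
from typing import Dict


def max_workers(environment: Dict[str, dict]) -> int:
    """Upper bound on horizon:work processes for one environment.

    Assumption: each supervisor runs at most `processes` workers in total.
    """
    return sum(int(supervisor["processes"]) for supervisor in environment.values())


# Mirrors the "production" block of the config dump above.
production = {
    "supervisor-1": {"connection": "redis", "queue": ["default"], "balance": "simple", "processes": 10},
    "supervisor-2": {"connection": "redis", "queue": ["sms", "phone_data"], "balance": "auto", "processes": 10},
}

limit = max_workers(production)  # 20
running = 23  # hypothetical count of running horizon:work processes (e.g. from pgrep)
if running > limit:
    print(f"{running - limit} more horizon:work processes than configured: possible leftovers")
```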