laravel / horizon

Dashboard and code-driven configuration for Laravel queues.
https://laravel.com/docs/horizon
MIT License

[5.x] Ensure graceful termination of workers marked for termination #1433

Closed tarexme closed 4 months ago

tarexme commented 5 months ago

Fix #1432

Description

There appears to be an issue where workers marked for termination while processing jobs do not terminate gracefully when horizon:terminate is subsequently invoked. These workers, while still actively running, are overlooked during the supervisor's termination process. As a result, instead of terminating gracefully, they are killed upon the supervisor's exit.

Steps To Reproduce

  1. Launch the master supervisor with the fast_termination option set to false using the horizon command.
  2. Send a long-running job to the queue. Ensure that this job is being processed by a worker.
  3. Wait for the scaleDown() method to be triggered on ProcessPool, ensuring that the process handling the long-running job is marked for termination. For consistent test results, use the code snippet below to simulate a supervisor restart during which all worker processes are marked for termination by scaling process pools down to 0.
  4. Terminate Horizon using the horizon:terminate command.

use Illuminate\Support\Facades\Artisan;
use Laravel\Horizon\Console\TerminateCommand;
use Laravel\Horizon\Contracts\HorizonCommandQueue;
use Laravel\Horizon\Contracts\SupervisorRepository;
use Laravel\Horizon\SupervisorCommands\Restart;

// Dispatch a long-running job that sleeps for 60 seconds
SleepJob::dispatch(60);
sleep(10); // Make sure that job is picked up

// Trigger a restart on all supervisors, marking workers for termination
foreach (app(SupervisorRepository::class)->names() as $name) {
    app(HorizonCommandQueue::class)->push(
        $name, Restart::class
    );
}

sleep(10); // Make sure that all supervisors have restarted

// Call the terminate command
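Artisan::call(TerminateCommand::class);

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;

class SleepJob implements ShouldQueue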
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(protected readonly int $sleepDuration)
    {
        // Nothing
    }

    /**
     * Execute the job.
     */
    public function handle(): void
    {
        for ($i = 0; $i < $this->sleepDuration; $i++) {
            sleep(1);
        }

        Log::debug('Job finished.');
    }
}

driesvints commented 5 months ago

Please add a thorough description to your PR and not just link to an issue. This will help the people reviewing your PR.

art-vanesyan commented 4 months ago

Are there any updates? I'm seeing the same situation.

driesvints commented 4 months ago

@taylorotwell going to re-open this one as we've had two reports now of graceful termination not working properly with Horizon. This PR seems to fix it for both cases. Other one here: https://github.com/laravel/horizon/issues/1450

nckrtl commented 4 months ago

To add to this remark: https://github.com/laravel/horizon/issues/1450#issuecomment-2132934935 this only seems to be true when exactly 1 job is being processed on termination. When that isn't the case, the worker is included in the runningProcesses collection. So although this PR fixes it, it feels like a patch in the wrong place, because the issue seems to lie deeper.

$this->terminatingProcesses() should also not be relevant in this case, as it's only used when scaling down processes. I think the real issue lies within the scale() method in ProcessPool.php. When there is 1 idle process and a job is pushed to the queue, the scale function is called:

    public function scale($processes)
    {
        $processes = max(0, (int) $processes);

        if ($processes === count($this->processes)) {
            return;
        }

        if ($processes > count($this->processes)) {
            $this->scaleUp($processes);
        } else {
            $this->scaleDown($processes);
        }
    }

At the moment of that scale check, count($this->processes) is 2, as another process has already been added once the job has been pushed to the queue. So scaleDown is called. scaleDown takes the first process in the array and marks it for termination. But in this scenario that process is actually doing work and shouldn't be marked for termination; the most recently added process should be.

That's why taking the last process in the array and terminating that one instead of the first one also fixed the issue: https://github.com/laravel/horizon/issues/1450#issuecomment-2130387275.

So yeah, this PR will work but is not fixing the cause.
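
To illustrate the ordering problem described above, here is a minimal sketch in plain PHP. It is not Horizon's code: ToyWorker and scaleDownToy() are hypothetical names made up for this example, and the busy flag stands in for "currently processing a job". It only shows how picking workers from the front of the array can mark the busy worker, while picking from the end marks the newly added idle one.

```php
<?php

// Toy stand-in for a worker process; NOT Horizon's WorkerProcess class.
final class ToyWorker
{
    public function __construct(
        public readonly string $name,
        public readonly bool $busy,
    ) {
    }
}

/**
 * Split the pool into [kept, markedForTermination] when scaling down to $target.
 * $fromEnd = false marks the oldest workers first (front of the array);
 * $fromEnd = true marks the most recently added workers first.
 *
 * @param  ToyWorker[]  $workers
 * @return array{0: ToyWorker[], 1: ToyWorker[]}
 */
function scaleDownToy(array $workers, int $target, bool $fromEnd): array
{
    $difference = max(0, count($workers) - max(0, $target));

    if ($difference === 0) {
        return [$workers, []];
    }

    if ($fromEnd) {
        $marked = array_slice($workers, -$difference);
        $kept = array_slice($workers, 0, count($workers) - $difference);
    } else {
        $marked = array_slice($workers, 0, $difference);
        $kept = array_slice($workers, $difference);
    }

    return [$kept, $marked];
}

$pool = [
    new ToyWorker('worker-1', true),   // older worker, currently processing the job
    new ToyWorker('worker-2', false),  // just added, still idle
];

[, $markedFirst] = scaleDownToy($pool, 1, false);
[, $markedLast] = scaleDownToy($pool, 1, true);

echo $markedFirst[0]->name, PHP_EOL; // front-of-array choice marks the busy worker
echo $markedLast[0]->name, PHP_EOL;  // end-of-array choice marks the idle worker
```

In this toy pool, scaling from 2 down to 1 marks worker-1 (busy) when slicing from the front and worker-2 (idle) when slicing from the end, which matches the behavior reported in the linked comment.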

driesvints commented 4 months ago

Thanks @nckrtl. @taylorotwell do you feel like we should first address the real underlying issue?

taylorotwell commented 4 months ago

@driesvints @nckrtl I'm open to any fix if someone feels there is a deeper issue. I don't have time to personally look into it, so PRs are welcome, and adjustments to this PR are fine too.

Mark as ready for review when you want me to take another look.

crynobone commented 4 months ago

This PR looks okay, but I also agree with @nckrtl on the underlying issue.

I personally think ProcessPool::runningProcesses() should include the $terminatingProcesses that are still processing a job (isRunning()):

https://github.com/laravel/horizon/blob/c5799072c0613145eb15b243e0d49f3e42cdb4fb/src/ProcessPool.php#L289-L299
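
A rough sketch of that idea in plain PHP follows. It is not the code at the link above or the change that was eventually merged: ToyProcess, ToyPool, and both method names are hypothetical and exist only for this illustration. It compares a "running" check that looks only at the active list with one that also counts terminating workers that have not exited yet.

```php
<?php

// Toy stand-in for a worker process; NOT Horizon's WorkerProcess class.
final class ToyProcess
{
    public function __construct(
        public readonly string $name,
        private bool $running,
    ) {
    }

    public function isRunning(): bool
    {
        return $this->running;
    }
}

// Toy pool with the two lists discussed in the thread (hypothetical class).
final class ToyPool
{
    /** @var ToyProcess[] active workers */
    public array $processes = [];

    /** @var ToyProcess[] workers marked for termination, possibly still mid-job */
    public array $terminatingProcesses = [];

    /** Only the active list is considered, so busy terminating workers are missed. */
    public function runningActiveOnly(): array
    {
        return array_values(array_filter(
            $this->processes,
            fn (ToyProcess $p): bool => $p->isRunning()
        ));
    }

    /** Also counts terminating workers that are still running. */
    public function runningIncludingTerminating(): array
    {
        return array_values(array_filter(
            array_merge($this->processes, $this->terminatingProcesses),
            fn (ToyProcess $p): bool => $p->isRunning()
        ));
    }
}

$toyPool = new ToyPool();
$toyPool->processes = [new ToyProcess('idle-worker', true)];
$toyPool->terminatingProcesses = [
    new ToyProcess('busy-worker', true),     // marked for termination, job unfinished
    new ToyProcess('exited-worker', false),  // already exited
];

$names = fn (array $ps): array => array_map(fn (ToyProcess $p): string => $p->name, $ps);

echo implode(', ', $names($toyPool->runningActiveOnly())), PHP_EOL;
echo implode(', ', $names($toyPool->runningIncludingTerminating())), PHP_EOL;
```

With the second variant, any caller that waits on "running processes" during shutdown would also wait for busy-worker, without needing a separate check for terminating processes at each call site.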

nckrtl commented 4 months ago

Yeah that makes sense. Would make it more solid/useful as well when explicitly checking for running processes elsewhere in the code. Instead of having to keep adding an extra check for the terminating processes as well, like this PR does.

Edit: took the liberty of creating a PR with the change @crynobone suggests: https://github.com/laravel/horizon/pull/1454

tarexme commented 4 months ago

Closing this after the fix has been merged in the PR above.