Jobs in Kubernetes infrastructure get stuck in the reserved (pending) state when pods are killed.
This issue occurs periodically when deploying new changes or when Horizon pods are terminated.
It appears that jobs stuck in the pending state are retried exactly retry_after seconds (the retry_after value set in config/queue.php) after the reserved_at epoch stored in Redis.
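To make the observed timing concrete, here is a minimal sketch of the arithmetic. The array only mirrors the shape of the redis connection in config/queue.php. The value 1260 is an assumption chosen to match the 21 minutes mentioned in this report, and the epoch and the retryEligibleAt() helper are made up for illustration; the helper is not a Laravel or Horizon function.

```php
<?php

// Illustrative excerpt shaped like the redis connection in config/queue.php.
// ASSUMPTION: retry_after = 1260 seconds, i.e. the 21 minutes seen in this report.
$redisConnection = [
    'driver' => 'redis',
    'connection' => 'default',
    'queue' => 'default',
    'retry_after' => 1260, // seconds
];

/**
 * Hypothetical helper: the epoch at which a job reserved at $reservedAt
 * would be retried, under the behaviour observed in this report.
 */
function retryEligibleAt(int $reservedAt, int $retryAfter): int
{
    return $reservedAt + $retryAfter;
}

$reservedAt = 1700000000; // example reserved_at epoch, made up for illustration
$eligibleAt = retryEligibleAt($reservedAt, $redisConnection['retry_after']);

// Delay between reservation and retry, in minutes.
echo ($eligibleAt - $reservedAt) / 60, " minutes\n"; // prints "21 minutes"
```

If this matches what you see, a job that was reserved when its pod was killed stays invisible to workers until that epoch has passed.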
Additionally, if a job is stuck in the pending state, it seems that the ->then(callback) callback function is no longer being executed. Consequently, the batch is completed, but the callback responsible for finalizing everything is never triggered.
We urgently require a solution to address this issue. How can we ensure that if a pending job finally executes after 21 minutes, the ->then(callback) callback function is executed?
Steps To Reproduce
To reproduce this problem, follow these steps:
Create a batch with multiple jobs and assign a unique identifier to each job to indicate its position in the queue.
Start the Horizon service.
Terminate the Horizon service before the batch has finished processing.
Restart the Horizon service and wait until all batch jobs are executed.
You will observe a few jobs stuck in the pending state. These are the jobs that were being executed but never completed due to the Horizon shutdown.
Keep the Horizon service running and wait retry_after seconds; you will then see those pending jobs being executed.
You will also notice that the ->then(callback) callback method is not being executed.
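The steps above could be reproduced with something like the following sketch. The PositionedJob class, its $position property, the dispatchDemoBatch() helper, the job count, and the sleep duration are all hypothetical names and values chosen for illustration; they are not taken from the affected application. It assumes a standard Laravel 10 app with the job_batches table migrated and Horizon processing the default queue.

```php
<?php

// app/Jobs/PositionedJob.php (hypothetical job, for reproduction only)

namespace App\Jobs;

use Illuminate\Bus\Batch;
use Illuminate\Bus\Batchable;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\Bus;
use Illuminate\Support\Facades\Log;

class PositionedJob implements ShouldQueue
{
    use Batchable, Dispatchable, InteractsWithQueue, Queueable;

    // $position identifies where this job sits in the batch.
    public function __construct(public int $position)
    {
    }

    public function handle(): void
    {
        // Skip the work if the batch was cancelled in the meantime.
        if ($this->batch()?->cancelled()) {
            return;
        }

        Log::info("Job {$this->position} started");
        sleep(10); // assumed duration, long enough to terminate Horizon mid-run
        Log::info("Job {$this->position} finished");
    }

    // Dispatches a batch of $count jobs with a then() callback that only logs.
    public static function dispatchDemoBatch(int $count = 20): Batch
    {
        $jobs = array_map(fn (int $i) => new self($i), range(1, $count));

        return Bus::batch($jobs)
            ->then(function (Batch $batch) {
                Log::info("then() callback ran for batch {$batch->id}");
            })
            ->dispatch();
    }
}
```

Calling App\Jobs\PositionedJob::dispatchDemoBatch() from php artisan tinker and then terminating Horizon while some jobs are still sleeping should leave those jobs reserved. The log then shows which positions ran late and whether the then() line ever appears.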
Horizon Version
5.24.5
Laravel Version
10.48.12
PHP Version
8.3.13
Redis Driver
PhpRedis
Redis Version
7.2.6
Database Driver & Version
MySQL 8.4.3