If the RunCommand crashes after locking a job but before starting it, the job stays locked and won't be picked up again, even by the same worker. While unlikely, this is not impossible. Here are some stats from one of our production systems:
> SELECT state, count(*) FROM jms_jobs GROUP BY state ORDER BY state;
   state    │  count
────────────┼─────────
 canceled   │       6
 failed     │    3323
 finished   │ 1792415
 incomplete │       7
 pending    │       8
 running    │      18
All 8 pending jobs were locked by a worker that crashed (or was perhaps forcefully terminated) before starting them. They have been stuck in the pending state for over a month.
Unfortunately, RunCommand::cleanUpStaleJobs() doesn't unlock pending jobs that were locked by the same worker. It probably should, right?
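As a manual workaround until this is fixed, something along these lines might release the stuck jobs. This is only a sketch: it assumes the lock is stored in a nullable `workerName` column on `jms_jobs`, which I have not verified against every schema version, and `'dead-worker-name'` is a placeholder for the worker that crashed.

```sql
-- Assumption: the worker lock is stored in a nullable workerName column.
-- Step 1: list pending jobs that still carry a worker lock.
SELECT id, workerName
FROM jms_jobs
WHERE state = 'pending'
  AND workerName IS NOT NULL;

-- Step 2: release the lock held by one worker that is known to be dead,
-- so the jobs become eligible for pickup again.
-- 'dead-worker-name' is a placeholder; use a value returned by step 1.
UPDATE jms_jobs
SET workerName = NULL
WHERE state = 'pending'
  AND workerName = 'dead-worker-name';
```

A live worker can briefly hold a lock on a pending job right before starting it, so the update should only target workers that are definitely no longer running.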
Some version information from the affected system:
CentOS 7.9.2009
Linux 3.10.0-1160.11.1.el7.x86_64
PHP 7.3.26 (with pcntl)
symfony/symfony 3.4.47
jms/job-queue-bundle 2.1.0