Closed npivcevic closed 11 months ago
I had a quick look through the code, if I were to do a review most of the feedback would be around internationalisation via esc_html__
and adding escaping functions on output, e.g. <td><?php echo $job->id; ?></td>
becoming <td><?php echo esc_html( $job->id ); ?></td>
Hey Tom, thanks for the quick feedback! I forked the repo and accidentally opened a PR here, but I'd love to contribute if you are open to it! I also have some stuff for Cavalcade Runner that I could add.
We are in the development process of switching to Cavalcade and we are just adding quality of life stuff.
I have a question if you don't mind answering. My biggest issue with Cavalcade Runner is that if the main process dies while some workers are running, the job(s) remain in a locked state (status: 'running') without any recovery method, never to be executed again. I did a quick fix that works only if one instance is running, but a more robust solution would be needed to handle multiple instances running concurrently.
I think this would probably be best in a standalone plugin to keep Cavalcade's core lightweight - for our purposes, we have a UI externally in our cloud management dashboard, so one in the admin is not something we'd need.
I have a question if you don't mind answering. My biggest issue with Cavalcade Runner is that if the main process dies while some workers are running, the job(s) remain in a locked state (status: 'running') without any recovery method, never to be executed again. I did a quick fix that works only if one instance is running, but a more robust solution would be needed to handle multiple instances running concurrently.
That can indeed happen, we call it "stalling". We recommend ensuring that the main process doesn't die generally with system-level termination protection, and monitoring it with other tools - e.g. we have a canary job which we monitor invocations of.
Cavalcade Runner itself installs signal handlers when it runs, and will gracefully shut down if told to do so: https://github.com/humanmade/Cavalcade-Runner/blob/0dfb42d505e9cd870a11366c49ee680d327c961a/inc/class-runner.php#L94-L97
We intentionally don't restart or adjust jobs as it may be unsafe to do so automatically, but that behaviour can be layered on top via stall detection if desired.
Hey Ryan, thank you for your reply!
Waiting for a worker to finish is sometimes not an option for us. We have some jobs that can take 10 minutes or more to finish, so waiting that long might be too much.
So what I did is added a check that cross references active workers with locked jobs, and marks jobs as failed if it's not actively being worked on. This works nicely if running a single instance of Cavalcade Runner, but all hell would break loose if we were to run multiple instances 😄
protected function check_locked_jobs_without_active_worker() {
$lockedJobs = $this->get_locked_jobs();
foreach ($lockedJobs as $job) {
$isJobBeingWorkedOn = false;
foreach ($this->workers as $worker) {
if ($worker->job->id === $job->id) {
$isJobBeingWorkedOn = true;
break;
}
}
if ($isJobBeingWorkedOn) {
continue;
}
printf( '[ ] Job is locked without active worker: %s' . PHP_EOL, print_r($job, true));
$logger = $this->hooks->run(
'Runner.check_workers.logger',
new Logger( $this->db, $this->table_prefix )
);
$job->mark_failed();
$logger->log_job_failed( $job, 'Job remained locked without active worker.' );
printf( '[ ] Job marked as failed: %s' . PHP_EOL, print_r($job, true));
}
}
protected function get_locked_jobs() {
$query = "SELECT * FROM {$this->table_prefix}cavalcade_jobs";
$query .= ' WHERE status = "running"';
$statement = $this->db->prepare( $query );
$statement->execute();
return $statement->fetchAll(PDO::FETCH_CLASS, __NAMESPACE__ . '\\Job', [ $this->db, $this->table_prefix ] );
}
I also did some other improvements:
If you are interested in any of those features, would love to contribute and I'll open a PR. We can add them as options on runner initialisation and keep defaults as is.
Loving this project, especially now with the active support form you guys! Thank you so much!
Hey, I see you closed this but I was curious why? Are you planning to put this in a dedicated plugin? Or was this an internal change accidentally released?