humanmade / Cavalcade

A better wp-cron. Horizontally scalable, works perfectly with multisite.
https://engineering.hmn.md/projects/cavalcade/
Other
512 stars 46 forks source link

feat: add admin UI to view Cavalcade jobs and logs #124

Closed npivcevic closed 11 months ago

tomjn commented 11 months ago

Hey, I see you closed this but I was curious why? Are you planning to put this in a dedicated plugin? Or was this an internal change accidentally released?

tomjn commented 11 months ago

I had a quick look through the code, if I were to do a review most of the feedback would be around internationalisation via esc_html__ and adding escaping functions on output, e.g. <td><?php echo $job->id; ?></td> becoming <td><?php echo esc_html( $job->id ); ?></td>

npivcevic commented 11 months ago

Hey Tom, thanks for the quick feedback! I forked the repo and accidentally opened a PR here, but I'd love to contribute if you are open to it! I also have some stuff for Cavalcade Runner that I could add.

We are in the development process of switching to Cavalcade and we are just adding quality of life stuff.

I have a question if you don't mind answering. My biggest issue with Cavalcade Runner is that if the main process dies while some workers are running, the job(s) remain in a locked state (status: 'running') without any recovery method, never to be executed again. I did a quick fix that works only if one instance is running, but a more robust solution would be needed to handle multiple instances running concurrently.

rmccue commented 11 months ago

I think this would probably be best in a standalone plugin to keep Cavalcade's core lightweight - for our purposes, we have a UI externally in our cloud management dashboard, so one in the admin is not something we'd need.

I have a question if you don't mind answering. My biggest issue with Cavalcade Runner is that if the main process dies while some workers are running, the job(s) remain in a locked state (status: 'running') without any recovery method, never to be executed again. I did a quick fix that works only if one instance is running, but a more robust solution would be needed to handle multiple instances running concurrently.

That can indeed happen, we call it "stalling". We recommend ensuring that the main process doesn't die generally with system-level termination protection, and monitoring it with other tools - e.g. we have a canary job which we monitor invocations of.

Cavalcade Runner itself installs signal handlers when it runs, and will gracefully shut down if told to do so: https://github.com/humanmade/Cavalcade-Runner/blob/0dfb42d505e9cd870a11366c49ee680d327c961a/inc/class-runner.php#L94-L97

We intentionally don't restart or adjust jobs as it may be unsafe to do so automatically, but that behaviour can be layered on top via stall detection if desired.

npivcevic commented 11 months ago

Hey Ryan, thank you for your reply!

Waiting for a worker to finish is sometimes not an option for us. We have some jobs that can take 10 minutes or more to finish, so waiting that long might be too much.

So what I did is added a check that cross references active workers with locked jobs, and marks jobs as failed if it's not actively being worked on. This works nicely if running a single instance of Cavalcade Runner, but all hell would break loose if we were to run multiple instances 😄

protected function check_locked_jobs_without_active_worker() {
        $lockedJobs = $this->get_locked_jobs();

        foreach ($lockedJobs as $job) {
            $isJobBeingWorkedOn = false;

            foreach ($this->workers as $worker) {
                if ($worker->job->id === $job->id) {
                    $isJobBeingWorkedOn = true;
                    break;
                }
            }

            if ($isJobBeingWorkedOn) {
                continue;
            }

            printf( '[  ] Job is locked without active worker: %s' . PHP_EOL, print_r($job, true));

            $logger = $this->hooks->run(
                'Runner.check_workers.logger',
                new Logger( $this->db, $this->table_prefix )
            );

            $job->mark_failed();
            $logger->log_job_failed( $job, 'Job remained locked without active worker.' );
            printf( '[  ] Job marked as failed: %s' . PHP_EOL, print_r($job, true));
        }
    }

    protected function get_locked_jobs() {
        $query = "SELECT * FROM {$this->table_prefix}cavalcade_jobs";
        $query .= ' WHERE status = "running"';

        $statement = $this->db->prepare( $query );
        $statement->execute();

        return $statement->fetchAll(PDO::FETCH_CLASS, __NAMESPACE__ . '\\Job', [ $this->db, $this->table_prefix ] );
    }

I also did some other improvements:

If you are interested in any of those features, would love to contribute and I'll open a PR. We can add them as options on runner initialisation and keep defaults as is.

Loving this project, especially now with the active support form you guys! Thank you so much!