timgit / pg-boss

Queueing jobs in Postgres from Node.js like a boss
MIT License
1.95k stars 153 forks source link

Scheduler runs every 60 seconds? #327

Closed demitrin closed 2 years ago

demitrin commented 2 years ago

I've been reading more into the scheduling code to get a sense of how pgboss does scheduling. From what I can tell, it seems like the scheduler has an internal queue _pgboss_cron that is monitored by a worker. Whenever that worker is executing a job from _pgboss_cron, it checks all schedules and if any should have run in the past 60 seconds, then it schedules them to run via another queue.

Unless I'm mistaken, a 60 second debounce/check mechanism and a 60 second look-back may miss an execution under the right conditions. Would love to hear if I'm misunderstanding the code or if what I described is the current implementation.

phips28 commented 2 years ago

I am experiencing the same, it's always checked after 60 seconds. Even if I set the clockMonitorIntervalSeconds: 5 Also checked the timekeeper.js file, and 60 is hardcoded everywhere.

How can I schedule something < 60 seconds? Like */10 * * * * * run every 10 seconds. I can't do this with the current implementation right?

demitrin commented 2 years ago

I'm glad someone else took a look and came to the same conclusion. think my plan is going to be to fork the package and run my own scheduling implementation. I think the plan is to change the pgboss scheduler to be similar to the graphile scheduler (https://github.com/graphile/worker). The graphile scheduler can detect missing schedule runs, something that pgboss unfortunately can't today. If the changes I make are easily slotted into the current implementation, I'll open a PR

phips28 commented 2 years ago

I am using the node-cron package to schedule singleton jobs. (Only one job must run based on our requirements) not sure if thats even possible with pgboss scheduler.

I used node-cron before for all our schedulers, but wanted to give the pgboss scheduler a try. But so far it lacks some features for our requirements.

timgit commented 2 years ago

Thanks for helping me jog my memory on the complexity that is cron scheduling. :grin:

After reviewing this again, I realized I could rename an internal function that would make the code easier to read. For now, I will post some code here to explain how this works and hopefully ease some concerns. This will likely find its way into a commit for future reference/questions.

async start () {
    // setting the archive config too low breaks the cron 60s debounce interval so don't even try
    if (this.config.archiveSeconds < 60) {
      return
    }

    // cache the clock skew from the db server
    await this.cacheClockSkew()

    // create workers for both cron schedule check and result execution with a default latency of 4s per check
    await this.manager.work(queues.CRON, { newJobCheckIntervalSeconds: 4 }, (job) => this.onCron(job))
    await this.manager.work(queues.SEND_IT, { newJobCheckIntervalSeconds: 4, teamSize: 50, teamConcurrency: 5 }, (job) => this.onSendIt(job))

    // uses sendDebounced() to enqueue a cron check
    await this.cronMonitorAsync()

    // create monitoring interval to make sure cron hasn't crashed
    this.cronMonitorInterval = setInterval(async () => await this.monitorCron(), this.cronMonitorIntervalMs)
    // create monitoring interval to measure and adjust for drift in clock skew
    this.skewMonitorInterval = setInterval(async () => await this.cacheClockSkew(), this.skewMonitorIntervalMs)

    this.stopped = false
  }

Here's the cron schedule check once the job is fetched

async onCron () {
    if (this.stopped) return

    //...

    // set last time cron was evaluated for downstream usage in cron monitoring
    await this.setCronTime()

    // uses sendDebounced() to enqueue a cron check
    await this.cronMonitorAsync()
  }

The latency of 60s is for monitoring, not the primary cron processing, which uses debouncing to immediately queue a cron check job after each job completion. The only way you would miss a cron job in this configuration would be to cause onCron() execution to exceed 60s, which would require a very large amount of records in the schedule table. It's not impossible for that happen, by the way, so I would say "these are not the concerns you're looking for" unless you are attempting to store > 1mil schedules. As always, check your cron job completion times if you attempt this since I'm just throwing out a large number here for illustration. If you actually needed that many different schedules, I would probably propose you to adjust your architecture to introduce shared tasks per cron (there's not quite 1mil distinct minutes per day) or perhaps separate instances via partitioning.

Hope this helps!

timgit commented 2 years ago

I included the above comment changes and renamed cronMonitorAsync() to checkSchedulesAsync() in the most recent 7.4.0 release.