timgit / pg-boss

Queueing jobs in Postgres from Node.js like a boss
MIT License

Questions on Scheduler #287

Closed (on2air closed this 2 years ago)

on2air commented 2 years ago

After some digging into the code, I have a few questions on how the schedule feature works for recurring tasks.

It appears that you schedule a queue to repeat, not specific jobs, correct? This doesn't seem to fit my use case. I have a single queue with different jobs that have, in the past, used singletonKey to differentiate themselves.

I guess I could treat the queues as different jobs by using a unique name for each and then use a dynamic listener to listen to all queues based on a wildcard in the name. Is that best practice?
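As a rough illustration of that naming idea, here is a minimal sketch in plain JavaScript. It does not call pg-boss at all: the `<base>.<jobKey>` convention, the helper names, and the `handlers` map are hypothetical, and whether a wildcard listener is available depends on the pg-boss version in use.

```javascript
// Hypothetical naming convention: "<base>.<jobKey>", e.g. "sync.customer-42".
const SEPARATOR = '.';

// Build a per-job queue name from a shared base name and a unique job key.
function queueNameFor(base, jobKey) {
  if (jobKey.includes(SEPARATOR)) {
    throw new Error(`job key must not contain "${SEPARATOR}"`);
  }
  return `${base}${SEPARATOR}${jobKey}`;
}

// Recover the job key from a queue name, or null if the name has a different base.
function jobKeyFrom(base, queueName) {
  const prefix = base + SEPARATOR;
  return queueName.startsWith(prefix) ? queueName.slice(prefix.length) : null;
}

// Route a payload to the handler registered for the recovered key.
// `handlers` is a Map from job key to function (illustrative only).
function dispatch(base, queueName, payload, handlers) {
  const key = jobKeyFrom(base, queueName);
  if (key === null || !handlers.has(key)) return false;
  handlers.get(key)(payload);
  return true;
}
```

With this shape, one listener could receive work from every queue under the base name and route it by key, at the cost of one schedule row per job.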

If I take that approach, it appears that you read all rows in the schedule table every interval (i.e. every ~30 seconds) to determine which ones need to run. That seems like a lot of database queries, constantly re-reading that list of records. Does it scale to a large number of scheduled items?

My thinking was that I could schedule jobs with unique payloads to run on a scheduled cron interval but it doesn't appear that is how things work. Am I missing something?

nicoburns commented 2 years ago

> My thinking was that I could schedule jobs with unique payloads to run on a scheduled cron interval but it doesn't appear that is how things work. Am I missing something?

The way I have approached this is to have two queues:

> If I take that approach, it appears that you read all rows in the schedule table every interval (i.e. every ~30 seconds) to determine which ones need to run. That seems like a lot of database queries, constantly re-reading that list of records. Does it scale to a large number of scheduled items?

The table's indexed, so it doesn't have to scan every row. For really high queue volumes you probably would start running into issues, but we're talking several thousand queue tasks at least before you'd run into trouble.

timgit commented 2 years ago

I will add that this feature was not designed with a massive number of unique schedules in mind, and I've never tested it at a very high number of unique schedules. The comment above is a good mitigation for that: for example, you could group queues together that share a common cron interval.
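The grouping idea can be sketched roughly as follows. This is plain JavaScript rather than pg-boss code: the job list, `normalizeCron`, `groupByCron`, `buildSchedules`, and the `scheduled-group-N` queue names are all hypothetical, chosen only to show how many unique schedules collapse when jobs share a cron expression.

```javascript
// Hypothetical job definitions: each has its own key, cron expression, and payload.
const jobs = [
  { key: 'report-a', cron: '0 * * * *', data: { account: 1 } },
  { key: 'report-b', cron: '0 * * * *', data: { account: 2 } },
  { key: 'cleanup', cron: '*/5 * * * *', data: {} },
];

// Collapse whitespace so equivalent cron strings land in the same group.
function normalizeCron(cron) {
  return cron.trim().split(/\s+/).join(' ');
}

// Group jobs by cron expression: one group per distinct expression.
function groupByCron(jobList) {
  const groups = new Map();
  for (const job of jobList) {
    const cron = normalizeCron(job.cron);
    if (!groups.has(cron)) groups.set(cron, []);
    groups.get(cron).push({ key: job.key, data: job.data });
  }
  return groups;
}

// Build one schedule descriptor per group, carrying its members as the payload.
// The index-based queue name is an assumption; a stable name would be safer in practice.
function buildSchedules(jobList) {
  return [...groupByCron(jobList)].map(([cron, members], i) => ({
    queue: `scheduled-group-${i}`,
    cron,
    data: { members },
  }));
}

const schedules = buildSchedules(jobs);
console.log(schedules.length); // 2 schedules for 3 jobs
```

Each descriptor could then be registered once, for example via `boss.schedule(queue, cron, data)`, with the handler for that queue fanning out one job per member. That wiring is an assumption to check against the pg-boss docs for your version.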

on2air commented 2 years ago

Thanks @timgit @nicoburns for your input. That helps me architect things going forward.