timgit / pg-boss

Queueing jobs in Postgres from Node.js like a boss
MIT License
2.13k stars 158 forks source link

Option to schedule a job after a specific timestamp #58

Closed bradleyayers closed 6 years ago

bradleyayers commented 6 years ago

I think it would be useful to me to have an API that allows a job to be scheduled at a specific time. At the moment I'm achieving this via startIn, but it relies on the clocks of the worker and the database being close.

I'd like to be able to pass a Date to pg-boss, and have it convert it to UTC and send that to Postgres.

Slight aside: why is job.singletonOn a timestamp instead of timestamptz?

timgit commented 6 years ago

startIn uses the database's clock only. singletonOn is just used as a unqiue constraint, so I didn't feel very strongly in migrating it to with time zone

bradleyayers commented 6 years ago

Just to elaborate on the challenge I'm having with startIn, I want to schedule a job for a specific time (think cron rather than setTimeout). I want this because I want to generate time series data, by creating record end of each UTC day (e.g. 2018-02-16T00:00:00Z), for the previous day. To achieve this with startIn my application code needs to know the timezone of the database, which is hard, especially when trying to write portable code.

This is because to have a job start a specific timestamp, the application needs to calculate the number of seconds between the current db time, and the desired start date. To do this the application needs to understand the timezone of the server, in order to properly account for timezone boundaries where time intervals (e.g. 1 hour) are inserted or removed.

The application would need to not only know the current db timezone, but also the details of the timezone boundaries for as far ahead into the future as you want to schedule jobs.

Having a pg-boss API that allows a job to be scheduled for a specific timestamp would be very useful.

timgit commented 6 years ago

I have a similar use cases where I want jobs to run based on a cron expression. startIn currently operates in a "at least this offset" configuration, so if for some reason we miss the exact interval specified (no instances were online, for example), it will simply grab it at any future time when requested with no expectation of an exact moment. However, for cron expressions, there's an implicit contract I think we expect which is more strict than this.

To keep the cron problem more simple, I've designed a specific service in a known timezone to be responsible for time-sensitive operations. I use node-schedule with a cron expression and then publish a job in the callback . Now, I'm not trying to dismiss your request at all, and I do think it would be a cool feature to build, but a lot more effort seems to be required to make sure this is done in a way that would satisfy all these use cases in a distrubuted instance deployment.

bradleyayers commented 6 years ago

It's very interesting to hear about your experience, thanks for sharing that!

it will simply grab it at any future time when requested with no expectation of an exact moment

I don't quite understand this phrase. What simply grabs what at any future time? Do you mean a pg-boss worker grabs a job at any future time? In that case it sounds like the job was already submitted.

After reading your message (and I'm not sure if this was the point you were making or not) I realise that cron does have a more strict contract than what I'm proposing. In general I don't think it's really possible for a worker queue to reliably implement the cron contract, as there's no guarantee that there's a free worker available at a specific point in time to service a job.

However I realise my requirements are less strict than cron, and I'd simply be happy with jobs being able to have start after timestamp X semantics.

My scenario is simpler than arbitrary cron expressions, so I'm planning to have a "scheduler" job that recurs a few times a day and schedules the time-series-data-point-creation job for tomorrow. I'm also using node-schedule (I took your suggestion in another thread). The goal of the more frequent scheduler task is to guard against the service being down (e.g. during an upgrade) when node-schedule fires.

timgit commented 6 years ago

I don't quite understand this phrase. What simply grabs what at any future time? Do you mean a pg-boss worker grabs a job at any future time? In that case it sounds like the job was already submitted.

Yes, I mean

boss.publish('delayed', null, {startIn: '5 minutes'})

creates a 'delayed' job to run "at least 5 minutes from now", not "exactly in 5 minutes". If all your instances crashed, for example. Whenever this job is requested (subscribers or fetch), even if it's hours later, would still receive this job.

bradleyayers commented 6 years ago

Gotcha, yes in that case my request is basically something like this:

boss.publish('delayed', null, {startAfter: "2018-02-16T00:00:00Z"})

It'd probably be nice to support a native Date object too:

boss.publish('delayed', null, {startAfter: new Date(2018, 1, 16)})

Internally I imagine the Date would just be converted to an ISO string before being sent to Postgres.

timgit commented 6 years ago

I like this, and I'm also thinking about how we could merge this behavior into the current startIn interval, since the latter could simply be converted to a startAfter timstamptz using pg's interval

bradleyayers commented 6 years ago

I like this, and I'm also thinking about how we could merge this behavior into the current startIn interval, since the latter could simply be converted to a startAfter timstamptz using pg's interval

That would be a nice consolidation, I imagine it'd be just:

const startAfter = new Date();
startAfter.setTime(Date.now() + (startIn * 1000));
startAfter.toISOString();