Publish job only if there isn't an active one already

jede commented 7 years ago

Hi!

Thank you for PG Boss! We use it in production and it works really well! 🎉

A question, though: as I understand it, there is currently no way to prevent a job from being queued if one is already queued or active? Would it make sense to add this?

Our use case is that we want to have a "clock" process that queues jobs on a regular basis, for stuff that needs to happen every minute, for instance. But we wouldn't want to run multiple jobs at the same time. That would be really wasteful, especially if a job takes longer than the interval for some reason.

timgit commented 7 years ago

@jede Did you already try out the throttling config and this is not what you're looking for?

Oh, and thanks for 🎉! Glad to hear it's working out for you guys. 👍

jede commented 7 years ago

Hi Tim!

I looked at throttling and ran a brief test. From what I could see, a new job will start running after the throttle threshold even if one is still running?

timgit commented 7 years ago

You're right. Throttling is not as strict as what you're looking for. I have a similar workflow in my app where I use node-schedule with a cron expression to wait until the proper time to kick off a job in pg-boss. However, my interval is more forgiving than yours, it would seem. :) If my task takes longer than expected, I have a lot of wiggle room before the next interval fires.

I think I could add a feature like throttling that would basically block queueing a job if a certain key is detected to be in the queue or actively running, but I want to make sure I understand how you think it should behave from the API's perspective. Until then, are you currently working around this by querying the job table yourself?

timgit commented 7 years ago

Also, I forgot to mention my use case for throttling, because I wanted to compare it to yours. I have a task that I want to make sure doesn't run too often (not more than once per minute), but I could have several requests for that job arrive at the same time. I use throttling to make sure another job is not queued if one has already been submitted during the specified timebox.

jede commented 7 years ago

Right now we don't really solve it in a good way :) We want to make sure that if a task takes longer than usual, we won't fire another and put even more load on the systems, so for us it's better to let the first one finish instead of running them in parallel.

timgit commented 7 years ago

Do you think adding a status(jobId:uuid) func to the API would help you out? It might perhaps return all the job properties with or without the payload. For example, if you're keeping the job id that you submitted, you could use that id to fetch the status of your job before enqueuing another one.

jede commented 7 years ago

Possibly, but it wouldn't be ideal. I would prefer an option to say it shouldn't be queued if there is an active job already. But maybe it's too complicated to solve with a constraint, and another query would add too much overhead?

timgit commented 7 years ago

My first thought was in regards to a constraint so pg can be a centralized enforcer of uniqueness/correctness, especially in distributed use cases where you have multiple workers and you can't use local state to guard against this.

The status values are:

created
active
complete
expired
cancelled

However, since pg-boss manages the status values internally, using status as a unique constraint may produce internal constraint violations and then we have a more complicated problem to solve. Currently, any job with status created or expired coupled with a retryCount < retryLimit could become active on the next request from a worker. So instead of just being concerned with active, one would also have to be concerned with these other cases as well.

A temporary solution that would satisfy your requirements and also persist (if your process restarts or crashes, for example) would be to store this "something was submitted" state in a different table. Each time pg-boss calls your callback and you finish the work, you could reset this state along with calling done() to indicate the job completed in pg-boss.

I think there's a way I can pull this off with a unique constraint, but I haven't figured it out yet lol. I don't want to have to lock the table, obviously. For example, if you rely on fetching, you could have 2 processes receive the "nothing is active right now" response, and then both would submit a job simultaneously. That's the really nice thing about the database managing the constraint: it prevents that race condition.
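
To illustrate the race described in this comment, here is a minimal in-memory sketch. It is not pg-boss code: `FakeJobTable`, `checkThenInsert`, and `demo` are hypothetical names, and `await Promise.resolve()` merely stands in for a database round trip. It contrasts a separate "check, then insert" with an insert whose uniqueness check happens atomically, as a database constraint would.

```javascript
// In-memory stand-in for a job table (an assumption for illustration only).
class FakeJobTable {
  constructor() {
    this.rows = [];
  }

  // Simulates a SELECT that needs one round trip before answering.
  async hasActive(name) {
    await Promise.resolve(); // yield, as a network round trip would
    return this.rows.some((r) => r.name === name && r.state === 'active');
  }

  // Plain INSERT with no uniqueness rule.
  async insert(name) {
    await Promise.resolve();
    this.rows.push({ name, state: 'active' });
    return true;
  }

  // INSERT where the uniqueness check and the write happen in one
  // uninterrupted step, the way a unique constraint is enforced.
  async insertUnique(name) {
    await Promise.resolve();
    if (this.rows.some((r) => r.name === name && r.state === 'active')) {
      return false; // rejected, like a constraint violation
    }
    this.rows.push({ name, state: 'active' });
    return true;
  }
}

// Client-side guard: ask first, insert afterwards (two separate steps).
async function checkThenInsert(table, name) {
  if (await table.hasActive(name)) return false;
  return table.insert(name);
}

async function demo() {
  // Two "processes" run the client-side guard concurrently.
  const racy = new FakeJobTable();
  const racyResults = await Promise.all([
    checkThenInsert(racy, 'sync'),
    checkThenInsert(racy, 'sync'),
  ]);

  // Two "processes" rely on the table's own uniqueness rule.
  const guarded = new FakeJobTable();
  const guardedResults = await Promise.all([
    guarded.insertUnique('sync'),
    guarded.insertUnique('sync'),
  ]);

  return {
    racyCount: racy.rows.length,
    racyResults,
    guardedCount: guarded.rows.length,
    guardedResults,
  };
}
```

In this sketch both racy callers see an empty table before either one inserts, so two rows are created, while the guarded table accepts only the first insert.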

timgit commented 7 years ago

Ok, I think I just found a solution for this via conditional unique indexes (learn something new every day). Give me a week to play around with this technique. Hopefully it will result in a 🎉
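
For readers unfamiliar with the technique: PostgreSQL calls these partial indexes, and a partial unique index enforces uniqueness only among rows that match its `WHERE` clause. The sketch below is an assumption about how such a rule could look, not the schema pg-boss actually uses. The table name, column names, index name, and the choice of "unfinished" states are all hypothetical, and the small in-memory model only mirrors the insert-time behavior.

```javascript
// Hypothetical DDL: table, column, and index names are assumptions,
// not pg-boss's real schema.
const createIndexSql = `
  CREATE UNIQUE INDEX job_singleton_key
    ON job (name, singleton_key)
    WHERE state IN ('created', 'active');
`;

// States covered by the index predicate in this sketch (an assumption).
const UNFINISHED = new Set(['created', 'active']);

// In-memory model of the rule: a new row is rejected only if it and an
// existing row both match the predicate and share (name, singletonKey).
// Rows without a key never conflict, mirroring how NULLs are treated as
// distinct in a PostgreSQL unique index by default.
function tryInsert(rows, job) {
  const conflict = rows.some(
    (r) =>
      UNFINISHED.has(r.state) &&
      UNFINISHED.has(job.state) &&
      job.singletonKey != null &&
      r.name === job.name &&
      r.singletonKey === job.singletonKey
  );
  if (conflict) return false;
  rows.push(job);
  return true;
}

const rows = [];
const first = tryInsert(rows, { name: 'clock', singletonKey: 'tick', state: 'created' });
const second = tryInsert(rows, { name: 'clock', singletonKey: 'tick', state: 'created' });

// Once the first job finishes, its row no longer matches the predicate,
// so the same key can be queued again.
rows[0].state = 'complete';
const third = tryInsert(rows, { name: 'clock', singletonKey: 'tick', state: 'created' });
```

Here `first` and `third` are accepted and `second` is rejected, because only unfinished rows take part in the uniqueness check.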

timgit commented 7 years ago

@jede I just pushed a beta release for this. If you don't mind, give it a try on a non-production instance and let me know your results. I've added a new option on publish called singletonKey that implements this behavior.
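
A rough sketch of how the "clock" use case from earlier in this thread might call the new option. The job name, payload, and key value are made up for illustration, and it assumes that `publish` resolves to a job id when a job is created and to `null` when it is not. `makeStubBoss` is a hypothetical in-memory stand-in so the example runs without a database; it is not part of pg-boss.

```javascript
// `boss` is expected to expose publish(name, data, options) returning a
// promise. Assumption: it resolves to null when no job was created.
async function publishClockTick(boss) {
  const jobId = await boss.publish(
    'clock-tick', // hypothetical job name
    { requestedAt: new Date().toISOString() },
    { singletonKey: 'clock-tick' } // hypothetical key value
  );
  return jobId != null; // true if a new job was queued
}

// Hypothetical stub that imitates the singleton behavior in memory.
function makeStubBoss() {
  const unfinished = new Set();
  let nextId = 1;
  return {
    async publish(name, data, options = {}) {
      const key = options.singletonKey ? `${name}/${options.singletonKey}` : null;
      if (key !== null && unfinished.has(key)) return null; // already queued or active
      if (key !== null) unfinished.add(key);
      return `job-${nextId++}`;
    },
    // Test helper: marks the job for this key as finished.
    finish(name, singletonKey) {
      unfinished.delete(`${name}/${singletonKey}`);
    },
  };
}

async function runExample() {
  const boss = makeStubBoss();
  const first = await publishClockTick(boss); // queued
  const second = await publishClockTick(boss); // skipped: one is pending
  boss.finish('clock-tick', 'clock-tick');
  const third = await publishClockTick(boss); // queued again
  return [first, second, third];
}
```

With the stub, `runExample()` resolves to `[true, false, true]`. Against a real instance, the linked tests are the reference for the actual behavior.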

The following tests show it in action: https://github.com/timgit/pg-boss/blob/61705996e0a07f14b938247c5c19b6679a251b5d/test/singletonTest.js

jede commented 7 years ago

Wow! Thanks! I'll give it a try 😃👏

timgit / pg-boss

Publish job only if there isn't an active one already #10