grantcarthew / node-rethinkdb-job-queue

A persistent job or task queue backed by RethinkDB.
https://github.com/grantcarthew/node-rethinkdb-job-queue/wiki
MIT License
156 stars 16 forks source link

Why does timeout have implication on delay of failed jobs? #19

Closed TomKaltz closed 7 years ago

TomKaltz commented 7 years ago

The docs say a failed job dateEnable will be set to...

now() + job.timeout + ( job.retryDelay * job.retryCount )

Failed jobs are technically completed(albeit in error) and the next retry date (dateEnable) should NOT take into account the timeout. Unless I am totally missing something, timeout should have no bearing on a non-active job.

Also, wouldn't it be better to set the future retry dateEnable in job-failed.js while the failed job status is being set?

grantcarthew commented 7 years ago

Hi Tom, two things here.

The dateEnable is set when the job is retrieved from the database to allow the Queue Master review process to detect a failed node process.

If the dateEnable was set in job-failed, it would never be set if the node process hung.

Because of the above reasons, the dateEnable date needs to account for the running time for the job. Hence job.timeout.

TomKaltz commented 7 years ago

Yes but the job is done processing when it's failed. The problem is that retry is delayed too long. Timeout is only relevant for active jobs. A failed job is not active.

On Wednesday, October 5, 2016, Grant Carthew notifications@github.com wrote:

Hi Tom, two things here.

The dateEnable is set when the job is retrieved from the database to allow the Queue Master review process to detect a failed node process.

If the dateEnable was set in job-failed, it would never be set if the node process hung.

Because of the above reasons, the dateEnable date needs to account for the running time for the job. Hence job.timeout.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/grantcarthew/node-rethinkdb-job-queue/issues/19#issuecomment-251599065, or mute the thread https://github.com/notifications/unsubscribe-auth/AC0_Ic4Rw-ChhQ6Bq33YDWLW8fZsR2-jks5qw00_gaJpZM4KOcCZ .

Thomas Kaltz III 586-214-7150 Sent from Gmail Mobile

grantcarthew commented 7 years ago

Ah, I see what you are getting at. Currently dateEnable is only set in the get-next-job module. You are saying is should be set if job-failed is called and to not add the timeout value?

That can be done.

TomKaltz commented 7 years ago

That is correct. When you set a retryDelay you expect the failed job to retry after that time has elapsed (and after queue review). Timeout should only be used for an active job.

On Wed, Oct 5, 2016 at 6:40 PM, Grant Carthew notifications@github.com wrote:

Ah, I see what you are getting at. Currently dateEnable is only set in the get-next-job module. You are saying is should be set if job-failed is called and to not add the timeout value?

That can be done.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/grantcarthew/node-rethinkdb-job-queue/issues/19#issuecomment-251820417, or mute the thread https://github.com/notifications/unsubscribe-auth/AC0_IU5UrFOK3KaoFMzJKtNML5K0p7zNks5qxCdRgaJpZM4KOcCZ .

grantcarthew commented 7 years ago

OK, you want to attempt it @TomKaltz ?

TomKaltz commented 7 years ago

I'll see if I have time tonight but it might be a couple days :)

On Wed, Oct 5, 2016 at 6:48 PM, Grant Carthew notifications@github.com wrote:

OK, you want to attempt it @TomKaltz https://github.com/TomKaltz ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/grantcarthew/node-rethinkdb-job-queue/issues/19#issuecomment-251822006, or mute the thread https://github.com/notifications/unsubscribe-auth/AC0_IYUpDRwM3QmBm7KkoZIjoHcw_arvks5qxClSgaJpZM4KOcCZ .

grantcarthew commented 7 years ago

Hey @TomKaltz , I am at a loose end so am working on this now. It's only a one line change.

grantcarthew commented 7 years ago

Done, fixed in v0.4.4