0xavi0 closed this issue 8 months ago
@0xavi0, you are right. This effect is likely to occur under load and can be mitigated using retries. A straightforward solution would be to place the code responsible for reading and rescheduling the job under the scheduler's lock, and to add that lock to other relevant methods. Let's keep this issue open to consider alternative options.
Pull request #121 ensures the atomicity of the `fetchAndReschedule` operation by acquiring the scheduler's mutex to lock the critical section. Please let me know if you are still able to reproduce the issue after applying the fix.
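To illustrate the idea behind the fix, here is a minimal sketch of guarding both the fetch-and-reschedule step and the lookup with the same mutex. It is not the code from the pull request: `toyScheduler`, its map-based storage, `getScheduledJob`, and `countMisses` are hypothetical names made up for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// toyScheduler is a hypothetical stand-in for the real scheduler.
// It is NOT go-quartz code; it only models the locking idea.
type toyScheduler struct {
	mu   sync.Mutex
	jobs map[int]int64 // job key -> next run time
}

// fetchAndReschedule removes the job and re-adds it with a new run time
// while holding the mutex, so no reader can observe the gap in between.
func (s *toyScheduler) fetchAndReschedule(key int, next int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.jobs[key]; !ok {
		return false
	}
	delete(s.jobs, key) // the job is briefly out of the queue
	s.jobs[key] = next  // and is pushed back with its new run time
	return true
}

// getScheduledJob takes the same mutex, so it runs either before or
// after a reschedule, never in the middle of one.
func (s *toyScheduler) getScheduledJob(key int) (int64, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	next, ok := s.jobs[key]
	return next, ok
}

// countMisses runs lookups concurrently with reschedules and returns
// how many lookups failed to find the job.
func countMisses(rounds int) int {
	s := &toyScheduler{jobs: map[int]int64{1: 0}}
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 1; i <= rounds; i++ {
			s.fetchAndReschedule(1, int64(i))
		}
	}()
	misses := 0
	for i := 0; i < rounds; i++ {
		if _, ok := s.getScheduledJob(1); !ok {
			misses++
		}
	}
	wg.Wait()
	return misses
}

func main() {
	fmt.Println("misses:", countMisses(10000)) // prints "misses: 0"
}
```

Because every method that touches the queue takes the same lock, a lookup can no longer land in the window where the job has been removed but not yet pushed back.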
I'm not able to reproduce with https://github.com/reugn/go-quartz/pull/121. Thanks for the fix! Any plan when this is going to be released?
The plan is to release next week.
Thanks!
Description
I found this when stress testing `quartz`. I saw that this code takes a triggered job out of the jobs queue in order to apply the next run time and reschedule it. If the scheduler is running that part of the code while a different goroutine tries to get that job (to delete it, modify it, or anything else), the lookup returns a `job not found` error. The scheduler will push the job back to the queue with the new next run time, but the other goroutine could create a duplicate job because it could not find the current one. The queue just pushes jobs, so my understanding is that job duplication is possible.
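The interleaving described above can be replayed step by step in a toy model. This is only a sketch under simplifying assumptions: the `queue` type and `simulateInterleaving` are hypothetical and single-threaded, and they do not use the go-quartz API. The goal is just to make the ordering of events explicit.

```go
package main

import "fmt"

// queue is a hypothetical, simplified job queue that "just pushes":
// it never checks whether a key is already present.
type queue struct{ keys []int }

func (q *queue) push(key int) { q.keys = append(q.keys, key) }

// pop removes and returns the job at the head of the queue.
func (q *queue) pop() int {
	key := q.keys[0]
	q.keys = q.keys[1:]
	return key
}

// contains reports whether a job with the given key is queued.
func (q *queue) contains(key int) bool {
	for _, k := range q.keys {
		if k == key {
			return true
		}
	}
	return false
}

// simulateInterleaving replays the problematic ordering of events and
// returns how many entries the queue holds afterwards.
func simulateInterleaving() int {
	q := &queue{}
	q.push(1)

	key := q.pop() // scheduler: takes the job out to compute its next run time

	if !q.contains(1) { // other goroutine: lookup fails with "job not found"
		q.push(1) // so it schedules what it believes is a missing job
	}

	q.push(key) // scheduler: pushes the original job back

	return len(q.keys)
}

func main() {
	fmt.Println("copies of job 1 in queue:", simulateInterleaving()) // prints 2
}
```

In the toy model the job ends up queued twice, which matches the duplication concern described above.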
For example, this simple code might duplicate jobs if `GetScheduledJob` is called right at the part of the code mentioned above, while the job is out of the queue for being rescheduled.
This is an example to recreate the issue (at least I'm able to recreate it on my system):
If you run that code you will see it retries calling `GetScheduledJob`, which should never happen because the job is scheduled and never deleted. The example just gets the job every `ChangeJobCycle` milliseconds and changes the `Name` of the job.
In order to see the retries without the rest of the noise you can do:
I'm getting a few lines:
I was able to recreate this in `master` and also in `0.9.0` (I haven't checked other versions).
Expected behaviour
I would expect to get the job even if it's being executed (or rescheduled) at that moment.
Suggestion
Apply the next run time without taking the job out of the queue. Maybe use a pointer for `priority`, just like for `job` in `scheduledJob`, and just change the `priority` value.
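One possible way to sketch this suggestion is with Go's standard `container/heap` package, whose `heap.Fix` re-establishes heap order after an element's value changes in place. The `item` and `jobHeap` types, `rescheduleHead`, and `demo` below are hypothetical and are not the library's actual queue types; the real queue may be structured differently.

```go
package main

import (
	"container/heap"
	"fmt"
)

// item is a hypothetical queue entry; priority holds the next run time.
type item struct {
	key      int
	priority int64
	index    int // position in the heap, kept up to date by Swap and Push
}

// jobHeap is a min-heap of items ordered by priority.
type jobHeap []*item

func (h jobHeap) Len() int           { return len(h) }
func (h jobHeap) Less(i, j int) bool { return h[i].priority < h[j].priority }
func (h jobHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
	h[i].index = i
	h[j].index = j
}

func (h *jobHeap) Push(x any) {
	it := x.(*item)
	it.index = len(*h)
	*h = append(*h, it)
}

func (h *jobHeap) Pop() any {
	old := *h
	n := len(old)
	it := old[n-1]
	old[n-1] = nil
	*h = old[:n-1]
	return it
}

// rescheduleHead changes the head's next run time in place and lets
// heap.Fix restore the ordering. The job never leaves the queue.
func rescheduleHead(h *jobHeap, next int64) {
	head := (*h)[0]
	head.priority = next
	heap.Fix(h, head.index)
}

// demo reschedules the head job and returns the queue size and the key
// of the job that is now first in line.
func demo() (size int, headKey int) {
	h := &jobHeap{}
	heap.Push(h, &item{key: 1, priority: 10})
	heap.Push(h, &item{key: 2, priority: 20})
	rescheduleHead(h, 30) // job 1 just ran; its next run is at t=30
	return h.Len(), (*h)[0].key
}

func main() {
	size, headKey := demo()
	fmt.Println("jobs in queue:", size, "next job:", headKey) // prints "jobs in queue: 2 next job: 2"
}
```

With this approach a concurrent lookup always finds the job, although access to the heap would still need to be synchronized in a real scheduler.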