michaelklishin / quartz-mongodb

A MongoDB-based store for the Quartz scheduler. This fork strives to be as feature complete as possible. Originally by MuleSoft.

Error on High frequency trigger - failed to lock #144

Closed matano-t2k closed 6 years ago

matano-t2k commented 7 years ago

I use the following trigger in a non-clustered environment:

    Trigger trigger = newTrigger()
        .withIdentity("trigger1", "group1")
        .startNow()
        .withSchedule(simpleSchedule()
            .withIntervalInSeconds(30)
            //.withIntervalInMinutes(5)
            .repeatForever())
        .build();

If the interval is less than a minute, I get the following error:

    Failed to lock trigger group1.trigger1, reason: WriteError{code=11000, message='E11000 duplicate key error collection: alerts.scheduler_locks index: keyGroup_1_keyName_1_type_1 dup key: { : "group1", : "trigger1", : "t" }', details={ }}

Is this a known limitation? I ran the same scenario with the MySQL job store and it worked as expected.

properties:

    org.quartz.scheduler.instanceName = MyClusteredScheduler
    org.quartz.scheduler.instanceId = AUTO
    org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
    org.quartz.threadPool.threadCount = 25
    org.quartz.threadPool.threadPriority = 5
    org.quartz.jobStore.misfireThreshold = 60000
    org.quartz.jobStore.class = com.novemberain.quartz.mongodb.MongoDBJobStore
    org.quartz.jobStore.mongoUri=mongodb://admin:plat4Admin@localhost:27017
    org.quartz.jobStore.dbName=alerts
    org.quartz.jobStore.collectionPrefix=scheduler

org.quartz.jobStore.isClustered = true

org.quartz.jobStore.clusterCheckinInterval = 20000

michaelklishin commented 7 years ago

This can potentially be avoided, but it is not an indication of an issue per se: two instances are trying to acquire the same lock, and only one can succeed by definition. The question is what job stores should return in such a case. This needs investigation in Quartz core and other stores.
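To make the "only one can succeed" point concrete, here is a minimal sketch in plain Java. It is not the library's code: the class `UniqueKeyLockDemo`, its methods, and the owner names are hypothetical, and an in-memory map with `putIfAbsent` stands in for a lock collection with a unique index on group, name, and type.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a lock collection with a unique key per trigger.
class UniqueKeyLockDemo {
    private final ConcurrentMap<String, String> locks = new ConcurrentHashMap<>();

    // Returns true if this owner inserted the lock entry,
    // false if an entry with the same key already existed.
    boolean tryLock(String group, String name, String owner) {
        String key = group + "." + name + ".t";
        return locks.putIfAbsent(key, owner) == null;
    }

    // Lets two threads contend for the same key and returns how many won.
    static int contend() {
        UniqueKeyLockDemo store = new UniqueKeyLockDemo();
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger winners = new AtomicInteger();
        Thread[] threads = new Thread[2];
        for (int i = 0; i < threads.length; i++) {
            String owner = "instance-" + i; // assumed owner names
            threads[i] = new Thread(() -> {
                try {
                    start.await(); // release both threads at the same moment
                    if (store.tryLock("group1", "trigger1", owner)) {
                        winners.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        start.countDown();
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println("winners: " + contend());
    }
}
```

Whichever thread loses sees the equivalent of a duplicate key rejection, which is expected behaviour for this kind of lock rather than a fault in itself.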

matano-t2k commented 7 years ago

Hi, thanks for the quick reply. The test was made with only one instance. It seems that the "lock" table index still contains the obsolete value, or maybe it wasn't removed when it should have been. This happens only when the trigger interval is set to about 30 seconds or less.

I get a "MongoWriteException" in LockManager->tryLock

maverickdu commented 7 years ago

@michaelklishin I have the same question, and I am sure that only ONE instance is running. I use a scheduleBuilder like the following:

    ScheduleBuilder scheduleBuilder = SimpleScheduleBuilder.simpleSchedule().withIntervalInSeconds(5).repeatForever();

The job is triggered with no error the 1st time, 3rd time, 4th time, ... (at 5-second intervals).

The 'E11000' error occurs exactly on the 2nd firing.

maverickdu commented 7 years ago

I checked the log, as follows:

2017-03-05 15:32:24.903 [dsr_cluster_Worker-1] INFO  com.novemberain.quartz.mongodb.dao.LocksDao - Removing trigger lock simple1488699139892.myTrigger1488699139892.dusirong-MacbookPro1488699108533
2017-03-05 15:32:24.905 [dsr_cluster_QuartzSchedulerThread] INFO  com.novemberain.quartz.mongodb.LockManager - Failed to lock trigger simple1488699139892.myTrigger1488699139892, reason: WriteError{code=11000, message='E11000 duplicate key error collection: dusirong.dsr_locks index: keyGroup_1_keyName_1_type_1 dup key: { : "simple1488699139892", : "myTrigger1488699139892", : "t" }', details={ }}
2017-03-05 15:32:24.905 [dsr_cluster_QuartzSchedulerThread] WARN  com.novemberain.quartz.mongodb.LockManager - Error retrieving expired lock from the database. Maybe it was deleted
2017-03-05 15:32:24.906 [dsr_cluster_Worker-1] INFO  com.novemberain.quartz.mongodb.dao.LocksDao - Trigger lock simple1488699139892.myTrigger1488699139892.dusirong-MacbookPro1488699108533 removed.

Line 1: the worker thread tries to remove (release) the lock.
Lines 2-3: the scheduler thread tries to acquire the lock but fails, because the lock has not been released yet.
Line 4: the lock is removed (released) successfully.
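The ordering read from the log can be replayed in a small single-threaded sketch. This is only an illustration of that interpretation, not the library's implementation: `ReleaseAcquireRaceDemo`, its methods, and the key string are hypothetical, and a plain map stands in for the locks collection.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical replay of the logged ordering within a single instance.
class ReleaseAcquireRaceDemo {
    private final Map<String, String> locks = new HashMap<>();

    // Fails when the key is still present, standing in for the E11000 rejection.
    boolean tryLock(String key, String owner) {
        if (locks.containsKey(key)) {
            return false;
        }
        locks.put(key, owner);
        return true;
    }

    void unlock(String key) {
        locks.remove(key);
    }

    // Returns the outcome of each acquire attempt in the order it happened.
    static List<Boolean> replay() {
        ReleaseAcquireRaceDemo store = new ReleaseAcquireRaceDemo();
        List<Boolean> outcomes = new ArrayList<>();
        String key = "group1.trigger1.t"; // assumed key layout

        // First firing: nothing holds the lock yet.
        outcomes.add(store.tryLock(key, "worker"));

        // Log line 1: the worker has started removing the lock,
        // but the removal has not been applied yet.
        // Log lines 2-3: the scheduler thread tries to lock for the next firing.
        outcomes.add(store.tryLock(key, "scheduler"));

        // Log line 4: the removal completes.
        store.unlock(key);

        // A later attempt finds no entry and succeeds.
        outcomes.add(store.tryLock(key, "scheduler"));
        return outcomes;
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

Under these assumptions the middle attempt is the only one that fails, which is consistent with an error that shows up once and then disappears on later firings.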

michaelklishin commented 6 years ago

Not enough information to reproduce => cannot do much about this.