michaelklishin / quartz-mongodb

A MongoDB-based store for the Quartz scheduler. This fork strives to be as feature complete as possible. Originally by MuleSoft.
Other
248 stars 203 forks source link

WriteError from MongoDB : E11000 duplicate key error collection #114

Closed gocanal closed 7 years ago

gocanal commented 8 years ago

Hello,

I got this error when running my code:

c.n.quartz.mongodb.LockManager : Failed to lock trigger mygroup.trigger1, reason: WriteError{code=11000, message='E11000 duplicate key error collection: vr-scheduler-jobs.quartz_locks index: keyGroup_1_keyName_1_type_1 dup key: { : "mygroup", : "trigger1", : "t" }', details={ }}

Does this indicate some null key/value inserting to MonoDB, or something else ?

Jobs are triggered successfully.

Here is my JobConfiguration class:

@PostConstruct
private void initialize() throws Exception {
    schedulerFactoryBean.getScheduler().addJob(jobDetailSimpleJob(), true, true);
    if (!schedulerFactoryBean.getScheduler().checkExists(new TriggerKey("trigger1", "mygroup"))) {
        schedulerFactoryBean.getScheduler().scheduleJob(simpleTriggerSimpleJob());
    }
}

private static JobDetail jobDetailSimpleJob() {
    JobDetailImpl jobDetail = new JobDetailImpl();
    jobDetail.setKey(new JobKey("simplejob", "mygroup"));
    jobDetail.setJobClass(SimpleJob.class);
    // remain stored in the job store even if no triggers point to it anymore
    jobDetail.setDurability(true);
    return jobDetail;
}

private static Trigger simpleTriggerSimpleJob() {
    return newTrigger()
            .forJob(jobDetailSimpleJob())
            .withIdentity("trigger1", "mygroup")
            .withPriority(50)
            // Job is scheduled for 3+1 times with the interval of 30 seconds
            .withSchedule(simpleSchedule()
                    .withIntervalInSeconds(30)
                    .withRepeatCount(3))
            .startAt(DateTime.now().plusSeconds(3).toDate())
            .build();
}

thanks

pwojnowski commented 8 years ago

I'll try to recreate that, but now have just a few questions:

What does quartz_locks collection contain before you run the code? Is it empty?

The message means that quartz_locks collection already contains such lock. It could happen when: 1. Another code locked trigger with the same name. 2. The trigger's lock from previous run haven't been removed (in which case it is a bug).

Do you have a full stack trace?

emk888 commented 8 years ago

Try to not reuse job and trigger names. I am not sure. But I had the same problems. Now my jobs work fine. I think as u use the job / trigger name for index, mongoDB cannot reuse the same name even if it is delete. I just added a timestamp in my job / trigger name. And it works.

michaelklishin commented 7 years ago

Closing as no specific details we need to reproduce were provided.

sumitjainn commented 7 years ago

@michaelklishin I am facing same issue, here's my log:

2016-12-21 15:49:38,006 INFO c.n.quartz.mongodb.dao.LocksDao - Removing trigger lock DEFAULT.cronTrigger.NON_CLUSTERED 2016-12-21 15:49:38,008 INFO c.n.quartz.mongodb.dao.LocksDao - Trigger lock DEFAULT.cronTrigger.NON_CLUSTERED removed. 2016-12-21 15:49:38,009 INFO c.n.quartz.mongodb.dao.TriggerDao - Found 2 triggers which are eligible to be run. 2016-12-21 15:49:38,011 INFO c.n.quartz.mongodb.dao.LocksDao - Inserting lock for trigger DEFAULT.jobTrigger 2016-12-21 15:49:38,012 INFO c.n.quartz.mongodb.LockManager - Failed to lock trigger DEFAULT.jobTrigger, reason: WriteError{code=11000, messa$ e='E11000 duplicate key error index: quartz.mycol_locks.$keyGroup_1_keyName_1_type_1 dup key: { : "DEFAULT", : "jobTrigger", : "t" }', details$ { }} 2016-12-21 15:49:38,015 INFO c.n.quartz.mongodb.LockManager - Trigger DEFAULT.jobTrigger hasn't expired yet. Lock time: Wed Dec 21 15:49:37 IS T 2016 2016-12-21 15:49:38,018 INFO c.n.quartz.mongodb.dao.LocksDao - Inserting lock for trigger DEFAULT.cronTrigger 2016-12-21 15:49:38,019 INFO c.n.quartz.mongodb.TriggerRunner - Acquired trigger: DEFAULT.cronTrigger

I have only two triggers:

<bean id="cronTrigger" class="org.springframework.scheduling.quartz.CronTriggerFactoryBean">
    <property name="jobDetail" ref="simpleTask" />
    <property name="cronExpression" value="0/2 * * * * ?" />
</bean>
<bean id="jobTrigger" class="org.springframework.scheduling.quartz.SimpleTriggerFactoryBean">
    <property name="repeatInterval" value="1000" />
    <property name="repeatCount" value="3" />
    <property name="jobDetail" ref="simpleTask" />
</bean>

The error also occurs with clean database and single trigger as well.

I can provide more info if needed, plz reopen the issue.

michaelklishin commented 7 years ago

@sumitjainn we need a way to reproduce.

sumitjainn commented 7 years ago

Michael, what is the impact of this error? Is there any functional loss?

Also if you want, I can provide you the source code to reproduce.

michaelklishin commented 7 years ago

@sumitjainn please do (post a repo on GitHub that includes a small project that can be used to reproduce)

sumitjainn commented 7 years ago

@michaelklishin will do.. can you please comment on the impact?

michaelklishin commented 7 years ago

@sumitjainn either the library or your app tries to create a duplicate trigger and fails to acquire a lock. That's as much as I know.

pwojnowski commented 7 years ago

I looked at this issue and this is not a problem with reusing name for trigger locks, because it works as can be seen here:

db.quartz_locks.insertOne({type: "t", keyGroup: "kg", keyName: "kn", instanceId: "i1", time: new Date(1)})
{
    "acknowledged" : true,
    "insertedId" : ObjectId("585c016386e50ab87c310780")
}
// lock exists, so expect exception:
> db.quartz_locks.insertOne({type: "t", keyGroup: "kg", keyName: "kn", instanceId: "i1", time: new Date(1)})
2016-12-22T17:38:06.889+0100 E QUERY    [thread1] uncaught exception: WriteError({
    "index" : 0,
    "code" : 11000,
    "errmsg" : "E11000 duplicate key error index: quartz_mongodb_test.quartz_locks.$keyGroup_1_keyName_1 dup key: { : \"kg\", : \"kn\" }",
    "op" : {
        "_id" : ObjectId("585c016e86e50ab87c310781"),
        "type" : "t",
        "keyGroup" : "kg",
        "keyName" : "kn",
        "instanceId" : "i1",
        "time" : ISODate("1970-01-01T00:00:00.001Z")
    }
}) :
undefined
> db.quartz_locks.find()
{ "_id" : ObjectId("585c016386e50ab87c310780"), "type" : "t", "keyGroup" : "kg", "keyName" : "kn", "instanceId" : "i1", "time" : ISODate("1970-01-01T00:00:00.001Z") }
> db.quartz_locks.deleteOne({"_id" : ObjectId("585c016386e50ab87c310780")})
{ "acknowledged" : true, "deletedCount" : 1 }
> db.quartz_locks.find()
// no entries, so try to reuse the names:
> db.quartz_locks.insertOne({type: "t", keyGroup: "kg", keyName: "kn", instanceId: "i1", time: new Date(1)})
{
    "acknowledged" : true,
    "insertedId" : ObjectId("585c01a786e50ab87c310782")
}

I could change relocking mechanism to always relock a trigger in case of non-clustered scheduler, but I need to know why it tries to lock the same trigger again. One reason I can imagine is that trigger repeat interval is more tight than time to execute its job. Then Quartz tries to execute it again, but the trigger's lock already exists and the lock hasn't expired yet - at least this is a possible scenario looking at the stacktrace.

@sumitjainn how long does it take to execute your job? Do you have the same issue with a job that prints only "Hello world!"? Could you retry with increased repeatInterval for the trigger?

knoxxs commented 6 years ago

One other example scenario happened for me is: My mongo got down and the main thread broke because of following error: [2018-02-13 21:24:48.819] [scheduler_QuartzSchedulerThread] [ERROR] [o.quartz.core.QuartzSchedulerThread] [Runtime error occurred in main trigger firing loop.] [com.mongodb.MongoSocketReadException: Exception receiving message.

This results in some locks not getting cleaned (unlock). When the main thread runs again (as it is in while true block) it calls qsRsrcs.getJobStore().acquireNextTriggers which then calls lockManager.relockExpired for these locks and expiryCalculator.isTriggerLockExpired(existingLock) returns false even after triggerTimeoutMillis because the hasDefunctScheduler implementation inside isTriggerLockExpired doesn't take care of the case when the scheduler is non clustered.

It can be solved if we replace following

private boolean hasDefunctScheduler(String schedulerId) {
        Scheduler scheduler = schedulerDao.findInstance(schedulerId);
        if (scheduler == null) {
            log.debug("No such scheduler: {}", schedulerId);
            return false;
        }
        return scheduler.isDefunct(clock.millis()) && schedulerDao.isNotSelf(scheduler);
    }

to:

private boolean hasDefunctScheduler(String schedulerId) {
        //check if it is not clustered then return true;
        Scheduler scheduler = schedulerDao.findInstance(schedulerId);
        .
        .
        .
    }
parashivamurthykr commented 4 years ago

One other example scenario happened for me is: My mongo got down and the main thread broke because of following error: [2018-02-13 21:24:48.819] [scheduler_QuartzSchedulerThread] [ERROR] [o.quartz.core.QuartzSchedulerThread] [Runtime error occurred in main trigger firing loop.] [com.mongodb.MongoSocketReadException: Exception receiving message.

This results in some locks not getting cleaned (unlock). When the main thread runs again (as it is in while true block) it calls qsRsrcs.getJobStore().acquireNextTriggers which then calls lockManager.relockExpired for these locks and expiryCalculator.isTriggerLockExpired(existingLock) returns false even after triggerTimeoutMillis because the hasDefunctScheduler implementation inside isTriggerLockExpired doesn't take care of the case when the scheduler is non clustered.

It can be solved if we replace following

private boolean hasDefunctScheduler(String schedulerId) {
        Scheduler scheduler = schedulerDao.findInstance(schedulerId);
        if (scheduler == null) {
            log.debug("No such scheduler: {}", schedulerId);
            return false;
        }
        return scheduler.isDefunct(clock.millis()) && schedulerDao.isNotSelf(scheduler);
    }

to:

private boolean hasDefunctScheduler(String schedulerId) {
        //check if it is not clustered then return true;
        Scheduler scheduler = schedulerDao.findInstance(schedulerId);
        .
        .
        .
    }

@michaelklishin @knoxxs I have faced similar issue in on of my pord I have created a jira for the same . Please let me know if you have got a fix for the same . https://github.com/michaelklishin/quartz-mongodb/issues/192