quartz-scheduler / quartz

Code for Quartz Scheduler
http://www.quartz-scheduler.org
Apache License 2.0

scheduleJob with replace = true causes data corruption when new trigger has a different type than existing #920

Open maxqch opened 1 year ago

maxqch commented 1 year ago

Using Quartz version 2.3.2 with PostgreSQL.

This occurs when a job and its trigger already exist and a call is made to scheduler.scheduleJob(job, Set.of(newTrigger), true), where newTrigger has the same key as the existing trigger but a different type (e.g. cron vs. simple).

From what I can tell, instead of deleting the row from the previous table (qrtz_simple_triggers) and inserting a row into the new table (qrtz_cron_triggers), the store issues an update (JobStoreSupport lines 1222-1226). However, the trigger_type column of qrtz_triggers is still updated from SIMPLE to CRON, so any subsequent read fails because no corresponding row exists in the CRON table.
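
To make the mechanism in the paragraph above concrete, here is a toy, in-memory model of the two-level storage: a base table holding TRIGGER_TYPE plus one detail table per type. It is not Quartz code; the class, fields and methods are hypothetical, and the "in-place" path only assumes that the update against the new type's detail table matches zero rows, which is what the DB dumps suggest. The second path sketches one possible fix (drop the old detail row and insert fresh rows), not necessarily the fix the maintainers would choose.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of trigger storage: NOT Quartz code.
 * triggers     ~ QRTZ_TRIGGERS (trigger key -> TRIGGER_TYPE)
 * detailTables ~ QRTZ_SIMPLE_TRIGGERS / QRTZ_CRON_TRIGGERS (type -> key -> payload)
 */
public class TriggerStoreModel {
    final Map<String, String> triggers = new HashMap<>();
    final Map<String, Map<String, String>> detailTables = new HashMap<>();

    TriggerStoreModel() {
        detailTables.put("SIMPLE", new HashMap<>());
        detailTables.put("CRON", new HashMap<>());
    }

    /** Insert the base row and the matching detail row together. */
    void insert(String key, String type, String payload) {
        triggers.put(key, type);
        detailTables.get(type).put(key, payload);
    }

    /** Buggy replace: update in place, touching only the NEW type's detail table. */
    void replaceInPlace(String key, String newType, String payload) {
        triggers.put(key, newType);                  // TRIGGER_TYPE flips to the new type
        Map<String, String> detail = detailTables.get(newType);
        if (detail.containsKey(key)) {               // an UPDATE only touches existing rows,
            detail.put(key, payload);                // so nothing is written when no row exists
        }
        // the old type's detail row is left behind untouched
    }

    /** Possible fix: remove the old detail row (whatever its type), then insert fresh rows. */
    void replaceDeleteInsert(String key, String newType, String payload) {
        String oldType = triggers.get(key);
        if (oldType != null) {
            detailTables.get(oldType).remove(key);
        }
        insert(key, newType, payload);
    }

    /** Reads resolve the detail table via TRIGGER_TYPE, like the failing SELECT in the trace. */
    String read(String key) {
        String type = triggers.get(key);
        String payload = detailTables.get(type).get(key);
        if (payload == null) {
            throw new IllegalStateException("No record found in " + type + " table for " + key);
        }
        return payload;
    }

    public static void main(String[] args) {
        TriggerStoreModel buggy = new TriggerStoreModel();
        buggy.insert("group1.trigger1", "SIMPLE", "every 400s");
        buggy.replaceInPlace("group1.trigger1", "CRON", "0 15 10 ? * *");
        try {
            buggy.read("group1.trigger1");
        } catch (IllegalStateException e) {
            System.out.println("in-place: " + e.getMessage());
        }

        TriggerStoreModel fixed = new TriggerStoreModel();
        fixed.insert("group1.trigger1", "SIMPLE", "every 400s");
        fixed.replaceDeleteInsert("group1.trigger1", "CRON", "0 15 10 ? * *");
        System.out.println("delete+insert: " + fixed.read("group1.trigger1"));
    }
}
```

In the in-place run the model ends up in the same shape as the "DB after update" dump (type CRON, no CRON row, stale SIMPLE row) and the read fails; the delete-then-insert run reads back the cron payload.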

Possible solutions:

Repro details

Configs

org.quartz.scheduler.instanceName = test-quartz
org.quartz.jobStore.isClustered = true
org.quartz.scheduler.instanceId = AUTO
org.quartz.threadPool.threadCount = 1
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
...+ your db settings...

Sample test code

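    import static org.quartz.JobBuilder.newJob;
    import static org.quartz.SimpleScheduleBuilder.simpleSchedule;
    import static org.quartz.TriggerBuilder.newTrigger;

    import java.time.Instant;
    import java.util.Date;
    import java.util.Set;

    import org.junit.Test; // or org.junit.jupiter.api.Test for JUnit 5
    import org.quartz.CronScheduleBuilder;
    import org.quartz.Job;
    import org.quartz.JobDetail;
    import org.quartz.JobExecutionContext;
    import org.quartz.JobExecutionException;
    import org.quartz.SchedulerException;
    import org.quartz.Trigger;

    // `scheduler` is assumed to be a started org.quartz.Scheduler field,
    // created from the config above (e.g. via StdSchedulerFactory).
    @Test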
    public void existingTriggerDifferentType() throws SchedulerException {
        JobDetail job = newJob(HelloJob.class)
            .withIdentity("job1", "group1")
            .build();

        Trigger trigger = newTrigger()
            .withIdentity("trigger1", "group1")
            .startAt(Date.from(Instant.now().plusSeconds(100000)))
            .withSchedule(simpleSchedule()
                .withIntervalInSeconds(400)
                .repeatForever())
            .build();

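        // Store the job with a SIMPLE trigger (rows in QRTZ_TRIGGERS and QRTZ_SIMPLE_TRIGGERS)
        scheduler.scheduleJob(job, trigger);

        // Same trigger key, but a CRON schedule this time
        Trigger trigger2 = newTrigger()
            .withIdentity(trigger.getKey())
            .startNow()
            .withSchedule(CronScheduleBuilder.dailyAtHourAndMinute(10, 15))
            .build();

        // replace = true: TRIGGER_TYPE becomes CRON, but no QRTZ_CRON_TRIGGERS row is written
        scheduler.scheduleJob(job, Set.of(trigger2), true);
        // Throws JobPersistenceException: no record found in QRTZ_CRON_TRIGGERS
        scheduler.getTrigger(trigger.getKey());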
    }

    public static class HelloJob implements Job {
        @Override
        public void execute(final JobExecutionContext context) throws JobExecutionException {
            // No-op: the job body is irrelevant to the bug
        }
    }

Exception

org.quartz.JobPersistenceException: Couldn't retrieve trigger: No record found for selection of Trigger with key: 'group1.trigger1' and statement: SELECT * FROM QRTZ_CRON_TRIGGERS WHERE SCHED_NAME = 'test-quartz__6e549c21-6dba-4442-ae4a-fb32a442da8e' AND TRIGGER_NAME = ? AND TRIGGER_GROUP = ?
 [See nested exception: java.lang.IllegalStateException: No record found for selection of Trigger with key: 'group1.trigger1' and statement: SELECT * FROM QRTZ_CRON_TRIGGERS WHERE SCHED_NAME = 'test-quartz__6e549c21-6dba-4442-ae4a-fb32a442da8e' AND TRIGGER_NAME = ? AND TRIGGER_GROUP = ?]
    at org.quartz.impl.jdbcjobstore.JobStoreSupport.retrieveTrigger(JobStoreSupport.java:1538)
    at org.quartz.impl.jdbcjobstore.JobStoreSupport$12.execute(JobStoreSupport.java:1527)
    at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3864)
    at org.quartz.impl.jdbcjobstore.JobStoreTX.executeInLock(JobStoreTX.java:93)
    at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeWithoutLock(JobStoreSupport.java:3800)
    at org.quartz.impl.jdbcjobstore.JobStoreSupport.retrieveTrigger(JobStoreSupport.java:1524)
    at org.quartz.core.QuartzScheduler.getTrigger(QuartzScheduler.java:1505)
    at org.quartz.impl.StdScheduler.getTrigger(StdScheduler.java:508)
...
Caused by: java.lang.IllegalStateException: No record found for selection of Trigger with key: 'group1.trigger1' and statement: SELECT * FROM QRTZ_CRON_TRIGGERS WHERE SCHED_NAME = 'test-quartz__6e549c21-6dba-4442-ae4a-fb32a442da8e' AND TRIGGER_NAME = ? AND TRIGGER_GROUP = ?
    at org.quartz.impl.jdbcjobstore.CronTriggerPersistenceDelegate.loadExtendedTriggerProperties(CronTriggerPersistenceDelegate.java:107)
    at org.quartz.impl.jdbcjobstore.StdJDBCDelegate.selectTrigger(StdJDBCDelegate.java:1819)
    at org.quartz.impl.jdbcjobstore.JobStoreSupport.retrieveTrigger(JobStoreSupport.java:1536)
    ... 40 more

DB before update

pge_ywebyilapeco=# select * from qrtz_triggers;
                    sched_name                     | trigger_name | trigger_group | job_name | job_group | description | next_fire_time | prev_fire_time | priority | trigger_state | trigger_type |  start_time   | end_time | calendar_name | misfire_instr | job_data
---------------------------------------------------+--------------+---------------+----------+-----------+-------------+----------------+----------------+----------+---------------+--------------+---------------+----------+---------------+---------------+----------
 test-quartz__8636ff67-628b-4701-9e7e-64c59bc23a13 | trigger1     | group1        | job1     | group1    |             |  1685151581209 |             -1 |        5 | WAITING       | SIMPLE       | 1685151581209 |        0 |               |             0 | \x
(1 row)

pge_ywebyilapeco=# select * from qrtz_simple_triggers;
                    sched_name                     | trigger_name | trigger_group | repeat_count | repeat_interval | times_triggered
---------------------------------------------------+--------------+---------------+--------------+-----------------+-----------------
 test-quartz__8636ff67-628b-4701-9e7e-64c59bc23a13 | trigger1     | group1        |           -1 |          400000 |               0
(1 row)

pge_ywebyilapeco=# select * from qrtz_cron_triggers;
 sched_name | trigger_name | trigger_group | cron_expression | time_zone_id
------------+--------------+---------------+-----------------+--------------
(0 rows)

DB after update

pge_ywebyilapeco=# select * from qrtz_triggers;
                    sched_name                     | trigger_name | trigger_group | job_name | job_group | description | next_fire_time | prev_fire_time | priority | trigger_state | trigger_type |  start_time   | end_time | calendar_name | misfire_instr | job_data
---------------------------------------------------+--------------+---------------+----------+-----------+-------------+----------------+----------------+----------+---------------+--------------+---------------+----------+---------------+---------------+----------
 test-quartz__8636ff67-628b-4701-9e7e-64c59bc23a13 | trigger1     | group1        | job1     | group1    |             |  1685096100000 |             -1 |        5 | WAITING       | CRON         | 1685051583000 |        0 |               |             0 | \x
(1 row)

pge_ywebyilapeco=# select * from qrtz_cron_triggers;
 sched_name | trigger_name | trigger_group | cron_expression | time_zone_id
------------+--------------+---------------+-----------------+--------------
(0 rows)

pge_ywebyilapeco=# select * from qrtz_simple_triggers;
                    sched_name                     | trigger_name | trigger_group | repeat_count | repeat_interval | times_triggered
---------------------------------------------------+--------------+---------------+--------------+-----------------+-----------------
 test-quartz__8636ff67-628b-4701-9e7e-64c59bc23a13 | trigger1     | group1        |           -1 |          400000 |               0
(1 row)
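
The dumps above show the resulting inconsistency: TRIGGER_TYPE says CRON, qrtz_cron_triggers has no row, and the old qrtz_simple_triggers row is still there. A minimal sketch of a check for affected triggers, assuming the rows have already been loaded into memory (for example with plain JDBC queries against qrtz_triggers and the per-type tables). The class and method names are hypothetical, and only the SIMPLE and CRON tables from this report are modelled; other trigger types would need their own entries. The same check could also be written directly in SQL as an anti-join between qrtz_triggers and qrtz_cron_triggers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical consistency check over job-store rows loaded into memory. */
public class OrphanTriggerCheck {

    /**
     * @param triggerTypes QRTZ_TRIGGERS rows: trigger key -> TRIGGER_TYPE
     * @param detailKeys   detail tables: TRIGGER_TYPE -> trigger keys present in that table
     * @return one message per problem found (empty if the rows are consistent)
     */
    static List<String> findProblems(Map<String, String> triggerTypes,
                                     Map<String, Set<String>> detailKeys) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> row : triggerTypes.entrySet()) {
            String key = row.getKey();
            String type = row.getValue();
            // The detail table named by TRIGGER_TYPE must contain the key, or reads fail
            if (!detailKeys.getOrDefault(type, Set.of()).contains(key)) {
                problems.add(key + ": TRIGGER_TYPE=" + type + " but no " + type + " detail row");
            }
            // Any other detail table still holding the key is a leftover from the old type
            for (Map.Entry<String, Set<String>> table : detailKeys.entrySet()) {
                if (!table.getKey().equals(type) && table.getValue().contains(key)) {
                    problems.add(key + ": stale row left in " + table.getKey() + " detail table");
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // The "DB after update" state from this report
        Map<String, String> triggers = Map.of("group1.trigger1", "CRON");
        Map<String, Set<String>> details = Map.of(
                "SIMPLE", Set.of("group1.trigger1"),
                "CRON", Set.of());
        findProblems(triggers, details).forEach(System.out::println);
    }
}
```

For the dumped state this reports two problems for group1.trigger1: the missing CRON row and the stale SIMPLE row.
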

Halcyon666 commented 1 year ago

I ran into the reverse of maxqch's situation: I created a cron job, the job failed, and a simple job was generated. Is this wrong usage of Quartz, or a bug?

stale[bot] commented 1 year ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stevenschlansker commented 1 year ago

Unless anyone has fixed this, it is still relevant. We have a workaround in our application to keep us running for now, but it is a very bad user experience for reasonable client calls to result in weird internal errors. Just because nobody has worked on it yet doesn't mean it's fixed...

kyler888 commented 1 year ago

We have encountered the same corruption issue at our workplace. It all started with a simple update to a cron schedule time, which ended up deleting the trigger. The change was to extend the time on the existing trigger by an hour. Is anyone actively looking at this? I guess the difference in our situation is that we are not going from a simple to a cron trigger; we are just updating the time on an existing cron job/trigger.