Closed: Gonzo17 closed this issue 6 years ago.
I don't see how this job store can shut down the entire JVM, and there is no evidence of that in the log. There is an exception in a Quartz thread pool, that's it. Sorry, but this is almost certainly a red herring.
> I don't see how this job store can shut down the entire JVM
See line 26 in CheckinTask:
System.exit(1);
Oh, that is fucked up :( @Gonzo17 I'm not sure how Quartz jobs are supposed to terminate cleanly, but definitely feel free to submit a PR that replaces the System.exit call with something more reasonable (such as logging).
That said, the comment for that runnable suggests that this is not a rookie mistake. There is no obvious way to stop just Quartz.
Thanks for your quick reply @michaelklishin. I don't know very much about Quartz yet, but my first idea was to pause the scheduler or put it into standby (the difference is discussed here). However, I don't know how to access the scheduler there, or when to resume it.
This issue unfortunately is a hard dealbreaker for using this library in production. It's just not tenable to shut down the whole JVM over the tiniest transient network hiccup.
I wish I understood Quartz better (or at all) or I'd submit a fix.
Rather than attempting to stop the Quartz scheduler (which the job store does not seem to have direct access to), maybe we could set a flag on the job store on a failed checkin that causes calls like acquireNextTriggers() to return empty results until the next successful checkin. Would something like that be sufficient to address the issue? I'm happy to work on a PR for this if someone can give me guidance on whether this is a viable approach.
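A minimal, self-contained sketch of that idea (the class, method names, and String "triggers" below are illustrative stand-ins, not the actual job-store API):

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the proposal above: on a failed check-in, flip a flag so the
// store stops handing out triggers until the next successful check-in.
class ClusteredStoreSketch {
    private final AtomicBoolean checkinHealthy = new AtomicBoolean(true);

    // Called by the check-in task instead of System.exit(1).
    void onCheckinResult(boolean success) {
        checkinHealthy.set(success);
    }

    // Stand-in for acquireNextTriggers(): empty while check-ins are failing.
    List<String> acquireNextTriggers() {
        if (!checkinHealthy.get()) {
            return Collections.emptyList();
        }
        return List.of("trigger-1"); // normally: query triggers from MongoDB
    }
}
```

The upside of this design is that it needs no handle on the Scheduler at all; the store simply starves the scheduler of work while the cluster heartbeat is unhealthy.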
Or, if we don't know the correct fix, we could alternatively add a config setting to turn off this shutdown behavior, perhaps combined with more configurability around how long other cluster members wait before declaring a scheduler "defunct".
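For illustration, such an opt-out might look like this in quartz.properties. The shutdown property name below is hypothetical (this library's actual setting may differ); clusterCheckinInterval is the standard Quartz name for the check-in period:

```properties
# Hypothetical opt-out: log and keep running instead of calling System.exit(1)
org.quartz.jobStore.shutdownOnCheckinFailure=false
# Standard Quartz setting: how often this node checks in (milliseconds);
# other nodes wait a multiple of this before declaring a scheduler defunct
org.quartz.jobStore.clusterCheckinInterval=15000
```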
@michaelklishin @pwojnowski thoughts?
Hey guys, is there any news on this topic? At the moment we need to restart our application after a connection loss to the database. We can live with that because our service does not work with critical data and this happens only once or twice a month. But as @eonwhite mentioned, this is kind of a dealbreaker for really using it in production with critical data.
Triggering (no pun intended) connection recovery is something I'd investigate.
@Gonzo17 this is open source software. If something is a dealbreaker for you, feel free to investigate a solution and submit a PR.
I introduced a way to opt out. Properly pausing and unpausing Quartz is still TBD (and needs some research into the JDBC stores to see what they do).
Hey guys,
we have some trouble with the CheckinTask because it shuts down the JVM on any failure within the task.
In our setup we use a MongoDB cluster to schedule tasks. Due to network problems our microservice lost its connection to the MongoDB cluster, and after a timeout of 30s the service was shut down:
So my problem is that I want my microservice to attempt a reconnect to the database instead of shutting down. If I don't use the cluster, that works well. I understand that there is a need to prevent a job from being executed twice in a cluster. But what about pausing all triggers instead of shutting down the JVM or the scheduler?
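A reconnect attempt could be sketched roughly like this (all names here are illustrative, not part of the library): retry the check-in a few times with a backoff, and only if every attempt fails let the caller pause triggers rather than kill the JVM.

```java
import java.util.function.Supplier;

// Hypothetical alternative to System.exit(1) in the check-in task:
// retry the database check-in with a fixed backoff before giving up.
class RetryingCheckin {
    static boolean retry(Supplier<Boolean> attempt, int maxAttempts, long backoffMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            if (attempt.get()) {
                return true; // check-in succeeded, resume normal operation
            }
            try {
                Thread.sleep(backoffMillis); // wait before the next attempt
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false; // still failing: pause triggers instead of exiting
    }
}
```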
Here is my configuration: