michaelklishin / quartz-mongodb

A MongoDB-based store for the Quartz scheduler. This fork strives to be as feature complete as possible. Originally by MuleSoft.

Schedulers documents are not cleared #136

Open gmiejski opened 7 years ago

gmiejski commented 7 years ago

I've come across a bug (or a not-yet-implemented feature?) where scheduler documents are not cleared from Mongo, leading to new entries with each deployment. It seems trivial, but it is harder than it looks, and besides the growing collection it breaks things like monitoring how many active schedulers there are.

I propose two solutions: either make other cluster instances periodically scan the schedulers collection and clean up old entries (or make that configurable), or add a field "lastCheckingDate" (of Date type) to the scheduler documents - then one can simply add a TTL index in Mongo, and everything would be fine.

Please tell me if I've missed something important, or whether creating a TTL index for old, inactive cluster instances would break something I'm not aware of. (A TTL of, say, one hour would probably not break such things - still not sure about locks and that kind of stuff, though.)
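For illustration, the TTL-index variant could look roughly like this in the mongo shell. The collection name `quartz_schedulers` and the one-hour expiry are assumptions for the sketch, not something this library ships; the `lastCheckingDate` field is the one proposed above:

```javascript
// Create a TTL index on the proposed lastCheckingDate field so that
// scheduler documents not refreshed within the last hour expire
// automatically. MongoDB's TTL monitor runs roughly once a minute,
// so removal is not instantaneous.
db.quartz_schedulers.createIndex(
  { lastCheckingDate: 1 },
  { expireAfterSeconds: 3600 }
)
```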

Please share your thoughts!

michaelklishin commented 7 years ago

A field with last scheduler activity sounds good to me. When would it be set, however?

gmiejski commented 7 years ago

I was thinking that this lastCheckingDate could be derived from lastCheckinTime and simply stored in SchedulerDao.createUpdateClause().

However, there are two points I have just found out:

  1. When you recover state, you always recover it by the same instanceId - which won't work with AUTO-generated instanceIds in CLUSTERED mode. We should also recover old cluster instances, clear their locks, and remove those entries too.
  2. There seems to be some kind of bug: I can see a growing number of trigger locks without corresponding triggers - have you come across such a thing?

But I have studied a bit how this is implemented in original Quartz: old records are cleared during check-in, at the same point where old triggers are recovered.

Considering clustered mode, the simplest solution seems to be not the best option. I would go for recovering old cluster state during check-in, as it is done in Quartz's JobStoreSupport - what do you think? How about clearing all locks acquired by non-active instances, together with the _schedulers document of each instance that has stopped checking in?
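A minimal sketch of the dead-instance test such a check-in clean-up would need. The class and method names here are illustrative, not this library's API; the rule loosely mirrors the "last check-in plus interval plus tolerance" comparison that Quartz's JobStoreSupport applies when it looks for failed instances:

```java
import java.time.Instant;

// Illustrative sketch: decide whether a clustered scheduler instance
// has missed its check-in window and its _schedulers document and
// locks can be cleaned up. All times are epoch milliseconds.
public class SchedulerCleanupSketch {

    /**
     * @param lastCheckinTime when the instance last checked in
     * @param checkinInterval expected gap between check-ins
     * @param clockTolerance  slack for clock skew and GC pauses
     * @param now             current time
     */
    public static boolean isDead(long lastCheckinTime,
                                 long checkinInterval,
                                 long clockTolerance,
                                 long now) {
        return now > lastCheckinTime + checkinInterval + clockTolerance;
    }

    public static void main(String[] args) {
        long now = Instant.now().toEpochMilli();
        // Checked in 10 minutes ago, expected every 15 s: dead.
        System.out.println(isDead(now - 600_000, 15_000, 7_500, now));
        // Checked in 5 s ago: still alive.
        System.out.println(isDead(now - 5_000, 15_000, 7_500, now));
    }
}
```

During check-in, each surviving instance would scan _schedulers for documents that fail this test, delete the locks held by those instanceIds, and then delete the scheduler documents themselves - so the clean-up needs no extra background thread.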

otlg commented 3 years ago

Hi,

Seems the _schedulers collection is never cleaned up. Might be problematic in k8s for stateless services. Any plan to solve it?

Thanks

michaelklishin commented 3 years ago

> Hi,
>
> Seems the _schedulers collection is never cleaned up. Might be problematic in k8s for stateless services.
>
> Any plan to solve it?
>
> Thanks

This is open source software. You are welcome to contribute a fix.

otlg commented 3 years ago

If there is no plan and there is no other workaround, I can contribute a fix. It seems strange that the issue has been open for 6 years and there is still no fix for it.

gmiejski commented 3 years ago

As far as I remember, the issue seemed easy to fix, but during implementation I came across more complicated details, which I've described above. I'm not using this cool library anymore, so I won't be able to help, but I'm keeping my fingers crossed 🤞