michaelklishin / quartz-mongodb

A MongoDB-based store for the Quartz scheduler. This fork strives to be as feature complete as possible. Originally by MuleSoft.

Schedulers documents are not cleared #136

Open gmiejski opened 7 years ago

gmiejski commented 7 years ago

I've come across a bug (or a not-yet-implemented feature?) where scheduler documents are not cleared from Mongo, leading to new entries with each deployment. It seems trivial, but it is harder than it looks, and besides the growing collection it breaks things like monitoring how many active schedulers there are.

I propose two solutions: either make other cluster instances periodically scan the schedulers collection and clean up old entries (or make that configurable), or add a field "lastCheckingDate" (of Date type) to the scheduler documents - then one can simply add a TTL index in Mongo, and everything would be fine.

Please tell me if I've missed something important, or whether creating a TTL index for old, inactive cluster instances would break something I'm not aware of. (A TTL of, say, one hour would probably not break such things - still not sure about locks and that kind of stuff, though.)
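For illustration, the TTL-index variant could look roughly like this in the mongo shell. The collection name `quartz_schedulers` and the one-hour expiry are assumptions for the sketch, not something this library ships; the `lastCheckingDate` field is the one proposed above:

```javascript
// Create a TTL index on the proposed lastCheckingDate field so that
// scheduler documents not refreshed within the last hour expire
// automatically. MongoDB's TTL monitor runs roughly once a minute,
// so removal is not instantaneous.
db.quartz_schedulers.createIndex(
  { lastCheckingDate: 1 },
  { expireAfterSeconds: 3600 }
)
```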

Please share your thoughts!

michaelklishin commented 7 years ago

A field with last scheduler activity sounds good to me. When would it be set, however?

gmiejski commented 7 years ago

I was thinking that this lastCheckingDate could be derived from lastCheckinTime and simply stored in SchedulerDao.createUpdateClause().

However, there are two points I have just found out:

  1. When you recover state, you always recover it by the same instanceId - which won't work with AUTO-generated instanceIds in CLUSTERED mode. We should also recover old cluster instances, clear their locks, and remove those entries too.
  2. There seems to be some kind of bug: I can see a growing number of trigger locks without corresponding triggers - have you come across such a thing?

But I have studied a bit how this is implemented in original Quartz: old records are cleared during check-in, at the same point where old triggers are recovered.

Considering clustered mode, the simplest solution seems to be not the best option. I would go for recovering old cluster state during check-in, as it is done in Quartz's JobStoreSupport - what do you think? How about clearing all locks acquired by non-active instances, together with the _schedulers document of each instance that has stopped checking in?
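A minimal sketch of the dead-instance test such a check-in clean-up would need. The class and method names here are illustrative, not this library's API; the rule loosely mirrors the "last check-in plus interval plus tolerance" comparison that Quartz's JobStoreSupport applies when it looks for failed instances:

```java
import java.time.Instant;

// Illustrative sketch: decide whether a clustered scheduler instance
// has missed its check-in window and its _schedulers document and
// locks can be cleaned up. All times are epoch milliseconds.
public class SchedulerCleanupSketch {

    /**
     * @param lastCheckinTime when the instance last checked in
     * @param checkinInterval expected gap between check-ins
     * @param clockTolerance  slack for clock skew and GC pauses
     * @param now             current time
     */
    public static boolean isDead(long lastCheckinTime,
                                 long checkinInterval,
                                 long clockTolerance,
                                 long now) {
        return now > lastCheckinTime + checkinInterval + clockTolerance;
    }

    public static void main(String[] args) {
        long now = Instant.now().toEpochMilli();
        // Checked in 10 minutes ago, expected every 15 s: dead.
        System.out.println(isDead(now - 600_000, 15_000, 7_500, now));
        // Checked in 5 s ago: still alive.
        System.out.println(isDead(now - 5_000, 15_000, 7_500, now));
    }
}
```

During check-in, each surviving instance would scan _schedulers for documents that fail this test, delete the locks held by those instanceIds, and then delete the scheduler documents themselves - so the clean-up needs no extra background thread.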

otlg commented 3 years ago

Hi,

Seems the _schedulers collection is never cleaned up. Might be problematic in k8s for stateless services. Any plan to solve it?

Thanks

michaelklishin commented 3 years ago

> Hi,
>
> Seems the _schedulers collection is never cleaned up. Might be problematic in k8s for stateless services.
>
> Any plan to solve it?
>
> Thanks

This is open source software. You are welcome to contribute a fix.

otlg commented 3 years ago

If there is no plan and there is no other workaround, I can contribute a fix. It seems strange that the issue has been open for 6 years and there is still no fix for it.

gmiejski commented 3 years ago

As far as I remember, the issue seemed easy to fix, but during implementation I came across more complicated details, which I've described above. I'm not using this cool library anymore, so I won't be able to help, but I'm keeping my fingers crossed 🤞