Closed Craigology closed 11 years ago
Additionally, if it helps for #67, here are the call stacks of the other two deadlocked threads, which in turn are causing the Service Control Manager's stop request to block:
QuartzSchedulerThread:
[In a sleep, wait, or join]
> mscorlib.dll!System.Threading.Monitor.Wait(object obj) + 0x1e bytes
Quartz.dll!Quartz.Impl.AdoJobStore.SimpleSemaphore.ObtainLock(Quartz.Impl.AdoJobStore.Common.DbMetadata metadata, Quartz.Impl.AdoJobStore.ConnectionAndTransactionHolder conn, string lockName = "TRIGGER_ACCESS") Line 96 + 0x9 bytes C#
Quartz.dll!Quartz.Impl.AdoJobStore.JobStoreSupport.ExecuteInNonManagedTXLock(string lockName = "TRIGGER_ACCESS", System.Func<Quartz.Impl.AdoJobStore.ConnectionAndTransactionHolder,object> txCallback) Line 3441 + 0x7b bytes C#
Quartz.dll!Quartz.Impl.AdoJobStore.JobStoreSupport.TriggersFired(System.Collections.Generic.IList<Quartz.Spi.IOperableTrigger> triggers) Line 2550 + 0x3c bytes C#
Quartz.dll!Quartz.Core.QuartzSchedulerThread.Run() Line 369 C#
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool ignoreSyncCtx) + 0xdc bytes
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) + 0x3b bytes
mscorlib.dll!System.Threading.ThreadHelper.ThreadStart() + 0x4d bytes
[Native to Managed Transition]
Quartz_xxxx-NON_CLUSTERED:
[In a sleep, wait, or join]
> mscorlib.dll!System.Threading.Monitor.Wait(object obj) + 0x1e bytes
Quartz.dll!Quartz.Impl.AdoJobStore.SimpleSemaphore.ObtainLock(Quartz.Impl.AdoJobStore.Common.DbMetadata metadata, Quartz.Impl.AdoJobStore.ConnectionAndTransactionHolder conn, string lockName = "TRIGGER_ACCESS") Line 96 + 0x9 bytes C#
Quartz.dll!Quartz.Impl.AdoJobStore.JobStoreSupport.DoRecoverMisfires() Line 2813 + 0x81 bytes C#
Quartz.dll!Quartz.Impl.AdoJobStore.JobStoreSupport.MisfireHandler.Manage() Line 3623 + 0x10 bytes C#
Quartz.dll!Quartz.Impl.AdoJobStore.JobStoreSupport.MisfireHandler.Run() Line 3646 + 0x8 bytes C#
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool ignoreSyncCtx) + 0xdc bytes
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) + 0x3b bytes
mscorlib.dll!System.Threading.ThreadHelper.ThreadStart() + 0x4d bytes
[Native to Managed Transition]
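Both traces above end in Monitor.Wait inside ObtainLock for the same lock name, "TRIGGER_ACCESS", which is what a waiter looks like when the lock is never released. A minimal sketch of that situation, assuming a monitor-based in-process named lock: NamedLockSketch, TryObtain and Release are hypothetical names for illustration and are not the Quartz.NET SimpleSemaphore source. The only difference from an unbounded wait is the timeout, which lets a waiter give up.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Hypothetical named lock for illustration only; not Quartz.NET code.
public sealed class NamedLockSketch
{
    private readonly HashSet<string> held = new HashSet<string>();

    // Waits until lockName is free or the timeout elapses.
    // Returns true if the lock was obtained, false if the wait timed out.
    public bool TryObtain(string lockName, TimeSpan timeout)
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        lock (held)
        {
            while (held.Contains(lockName))
            {
                TimeSpan remaining = timeout - elapsed.Elapsed;
                if (remaining <= TimeSpan.Zero)
                {
                    return false; // an unbounded Monitor.Wait would block here forever
                }
                // Releases the monitor while waiting; the loop re-checks the condition.
                Monitor.Wait(held, remaining);
            }
            held.Add(lockName);
            return true;
        }
    }

    // Frees the lock and wakes all waiters.
    public void Release(string lockName)
    {
        lock (held)
        {
            if (held.Remove(lockName))
            {
                Monitor.PulseAll(held);
            }
        }
    }

    public static void Main()
    {
        var locks = new NamedLockSketch();
        // Simulates an owner that fails before it releases the lock.
        locks.TryObtain("TRIGGER_ACCESS", TimeSpan.FromSeconds(1));
        bool obtained = locks.TryObtain("TRIGGER_ACCESS", TimeSpan.FromMilliseconds(200));
        Console.WriteLine(obtained ? "acquired" : "gave up waiting");
    }
}
```

In this sketch the second TryObtain gives up after about 200 ms, whereas with an untimed Monitor.Wait it would stay blocked in the same frame as the two threads above.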
This does seem weird. Could you try with the latest master build?
Can you still reproduce this with a version compiled from master? There have been some fixes since 2.0.1 for the problem of the scheduler hanging when stopping.
Hi Marko,
I have rebuilt from master and can confirm that the scheduler now tolerates the exceptions occurring on the AdoJobStore thread and remains responsive at service shutdown. Thanks!
I'll post a comment in #67 about the discovery of the cause of the SQL Server error which was causing AdoJobStore to deadlock in the first place.
Good to hear, I'm closing this issue for now.
Possibly related to #67 (Quartz job recovery periodically fails in clustered environment): we are finding that the scheduler blocks indefinitely when requested to shut down by the hosting service in
QuartzScheduler.Shutdown(bool)
in the following thread Join operation, despite waitForJobsToComplete being false and all worker threads previously reported as shut down:
// Scheduler thread may have been waiting for the fire time of an acquired
// trigger and need time to release the trigger once halted, so make sure
// the thread is dead before continuing to shut down the job store.
try {
schedThread.Join();
} catch (ThreadInterruptedException) { }
The Windows Service Control Manager then reports the service as refusing to shut down.
The error presumably leading to the deadlock situation occurs intermittently when a CRON trigger fires.
I understand that #67 is currently being addressed, but regardless, is an indefinite Join the appropriate course of action here for what should be a non-waiting shutdown?
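To make the question concrete, here is a minimal sketch of what a bounded wait could look like, using the Thread.Join(TimeSpan) overload, which returns false if the thread has not exited in time. BoundedShutdown, JoinWithTimeout and the 200 ms timeout are hypothetical and only for illustration; this is not a proposed patch to QuartzScheduler.Shutdown.

```csharp
using System;
using System.Threading;

public static class BoundedShutdown
{
    // Hypothetical helper: waits up to `timeout` for the thread to exit.
    // Returns true if the thread ended in time, false otherwise.
    public static bool JoinWithTimeout(Thread thread, TimeSpan timeout)
    {
        try
        {
            return thread.Join(timeout);
        }
        catch (ThreadInterruptedException)
        {
            // Same handling as the snippet above: treat an interrupt as "not joined".
            return false;
        }
    }

    public static void Main()
    {
        using (var never = new ManualResetEventSlim(false))
        {
            // Stand-in for a scheduler thread stuck waiting on a lock.
            var stuck = new Thread(() => never.Wait()) { IsBackground = true };
            stuck.Start();

            bool exited = JoinWithTimeout(stuck, TimeSpan.FromMilliseconds(200));
            Console.WriteLine(exited ? "thread exited" : "timed out; continuing shutdown");

            // Let the stand-in thread finish so the demo exits cleanly.
            never.Set();
            stuck.Join();
        }
    }
}
```

With a stuck thread, the helper returns false after the timeout and the caller can log it and carry on shutting down the job store, instead of leaving the Service Control Manager waiting. Whether continuing past a live scheduler thread is safe for the job store is a separate question that this sketch does not answer.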