quartznet / quartznet

Quartz Enterprise Scheduler .NET
http://www.quartz-scheduler.net/
Apache License 2.0

Scheduled Shutdown blocked even if waitForJobsToComplete is false #70

Closed Craigology closed 11 years ago

Craigology commented 11 years ago

Possibly related to #67 (Quartz job recovery periodically fails in clustered environment): we are finding that the scheduler blocks indefinitely when the hosting service requests a shutdown. The block occurs in QuartzScheduler.Shutdown(bool), at the following thread Join operation, despite waitForJobsToComplete being false and all worker threads having already been reported as shut down:

    // Scheduler thread may have be waiting for the fire time of an acquired
    // trigger and need time to release the trigger once halted, so make sure
    // the thread is dead before continuing to shutdown the job store.
    try
    {
        schedThread.Join();
    }
    catch (ThreadInterruptedException)
    {
    }

The Windows Service Control Manager then reports the service as refusing to shut down.

The error that presumably leads to the deadlock occurs intermittently when a CRON trigger fires:

2012-10-25 07:00:00.277 Error while executing the Runnable:QuartzServer_Worker-8 |  | System.ArgumentNullException System.ArgumentNullException: Connnection-transaction pair cannot be null
Parameter name: cth
   at Quartz.Impl.AdoJobStore.JobStoreSupport.RollbackConnection(ConnectionAndTransactionHolder cth) in c:\Work\OpenSource\quartznet\src\Quartz\Impl\AdoJobStore\JobStoreSupport.cs:line 3300
   at Quartz.Impl.AdoJobStore.JobStoreSupport.ExecuteInNonManagedTXLock(String lockName, Func`2 txCallback) in c:\Work\OpenSource\quartznet\src\Quartz\Impl\AdoJobStore\JobStoreSupport.cs:line 3463
   at Quartz.Core.JobRunShell.Run() in c:\Work\OpenSource\quartznet\src\Quartz\Core\JobRunShell.cs:line 281

I understand that #67 is currently being addressed, but regardless of that, is an indefinite Join the appropriate course of action here for what should be a non-waiting shutdown?
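
To make the question concrete, here is a minimal sketch of what a bounded join could look like. It is not Quartz.NET code: `BoundedShutdown` and `TryJoin` are hypothetical names, and the timeout value is left to the caller. It relies only on `Thread.Join(TimeSpan)`, which returns false if the thread is still running when the timeout elapses.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration only; not part of Quartz.NET.
public static class BoundedShutdown
{
    // Waits up to `timeout` for `thread` to exit.
    // Returns true if the thread has terminated, false if it is still running.
    public static bool TryJoin(Thread thread, TimeSpan timeout)
    {
        try
        {
            // Thread.Join(TimeSpan) returns false when the timeout elapses first.
            return thread.Join(timeout);
        }
        catch (ThreadInterruptedException)
        {
            // The calling thread was interrupted while waiting;
            // report whether the target thread has exited anyway.
            return !thread.IsAlive;
        }
    }
}
```

With something along these lines, a caller could log a warning and carry on shutting down the job store when `TryJoin` returns false, rather than blocking the service stop request indefinitely.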

Craigology commented 11 years ago

Additionally, if it helps for #67, here are the call stacks of the other two deadlocked threads, which in turn are causing the thread handling the Service Control Manager's Stop request to block:

QuartzSchedulerThread:

    [In a sleep, wait, or join] 
>   mscorlib.dll!System.Threading.Monitor.Wait(object obj) + 0x1e bytes 
    Quartz.dll!Quartz.Impl.AdoJobStore.SimpleSemaphore.ObtainLock(Quartz.Impl.AdoJobStore.Common.DbMetadata metadata, Quartz.Impl.AdoJobStore.ConnectionAndTransactionHolder conn, string lockName = "TRIGGER_ACCESS") Line 96 + 0x9 bytes  C#
    Quartz.dll!Quartz.Impl.AdoJobStore.JobStoreSupport.ExecuteInNonManagedTXLock(string lockName = "TRIGGER_ACCESS", System.Func<Quartz.Impl.AdoJobStore.ConnectionAndTransactionHolder,object> txCallback) Line 3441 + 0x7b bytes  C#
    Quartz.dll!Quartz.Impl.AdoJobStore.JobStoreSupport.TriggersFired(System.Collections.Generic.IList<Quartz.Spi.IOperableTrigger> triggers) Line 2550 + 0x3c bytes C#
    Quartz.dll!Quartz.Core.QuartzSchedulerThread.Run() Line 369 C#
    mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool ignoreSyncCtx) + 0xdc bytes    
    mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) + 0x3b bytes    
    mscorlib.dll!System.Threading.ThreadHelper.ThreadStart() + 0x4d bytes   
    [Native to Managed Transition]  

Quartz_xxxx-NON_CLUSTERED:

    [In a sleep, wait, or join] 
>   mscorlib.dll!System.Threading.Monitor.Wait(object obj) + 0x1e bytes 
    Quartz.dll!Quartz.Impl.AdoJobStore.SimpleSemaphore.ObtainLock(Quartz.Impl.AdoJobStore.Common.DbMetadata metadata, Quartz.Impl.AdoJobStore.ConnectionAndTransactionHolder conn, string lockName = "TRIGGER_ACCESS") Line 96 + 0x9 bytes  C#
    Quartz.dll!Quartz.Impl.AdoJobStore.JobStoreSupport.DoRecoverMisfires() Line 2813 + 0x81 bytes   C#
    Quartz.dll!Quartz.Impl.AdoJobStore.JobStoreSupport.MisfireHandler.Manage() Line 3623 + 0x10 bytes   C#
    Quartz.dll!Quartz.Impl.AdoJobStore.JobStoreSupport.MisfireHandler.Run() Line 3646 + 0x8 bytes   C#
    mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool ignoreSyncCtx) + 0xdc bytes    
    mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) + 0x3b bytes    
    mscorlib.dll!System.Threading.ThreadHelper.ThreadStart() + 0x4d bytes   
    [Native to Managed Transition]  

lahma commented 11 years ago

That does seem weird. Could you try with the latest master build?

lahma commented 11 years ago

Can you still reproduce this with a version compiled from master? There have been some fixes since 2.0.1 for the scheduler hanging when stopping.

Craigology commented 11 years ago

Hi Marko,

I have rebuilt from Master and I can confirm that the scheduler now tolerates the exceptions occurring on the AdoJobStore thread and remains responsive at service shutdown. Thanks!

I'll post a comment in #67 about the cause of the SQL Server error that was making AdoJobStore deadlock in the first place.

lahma commented 11 years ago

Good to hear, I'm closing this issue for now.