microsoft / durabletask-mssql

Microsoft SQL storage provider for Durable Functions and the Durable Task Framework
MIT License
87 stars 32 forks source link

Problem terminating orchestration with running activity #81

Closed IGx89 closed 2 years ago

IGx89 commented 2 years ago

Hello! When I terminate a (sub) orchestration with a running activity, the orchestration gets properly marked as Terminated in the Instances table but when the activity it was running finishes (less than a minute later), an exception is thrown and the task remains in the NewTasks table. The runtime then picks up the task again every 2min (lock interval), runs the activity again, and keeps throwing the same exception.

Exception and related logs (Reservation_0vl5ureu_BroadcastOrchestrator is the name of the terminated orchestration, ReservationBroadcast_Broadcast is the name of the activity):

[2022-04-27T20:39:43.664Z] Executing 'ReservationBroadcast_Broadcast' (Reason='(null)', Id=f6dced9c-1439-4af7-9b87-0d1e5750d030)
[2022-04-27T20:39:43.681Z] Executed 'ReservationBroadcast_Broadcast' (Succeeded, Id=f6dced9c-1439-4af7-9b87-0d1e5750d030, Duration=17ms)
[2022-04-27T20:39:43.699Z] TaskActivityDispatcher-6466eb79540a465b93abcb100bf5ad0d-0: Unhandled exception with work item 'Reservation_0vl5ureu_BroadcastOrchestrator:0000000000000000': Microsoft.Data.SqlClient.SqlException (0x80131904): The target instance is not running. It may have already completed (in which case this execution can be considered a duplicate) or been terminated. Any results of this task activity execution will be discarded.
[2022-04-27T20:39:43.702Z]    at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
[2022-04-27T20:39:43.706Z]    at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
[2022-04-27T20:39:43.711Z]    at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
[2022-04-27T20:39:43.713Z]    at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
[2022-04-27T20:39:43.716Z]    at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
[2022-04-27T20:39:43.718Z]    at Microsoft.Data.SqlClient.SqlCommand.CompleteAsyncExecuteReader(Boolean isInternal, Boolean forDescribeParameterEncryption)
[2022-04-27T20:39:43.723Z]    at Microsoft.Data.SqlClient.SqlCommand.InternalEndExecuteNonQuery(IAsyncResult asyncResult, Boolean isInternal, String endMethod)
[2022-04-27T20:39:43.726Z]    at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryInternal(IAsyncResult asyncResult)
[2022-04-27T20:39:43.729Z]    at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryAsync(IAsyncResult asyncResult)
[2022-04-27T20:39:43.731Z]    at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
[2022-04-27T20:39:43.733Z] --- End of stack trace from previous location ---
[2022-04-27T20:39:43.736Z]    at DurableTask.SqlServer.SqlUtils.WithRetry[T](Func`1 func, SprocExecutionContext context, LogHelper traceHelper, String instanceId, Int32 maxRetries) in /_/src/DurableTask.SqlServer/SqlUtils.cs:line 500
[2022-04-27T20:39:43.739Z]    at DurableTask.SqlServer.SqlUtils.WithRetry[T](Func`1 func, SprocExecutionContext context, LogHelper traceHelper, String instanceId, Int32 maxRetries) in /_/src/DurableTask.SqlServer/SqlUtils.cs:line 507
[2022-04-27T20:39:43.742Z]    at DurableTask.SqlServer.SqlUtils.ExecuteSprocAndTraceAsync[T](DbCommand command, LogHelper traceHelper, String instanceId, Func`2 executor) in /_/src/DurableTask.SqlServer/SqlUtils.cs:line 434
[2022-04-27T20:39:43.745Z]    at DurableTask.SqlServer.SqlOrchestrationService.CompleteTaskActivityWorkItemAsync(TaskActivityWorkItem workItem, TaskMessage responseMessage) in /_/src/DurableTask.SqlServer/SqlOrchestrationService.cs:line 387
[2022-04-27T20:39:43.747Z]    at DurableTask.Core.TaskActivityDispatcher.OnProcessWorkItemAsync(TaskActivityWorkItem workItem) in /_/src/DurableTask.Core/TaskActivityDispatcher.cs:line 240
[2022-04-27T20:39:43.749Z]    at DurableTask.Core.TaskActivityDispatcher.OnProcessWorkItemAsync(TaskActivityWorkItem workItem) in /_/src/DurableTask.Core/TaskActivityDispatcher.cs:line 263
[2022-04-27T20:39:43.753Z]    at DurableTask.Core.WorkItemDispatcher`1.ProcessWorkItemAsync(WorkItemDispatcherContext context, Object workItemObj) in /_/src/DurableTask.Core/WorkItemDispatcher.cs:line 373
[2022-04-27T20:39:43.755Z] ClientConnectionId:8544a171-c7cc-4599-8d48-e9e852613774
[2022-04-27T20:39:43.757Z] Error Number:50003,State:1,Class:16
[2022-04-27T20:39:43.759Z]
[2022-04-27T20:39:43.761Z] Backing off for 1 seconds until 5 successful operations
[2022-04-27T20:39:43.762Z] Core Microsoft SqlClient Data Provider: The target instance is not running. It may have already completed (in which case this execution can be considered a duplicate) or been terminated. Any results of this task activity execution will be discarded.

This issue does not exist with the built-in Azure Storage provider. If the cause is not obvious and you need a repro, just let me know and I'll work on putting one together. It seems the situation is properly identified, but the logic just isn't gracefully handling it (removing the task and not logging an exception).

Thanks!

cgillum commented 2 years ago

Thanks for reporting this issue. We'll investigate and work on a fix!