The following exceptions were seen in the Docker logs when running a 100-orchestration "HelloSequences" test on a Kubernetes cluster:
warn: DurableTask.SqlServer[308]
20210413-111157-0000000000000019: A transient database failure occurred and will be retried. Current retry count: 0. Details: Microsoft.Data.SqlClient.SqlException (0x80131904): Transaction (Process ID 102) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
at Microsoft.Data.SqlClient.SqlCommand.CompleteAsyncExecuteReader(Boolean isInternal, Boolean forDescribeParameterEncryption)
at Microsoft.Data.SqlClient.SqlCommand.InternalEndExecuteNonQuery(IAsyncResult asyncResult, Boolean isInternal, String endMethod)
at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryInternal(IAsyncResult asyncResult)
at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryAsync(IAsyncResult asyncResult)
at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
at DurableTask.SqlServer.SqlUtils.WithRetry[T](Func`1 func, SprocExecutionContext context, LogHelper traceHelper, String instanceId, Int32 maxRetries) in /durabletask-mssql/src/DurableTask.SqlServer/SqlUtils.cs:line 420
ClientConnectionId:1bfd1928-097b-4daf-8bff-59a0c90cd87c
Error Number:1205,State:51,Class:13.
Deadlocks are not normally seen when running on fast hardware, so this might be something that was missed during local testing. Apps in this cluster were running slowly, which caused a variety of other issues and is likely a contributor to this problem. The app was also scaled out to 5 replicas using KEDA.
Even on slow clusters, the sprocs should be designed so that simple scenarios like this one do not result in deadlocks.
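
For readers unfamiliar with the warning above: error 1205 means SQL Server picked this transaction as the deadlock victim, and the log shows the caller treating it as transient and retrying. The snippet below is only a rough sketch of that general retry-on-deadlock pattern, not the actual `SqlUtils.WithRetry` implementation. `FakeSqlError`, `with_retry`, and the backoff parameters are hypothetical names and values made up for illustration.

```python
import random
import time

DEADLOCK_ERROR_NUMBER = 1205  # SQL Server "deadlock victim" error number from the log


class FakeSqlError(Exception):
    """Hypothetical stand-in for a driver exception that carries an error number."""

    def __init__(self, number, message=""):
        super().__init__(message)
        self.number = number


def with_retry(func, max_retries=5, base_delay=0.05, sleep=time.sleep):
    """Call func(); if it fails as a deadlock victim, back off and call it again.

    Assumed policy (not taken from the library): exponential backoff with jitter,
    giving up after max_retries retries. Other errors are re-raised immediately.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except FakeSqlError as e:
            if e.number != DEADLOCK_ERROR_NUMBER or attempt == max_retries:
                raise
            # Jitter keeps competing replicas from retrying in lockstep.
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            sleep(delay)
```

Retrying only hides the symptom. With 5 replicas contending for the same rows, the sprocs still need to avoid taking locks in conflicting orders.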