Hi @microrama, were you able to confirm whether adding ROLLBACK TRANSACTION before the THROW truly resolves the deadlocks that you encountered? I ask because I didn't expect this to be necessary since we hadn't actually changed any data at this point in the stored procedure.
Yes Chris. I could easily reproduce the issue with 2 processes trying to read from the DB and create instances, say 200. Once the rollback was added, it worked beautifully, and even 5000 instances worked without issues.
@microrama thanks for confirming! By the way, we accept pull requests from users. If you would like to submit this as a small pull request, we would happily accept it and give you appropriate credit in the Git history and release notes.
Thanks so much Chris! I will do a pull request tomorrow!!
Chris - I submitted a pull request - https://github.com/microsoft/durabletask-mssql/pull/218
Thanks @microrama! The change looks good to me. Please sign the CLA and I'll go ahead and merge your PR.
@cgillum any chance of getting a release? :)
@bhugot yes, I can work on that.
@cgillum I found another transaction with a THROW that isn't rolled back.
@bhugot feel free to submit a PR for the other case you found. I can include it in the next release.
@cgillum done #219
Problem: The Dt.CreateInstance stored proc does not ROLLBACK the transaction (line 36) when it fails to create a new orchestration because another process/thread has already created it. This happens when multiple processes try to create orchestration instances over the same source data. It was easily reproduced with multiple processes each trying to create, say, 500 orchestration instances based on the data in a SQL table.
The current stored proc code throws an exception (THROW 50001, @msg, 1;) when it finds that another process has already created the orchestration instance, but it does not roll back the open BEGIN TRANSACTION, which appears to be causing the deadlocks and SQL timeout exceptions.
The deadlock was identified using the deadlock reports from the database; the report is provided below along with the .NET exception stack trace.
Solution: Update the Dt.CreateInstance stored procedure to ROLLBACK the transaction when it detects a duplicate orchestration instance, before throwing the exception. This change fixed the deadlock issues.
……. BEGIN TRANSACTION

DECLARE @existingStatus varchar(30) = (
    SELECT TOP 1 existing.[RuntimeStatus]
    FROM Instances existing WITH (HOLDLOCK)
    WHERE [TaskHub] = @TaskHub AND [InstanceID] = @InstanceID
)

-- Instance IDs can be overwritten only if the orchestration is in a terminal state
--IF @existingStatus IN ('Pending', 'Running')
IF @existingStatus IN (SELECT value FROM STRING_SPLIT(@DedupeStatuses, ','))
BEGIN
    ROLLBACK TRANSACTION;
    THROW 50001, @msg, 1;
END
ELSE IF @existingStatus IS NOT NULL
……………..
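For readers wondering why the explicit ROLLBACK matters here: with XACT_ABORT left OFF (the SQL Server default), a THROW does not roll back an open transaction, so the range lock taken by the HOLDLOCK read stays held while the error travels back to the caller, which is a plausible source of the deadlocks under concurrent creates. Below is a minimal generic sketch of the same ROLLBACK-before-THROW pattern, using a made-up temp table and variable names purely for illustration (this is not the actual Dt.CreateInstance code):

-- Minimal sketch with hypothetical names; with XACT_ABORT OFF, THROW leaves the
-- transaction (and its locks) open, so we roll back explicitly before throwing.
CREATE TABLE #Instances ([InstanceID] varchar(100) PRIMARY KEY);
DECLARE @InstanceID varchar(100) = 'abc123';
DECLARE @msg nvarchar(200) = N'An orchestration instance with this ID already exists.';

BEGIN TRANSACTION
BEGIN TRY
    -- HOLDLOCK takes a serializable range lock that is held until the transaction ends
    IF EXISTS (
        SELECT 1 FROM #Instances WITH (HOLDLOCK)
        WHERE [InstanceID] = @InstanceID)
    BEGIN
        -- Release the lock before surfacing the duplicate-instance error
        ROLLBACK TRANSACTION;
        THROW 50001, @msg, 1;
    END

    INSERT INTO #Instances ([InstanceID]) VALUES (@InstanceID);
    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    -- If some other error left the transaction open, close it before re-raising
    IF XACT_STATE() <> 0
        ROLLBACK TRANSACTION;
    THROW;
END CATCH

In this sketch the thrown error 50001 lands in the CATCH block; because the transaction was already rolled back, XACT_STATE() returns 0 there and the CATCH simply re-raises the original error to the caller.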
Exception StackTrace:
"message": [ "TaskOrchestrationDispatcher-25904bb427294445a049682c104c412a-0: Failed to fetch a work-item: Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
 ---> System.ComponentModel.Win32Exception (258): The wait operation timed out.
   at Microsoft.Data.SqlClient.SqlCommand.<>c.b__2090(Task`1 result)
   at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at DurableTask.SqlServer.SqlUtils.WithRetry[T](Func`1 func, SprocExecutionContext context, LogHelper traceHelper, String instanceId, Int32 maxRetries)
   at DurableTask.SqlServer.SqlUtils.WithRetry[T](Func`1 func, SprocExecutionContext context, LogHelper traceHelper, String instanceId, Int32 maxRetries) in /_/src/DurableTask.SqlServer/SqlUtils.cs:line 504
   at DurableTask.SqlServer.SqlUtils.ExecuteSprocAndTraceAsync[T](DbCommand command, LogHelper traceHelper, String instanceId, Func`2 executor) in /_/src/DurableTask.SqlServer/SqlUtils.cs:line 454
   at DurableTask.SqlServer.SqlOrchestrationService.LockNextTaskOrchestrationWorkItemAsync(TimeSpan receiveTimeout, CancellationToken cancellationToken) in /_/src/DurableTask.SqlServer/SqlOrchestrationService.cs:line 135
   at DurableTask.Core.WorkItemDispatcher`1.DispatchAsync(WorkItemDispatcherContext context)
ClientConnectionId:22a53d4c-5250-463f-96c0-bb08165b71a7
Error Number:-2,State:0,Class:11" ]

Deadlock Report