Closed — bhugot closed this issue 1 year ago
@bhugot what's the size of your payload? The SQL provider uses varchar(MAX)
for storing payloads, which supports up to 2 GB. Do you have a payload that's larger than this? Azure Storage may not be a great choice either for payloads this large.
After more analysis, it seems it's actually a timeout in this case, but one caused by the payload size. I will get the stack trace.
```
TaskOrchestrationDispatcher-e8f908058dee4e5982ca5d1d14ad63e5-0: Unhandled exception with work item '04f96af8-39f6-4f63-8e15-e63936e2ea3d': Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
 ---> System.ComponentModel.Win32Exception (258): The wait operation timed out.
   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.SqlCommand.InternalEndExecuteNonQuery(IAsyncResult asyncResult, Boolean isInternal, String endMethod)
   at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryInternal(IAsyncResult asyncResult)
   at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryAsync(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location ---
   at DurableTask.SqlServer.SqlUtils.WithRetry[T](Func`1 func, SprocExecutionContext context, LogHelper traceHelper, String instanceId, Int32 maxRetries) in /_/src/DurableTask.SqlServer/SqlUtils.cs:line 496
   at DurableTask.SqlServer.SqlUtils.WithRetry[T](Func`1 func, SprocExecutionContext context, LogHelper traceHelper, String instanceId, Int32 maxRetries) in /_/src/DurableTask.SqlServer/SqlUtils.cs:line 503
   at DurableTask.SqlServer.SqlUtils.ExecuteSprocAndTraceAsync[T](DbCommand command, LogHelper traceHelper, String instanceId, Func`2 executor) in /_/src/DurableTask.SqlServer/SqlUtils.cs:line 438
   at DurableTask.SqlServer.SqlOrchestrationService.CompleteTaskOrchestrationWorkItemAsync(TaskOrchestrationWorkItem workItem, OrchestrationRuntimeState newRuntimeState, IList`1 outboundMessages, IList`1 orchestratorMessages, IList`1 timerMessages, TaskMessage continuedAsNewMessage, OrchestrationState orchestrationState) in /_/src/DurableTask.SqlServer/SqlOrchestrationService.cs:line 375
   at DurableTask.Core.TaskOrchestrationDispatcher.OnProcessWorkItemAsync(TaskOrchestrationWorkItem workItem) in /_/src/DurableTask.Core/TaskOrchestrationDispatcher.cs:line 551
   at DurableTask.Core.TaskOrchestrationDispatcher.OnProcessWorkItemSessionAsync(TaskOrchestrationWorkItem workItem) in /_/src/DurableTask.Core/TaskOrchestrationDispatcher.cs:line 194
   at DurableTask.Core.WorkItemDispatcher`1.ProcessWorkItemAsync(WorkItemDispatcherContext context, Object workItemObj) in /_/src/DurableTask.Core/WorkItemDispatcher.cs:line 373
ClientConnectionId:a44b295e-5fe9-4952-8566-be45f2e88e22
Error Number:-2,State:0,Class:11
ClientConnectionId before routing:43d4b49d-060a-40a0-abe9-2a9df4786ec6
Routing Destination:b8c44c89e538.tr604.westeurope1-a.worker.database.windows.net,11044
Backing off for 1 seconds until 5 successful operations
```
I have scaled up the database DTUs and it's now OK. I will try to find the reason for the timeout, but it may be a too-large insert.
Hello, I'm back on this problem. We have some batch orchestrations that load massive amounts of data (e.g. 500K strings) and then call another orchestration for each one of them. This was working fine on Azure Storage, but it goes idle with MSSQL. The payload is inserted into the table, but nothing is done with it — I suppose because of the timeout. Maybe there should be a parameter that allows changing the command timeout; that would solve this issue.
OK, by adding `Command Timeout` to the connection string I solved the problem, but maybe a change in the code would be better than relying on the connection string.
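For anyone hitting the same symptom: `Command Timeout` is a standard Microsoft.Data.SqlClient connection string keyword (seconds; the default is 30). The server name, database, and value below are placeholders, not recommendations:

```
Server=myserver.database.windows.net;Database=DurableDB;Authentication=Active Directory Default;Command Timeout=120
```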
@bhugot There's a massive performance issue in microsoft.data.sqlclient when reading large fields with async methods. If the payload is large it's probably tripping over that issue. The suggested workaround is to either use the streaming API if you want to stay async, or move anything that interacts with things like varchar(max) to sync. In either case that's a change in durabletask-mssql.
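To make the two workarounds concrete, here is a minimal sketch (not code from durabletask-mssql; the table and column names are hypothetical). `CommandBehavior.SequentialAccess` combined with `GetTextReader` streams the `varchar(MAX)` value instead of buffering it, and using the synchronous `ExecuteReader` sidesteps the async large-field slowdown entirely:

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

static class PayloadReader
{
    // Hypothetical helper: reads one large varchar(MAX) payload synchronously.
    public static string ReadLargePayload(SqlConnection connection, long payloadId)
    {
        using var command = new SqlCommand(
            "SELECT Payload FROM Payloads WHERE Id = @Id", connection);
        command.Parameters.AddWithValue("@Id", payloadId);

        // SequentialAccess tells SqlClient to stream columns in order
        // rather than materializing the whole row up front.
        using SqlDataReader reader =
            command.ExecuteReader(CommandBehavior.SequentialAccess); // sync on purpose
        if (!reader.Read())
        {
            return string.Empty;
        }

        // GetTextReader streams the varchar(MAX) column as text.
        using var textReader = reader.GetTextReader(0);
        return textReader.ReadToEnd();
    }
}
```

This requires a live SQL Server connection to run, so treat it as an illustration of the pattern rather than a drop-in fix.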
Wow, thanks @kevbry for mentioning this! I'm very much open to using the sync overloads for the background* reads that we do from SQL. There really isn't a possibility for thread-pool starvation since parallelism is already throttled, so it feels like an easy performance win with little downside.
*By background, I'm talking about all SQL reads driven by IOrchestrationService methods. Anything called by IOrchestrationServiceClient should still use async, since those calls are typically made by threads owned by the app.
@cgillum I haven't done any profiling to see if it's actually the cause here, but it seems possible. SqlUtil in this project seems to use SqlClient in a similar way to the app we had that tripped over that same bug.
If there's anything at all you can do internally to raise the profile of the issue, that'd probably be helpful (and appreciated by people like me who keep tripping over it). Reading the entire thread on #593, it sounds like the cause is understood, but the people who actually understand it can't get enough time allocated to fix it.
Thanks @kevbry. Indeed, running the performance tests on SqlClient, all the synchronous command methods are faster and allocate less than the asynchronous ones:
Method | Mean | Error | StdDev | Completed Work Items | Lock Contentions | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
ExecuteReader | 48.60 ms | 4.094 ms | 2.141 ms | - | - | - | - | - | 8.13 KB |
ExecuteReaderAsync | 80.30 ms | 6.355 ms | 4.204 ms | 873.0000 | - | 1500.0000 | - | - | 11194.37 KB |
ExecuteScalar | 38.24 ms | 0.907 ms | 0.539 ms | - | - | - | - | - | 8.11 KB |
ExecuteScalarAsync | 38.31 ms | 0.694 ms | 0.459 ms | 5.0000 | - | - | - | - | 10.8 KB |
ExecuteNonQuery | 37.94 ms | 0.666 ms | 0.440 ms | - | - | - | - | - | 6 KB |
ExecuteNonQueryAsync | 39.58 ms | 3.188 ms | 2.109 ms | 3.0000 | - | - | - | - | 7.42 KB |
ExecuteXmlReader | 158.77 ms | 34.782 ms | 20.698 ms | - | - | - | - | - | 1445.5 KB |
ExecuteXmlReaderAsync | 190.03 ms | 29.453 ms | 17.527 ms | 244.0000 | - | - | - | - | 474.11 KB |
VARBINARY_MAX_ExecuteScalar | 33.64 ms | 0.750 ms | 0.446 ms | - | - | 500.0000 | 500.0000 | 500.0000 | 20481.99 KB |
VARBINARY_MAX_ExecuteScalarAsync | 2,940.72 ms | 170.859 ms | 101.676 ms | 1320.5000 | - | 2000.0000 | 1000.0000 | 500.0000 | 31404.74 KB |
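The columns above (Mean/Error/StdDev, Completed Work Items, Lock Contentions, Gen0-2, Allocated) have the shape of BenchmarkDotNet output with its memory and threading diagnosers enabled. A minimal sketch of how such a sync-vs-async comparison can be set up (connection string, table, and column names are placeholders, not the actual SqlClient test suite):

```csharp
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using Microsoft.Data.SqlClient;

[MemoryDiagnoser]     // produces the Gen0/Gen1/Gen2/Allocated columns
[ThreadingDiagnoser]  // produces Completed Work Items / Lock Contentions
public class SqlCommandBenchmarks
{
    private SqlConnection connection = null!;

    [GlobalSetup]
    public void Setup()
    {
        // Placeholder connection string.
        this.connection = new SqlConnection(
            "Server=localhost;Database=Bench;Integrated Security=true");
        this.connection.Open();
    }

    [Benchmark]
    public object? ExecuteScalar()
    {
        using var cmd = new SqlCommand(
            "SELECT BigPayload FROM BenchTable", this.connection);
        return cmd.ExecuteScalar();
    }

    [Benchmark]
    public async Task<object?> ExecuteScalarAsync()
    {
        using var cmd = new SqlCommand(
            "SELECT BigPayload FROM BenchTable", this.connection);
        return await cmd.ExecuteScalarAsync();
    }

    [GlobalCleanup]
    public void Cleanup() => this.connection.Dispose();
}
```

Run via `BenchmarkRunner.Run<SqlCommandBenchmarks>()`; it needs a reachable SQL Server instance, so this is a sketch of the measurement setup, not a self-contained repro.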
@bhugot do you still have the test suite you'd developed as part of this issue? They're looking for samples in the sqlclient issue I mentioned earlier.
Hello, we hit a problem with the SQL provider: since the payload is stored in a table, when it is too big it gets truncated, and after that we get timeouts and nothing is processed. I think we should handle this the way Azure Storage does, by storing the payload in a file in that case.