microsoft / durabletask-dotnet

Out-of-process .NET SDK for the Durable Task Framework
MIT License

PurgeInstancesAsync fails after 100 seconds #268

Open Alan-Hinton opened 8 months ago

Alan-Hinton commented 8 months ago

Our function app makes extensive use of durable functions, which saves a lot of data into the durable functions tables. I am trying to use PurgeInstancesAsync to clear out those tables. I use two different queries: one to delete all durable functions more than a month old

            var agesAgo = DateTime.UtcNow.AddYears(-1);
            var deleteAllBefore = DateTime.UtcNow.AddMonths(-1);
            var allStatusesDeleted = await orchestrationClient.PurgeInstancesAsync(agesAgo, deleteAllBefore);

and one to delete all completed durable functions more than 7 days old

            var deleteCompletedBefore = DateTime.UtcNow.Subtract(TimeSpan.FromDays(7));
            var completed = new[] { OrchestrationRuntimeStatus.Completed };
            var completedStatusDeleted = await orchestrationClient.PurgeInstancesAsync(agesAgo, deleteCompletedBefore, completed);

The former seems to work if the tables are not too large, but fails once they reach a certain size. As far as I can tell, the latter never works.

With large tables PurgeInstancesAsync always fails after 100 seconds with the following error message:

Exception: System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.Threading.Tasks.Task.GetExceptions(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task1.GetResultCore(Boolean waitCompletionNotification)
   at Microsoft.Azure.Functions.Worker.Invocation.DefaultFunctionInvoker2.<>c.b6_0(Task1 t) in D:\a\_work\1\s\src\DotNetWorker.Core\Invocation\DefaultFunctionInvoker.cs:line 32
   at System.Threading.Tasks.ContinuationResultTaskFromResultTask2.InnerInvoke()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
   at System.Threading.Tasks.ThreadPoolTaskScheduler.TryExecuteTaskInline(Task task, Boolean taskWasPreviouslyQueued)
   at System.Threading.Tasks.TaskContinuation.InlineIfPossibleOrElseQueue(Task task, Boolean needsProtection)
   at System.Threading.Tasks.ContinueWithTaskContinuation.Run(Task completedTask, Boolean canInlineContinuationTask)
   at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
   at System.Threading.Tasks.Task.TrySetCanceled(CancellationToken tokenToRecord, Object cancellationException)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.SetException(Exception exception, Task1& taskField)
   at Microsoft.Azure.Functions.Worker.Invocation.VoidTaskMethodInvoker2.InvokeAsync(TReflected instance, Object[] arguments) in D:\a\_work\1\s\src\DotNetWorker.Core\Invocation\VoidTaskMethodInvoker.cs:line 22
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.AsyncStateMachineBox1.ExecutionContextCallback(Object s)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.AsyncStateMachineBox1.MoveNext(Thread threadPoolThread)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.AsyncStateMachineBox1.MoveNext()
   at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox box, Boolean allowInlining)
   at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
   at System.Threading.Tasks.Task.TrySetCanceled(CancellationToken tokenToRecord, Object cancellationException)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.SetException(Exception exception, Task1& taskField)
   at UNO.Wind.Shared.Helpers.MonitoringControllerBase.CleanUpDurableFunctions(DurableTaskClient orchestrationClient) in D:\a\smart-wind-downtime-api\smart-wind-downtime-api\RES.SMART.Wind.Models\UNO.Wind.Shared\Helpers\MonitoringControllerBase.cs:line 202
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.AsyncStateMachineBox1.ExecutionContextCallback(Object s)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.AsyncStateMachineBox1.MoveNext(Thread threadPoolThread)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.AsyncStateMachineBox1.MoveNext()
   at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox box, Boolean allowInlining)
   at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
   at System.Threading.Tasks.Task.TrySetCanceled(CancellationToken tokenToRecord, Object cancellationException)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.SetException(Exception exception, Task1& taskField)
   at Microsoft.DurableTask.Client.Grpc.GrpcDurableTaskClient.PurgeInstancesCoreAsync(PurgeInstancesRequest request, CancellationToken cancellation)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.AsyncStateMachineBox1.ExecutionContextCallback(Object s)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.AsyncStateMachineBox1.MoveNext(Thread threadPoolThread)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.AsyncStateMachineBox1.ExecuteFromThreadPool(Thread threadPoolThread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
--- End of stack trace from previous location ---
   at UNO.Wind.Shared.Middleware.CustomExceptionHandlerMiddleware.HandleExceptionAsync(FunctionContext context, Exception exception) in D:\a\smart-wind-downtime-api\smart-wind-downtime-api\RES.SMART.Wind.Models\UNO.Wind.Shared\Middleware\CustomExceptionHandlerMiddleware.cs:line 50
   at UNO.Wind.Shared.Middleware.CustomExceptionHandlerMiddleware.Invoke(FunctionContext context, FunctionExecutionDelegate next) in D:\a\smart-wind-downtime-api\smart-wind-downtime-api\RES.SMART.Wind.Models\UNO.Wind.Shared\Middleware\CustomExceptionHandlerMiddleware.cs:line 29
   at UNO.Wind.Shared.Middleware.AuthenticationMiddleware.Invoke(FunctionContext context, FunctionExecutionDelegate next) in D:\a\smart-wind-downtime-api\smart-wind-downtime-api\RES.SMART.Wind.Models\UNO.Wind.Shared\Middleware\AuthenticationMiddleware.cs:line 27
   at Microsoft.Azure.Functions.Worker.FunctionsApplication.InvokeFunctionAsync(FunctionContext context) in D:\a\_work\1\s\src\DotNetWorker.Core\FunctionsApplication.cs:line 77

This leads me to believe there is a timeout somewhere that limits the call to 100 seconds. It would be great if callers could set the timeout to get around this problem, or had some way of deleting the data in chunks.

I have tried splitting the requests into one-day periods, but this seems to make no difference. I suspect this is because all the time is spent finding the items to delete, and the method times out before any deleting happens. I notice that the date of the function is not one of the keys on the table, so whatever date range is selected, a full table scan is required to find the items to delete.
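
For anyone who wants to try the same thing, here is a minimal sketch of splitting a purge range into fixed-size windows. `PurgeWindows`, `Split`, and `PurgeInWindowsAsync` are hypothetical names made up for illustration, and the `purge` delegate is a stand-in for whatever call you wrap (for example the `orchestrationClient.PurgeInstancesAsync(from, to, statuses)` call above, adapted to return a count). As noted, this did not avoid the timeout in my case, so treat it as a way to reproduce the attempt rather than a fix.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PurgeWindows
{
    // Splits [from, to) into consecutive windows of at most `size`, oldest first.
    public static IEnumerable<(DateTime From, DateTime To)> Split(
        DateTime from, DateTime to, TimeSpan size)
    {
        if (size <= TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException(nameof(size));
        }

        for (var start = from; start < to; start += size)
        {
            // Clamp the last window so it never runs past `to`.
            var end = start + size < to ? start + size : to;
            yield return (start, end);
        }
    }

    // Calls `purge` once per window, in order, and sums the counts it reports.
    // `purge` is a hypothetical adapter around the real purge call.
    public static async Task<int> PurgeInWindowsAsync(
        DateTime from, DateTime to, TimeSpan size,
        Func<DateTime, DateTime, Task<int>> purge)
    {
        var total = 0;
        foreach (var (start, end) in Split(from, to, size))
        {
            total += await purge(start, end);
        }

        return total;
    }
}
```

Each window is a separate request, so any per-request timeout still applies to every call. If the backend has to scan the whole table for each window, shrinking the window would not be expected to shorten an individual call.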

ciranmc commented 4 months ago

Did we ever get a fix or workaround for this? I am facing exactly the same issue when running an Azure Function locally, but with the WaitForInstanceCompletionAsync method. This seems like a fundamental issue, as these methods are designed for long-running processes. There has to be some easy way to configure the HttpClient timeout via the service collection or something. Any help would be greatly appreciated.
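
For context on the 100-second figure: that is the documented default of `HttpClient.Timeout` in .NET, and an `HttpClient` that hits it surfaces a `TaskCanceledException`, which matches the symptom. A small sketch showing the default and how it is changed on a client you create yourself (`HttpTimeoutDemo` is a made-up name). Nothing in this thread confirms that this is the timeout involved, or that the Durable Task client lets you supply your own `HttpClient`, so both of those are assumptions.

```csharp
using System;
using System.Net.Http;

public static class HttpTimeoutDemo
{
    // Returns the timeout a freshly constructed HttpClient starts with.
    public static TimeSpan DefaultTimeout()
    {
        using var client = new HttpClient();
        return client.Timeout;
    }

    // Creates a client that never times out on its own; callers would then
    // rely on CancellationToken to bound long-running requests.
    public static HttpClient CreateWithoutTimeout()
    {
        return new HttpClient
        {
            Timeout = System.Threading.Timeout.InfiniteTimeSpan,
        };
    }
}
```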

Alan-Hinton commented 3 months ago

I spent quite some time with Azure support trying to get a resolution to this issue, but one was never found. They did suggest manually deleting the tables, but I wasn't comfortable with this because my system almost always has running durable functions, and I didn't want to lose them. Also, it seemed like the problem would just return.

Hopefully the developers can find a way to make these methods work as they should.