microsoft / durabletask-mssql

Microsoft SQL storage provider for Durable Functions and the Durable Task Framework
MIT License
87 stars 32 forks source link

Terminate Pending orchestration #178

Open jundayin opened 1 year ago

jundayin commented 1 year ago

Hi, echoing the thread https://github.com/Azure/durabletask/issues/662 in this repo.

I believe SQL provider works similar to AzureStorageOrchestrationService, which only terminate running orchestration. In our code, we schedule one orchestration which will be triggered in a month. Before it is executed, we also allow user to terminate the orchestration.

However, those scheduled orchestration will be not terminated right away but until the timer expires. Since this is a rather active task for customer, it generates 3K rows in NewEvents table within 3 weeks. Screenshot 2023-07-13 185345

Is this a provider behavior or the core behavior? It doesn't seem necessary to terminate the orchestration only when it starts running. The orchestration instance already presents in the db across the board. If it is provider behavior then I think we can have one improvement of terminate both pending and running orchestration in order to reduce the load in NewEvents table

jundayin commented 1 year ago

Actually I think this is more a bug instead of expected behavior. It is not blocking something, but I suspect it may have performance implication

I just glance there are a lot of exceptions log from the pod of our services

Backing off for 1 seconds until 5 successful operations
2023-07-19 00:34:41.8969|ERROR|DurableTask.Core|TaskOrchestrationDispatcher-c6b2e24a2c884db5b6ee1fe277fcfae1-0: Unhandled exception with work item 'aaf0757b-5c35-46e5-9d8b-bc8406ee0289': System.NullReferenceException: Object reference not set to an instance of an object.
   at DurableTask.SqlServer.SqlOrchestrationService.CompleteTaskOrchestrationWorkItemAsync(TaskOrchestrationWorkItem workItem, OrchestrationRuntimeState newRuntimeState, IList`1 outboundMessages, IList`1 orchestratorMessages, IList`1 timerMessages, TaskMessage continuedAsNewMessage, OrchestrationState orchestrationState) in /_/src/DurableTask.SqlServer/SqlOrchestrationService.cs:line 311
   at DurableTask.Core.TaskOrchestrationDispatcher.OnProcessWorkItemAsync(TaskOrchestrationWorkItem workItem) in /_/src/DurableTask.Core/TaskOrchestrationDispatcher.cs:line 548
   at DurableTask.Core.TaskOrchestrationDispatcher.OnProcessWorkItemSessionAsync(TaskOrchestrationWorkItem workItem) in /_/src/DurableTask.Core/TaskOrchestrationDispatcher.cs:line 195
   at DurableTask.Core.WorkItemDispatcher`1.ProcessWorkItemAsync(WorkItemDispatcherContext context, Object workItemObj) in /_/src/DurableTask.Core/WorkItemDispatcher.cs:line 459

Not surprisingly, if I query the instance id in [dt].[NewEvents] table. I will end up with two events here. One is the pending ExecutionStarted event and another is ExecutionTerminated event. I queried at least 30 instance ids and they are in the same pattern

Screenshot 2023-07-18 at 5 39 27 PM

cc: @cgillum Are you aware of this? We are currently still on the 1.0.0-rc due to the breaking change in 1.0.0-rc2 and we plan to migrate the latest version. So we do not know if this issue is already fixed or not. If not, please triage it as a bug

conreaux commented 1 month ago

We're also seeing this behavior in v1.2.2. ExecutionTerminated event is blocked by the ExecutionStarted event in the Pending state. ETA on fix?

cgillum commented 1 month ago

Apologies - this issue was not on my radar until now. If I'm reading the above repro steps correctly, it seems that the combination of a scheduled creation plus a termination (before the orchestration starts) results in an internal error. I can try to see if I can reproduce.

conreaux commented 1 month ago

@cgillum Thanks for the quick turnaround. Much appreciated.