Open jundayin opened 1 year ago
Actually I think this is more a bug instead of expected behavior. It is not blocking something, but I suspect it may have performance implication
I just glance there are a lot of exceptions log from the pod of our services
Backing off for 1 seconds until 5 successful operations
2023-07-19 00:34:41.8969|ERROR|DurableTask.Core|TaskOrchestrationDispatcher-c6b2e24a2c884db5b6ee1fe277fcfae1-0: Unhandled exception with work item 'aaf0757b-5c35-46e5-9d8b-bc8406ee0289': System.NullReferenceException: Object reference not set to an instance of an object.
at DurableTask.SqlServer.SqlOrchestrationService.CompleteTaskOrchestrationWorkItemAsync(TaskOrchestrationWorkItem workItem, OrchestrationRuntimeState newRuntimeState, IList`1 outboundMessages, IList`1 orchestratorMessages, IList`1 timerMessages, TaskMessage continuedAsNewMessage, OrchestrationState orchestrationState) in /_/src/DurableTask.SqlServer/SqlOrchestrationService.cs:line 311
at DurableTask.Core.TaskOrchestrationDispatcher.OnProcessWorkItemAsync(TaskOrchestrationWorkItem workItem) in /_/src/DurableTask.Core/TaskOrchestrationDispatcher.cs:line 548
at DurableTask.Core.TaskOrchestrationDispatcher.OnProcessWorkItemSessionAsync(TaskOrchestrationWorkItem workItem) in /_/src/DurableTask.Core/TaskOrchestrationDispatcher.cs:line 195
at DurableTask.Core.WorkItemDispatcher`1.ProcessWorkItemAsync(WorkItemDispatcherContext context, Object workItemObj) in /_/src/DurableTask.Core/WorkItemDispatcher.cs:line 459
Not surprisingly, if I query the instance id in [dt].[NewEvents] table. I will end up with two events here. One is the pending ExecutionStarted
event and another is ExecutionTerminated
event. I queried at least 30 instance ids and they are in the same pattern
cc: @cgillum Are you aware of this? We are currently still on the 1.0.0-rc due to the breaking change in 1.0.0-rc2 and we plan to migrate the latest version. So we do not know if this issue is already fixed or not. If not, please triage it as a bug
We're also seeing this behavior in v1.2.2. ExecutionTerminated event is blocked by the ExecutionStarted event in the Pending state. ETA on fix?
Apologies - this issue was not on my radar until now. If I'm reading the above repro steps correctly, it seems that the combination of a scheduled creation plus a termination (before the orchestration starts) results in an internal error. I can try to see if I can reproduce.
@cgillum Thanks for the quick turnaround. Much appreciated.
Hi, echoing the thread https://github.com/Azure/durabletask/issues/662 in this repo.
I believe SQL provider works similar to AzureStorageOrchestrationService, which only terminate running orchestration. In our code, we schedule one orchestration which will be triggered in a month. Before it is executed, we also allow user to terminate the orchestration.
However, those scheduled orchestration will be not terminated right away but until the timer expires. Since this is a rather active task for customer, it generates 3K rows in NewEvents table within 3 weeks.
Is this a provider behavior or the core behavior? It doesn't seem necessary to terminate the orchestration only when it starts running. The orchestration instance already presents in the db across the board. If it is provider behavior then I think we can have one improvement of terminate both pending and running orchestration in order to reduce the load in NewEvents table