mthelen-taslar closed this issue 1 year ago
Hi @mthelen-taslar, this is a known (and unfortunate) behavior of the Durable Task Framework and is the same whether you use this MSSQL backend or other backends.
I understand an orchestration can't wait for days when worker hosts are bounced regularly, but this also strikes me as surprising behavior for our use case: an idempotent orchestration to guarantee a handful of network requests succeed.
I apologize for not quite understanding the question. However, I can tell you this:
Orchestrations can wait for days if you await on a task defined on the context object. For example, the following is perfectly acceptable:
await context.CreateTimer(context.CurrentUtcDateTime.AddDays(3));
However, if you need to await anything that isn't defined on the context object, it must be done in a TaskActivity.
You're right that static analyzers can help here. In fact, we have one for the Durable Functions .NET SDK. Unfortunately, one hasn't been developed for the base Durable Task Framework yet.
Another improvement would be to add more runtime checks. I believe some exist if you try to call a context method after doing an "illegal await", but the logic doesn't currently catch the case you've shown above. Even so, the damage will have already been done, i.e. the orchestration will still get stuck in the Running state. Static analysis is likely going to be the best way to prevent developers from running into this.
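To make the guidance above concrete, here is a minimal sketch of the pattern: the orchestrator only awaits tasks created by the context object, and the network call moves into an activity. The class names FetchUrlActivity and FetchOrchestration, the 30-second delay, and the "delay" timer state are made up for illustration. This is one possible shape, not the only way to structure it.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using DurableTask.Core;

// Hypothetical activity: all real I/O lives here, not in the orchestrator.
public class FetchUrlActivity : AsyncTaskActivity<string, int>
{
    private static readonly HttpClient Client = new HttpClient();

    protected override async Task<int> ExecuteAsync(TaskContext context, string url)
    {
        // Awaiting arbitrary tasks is fine inside an activity.
        using HttpResponseMessage response = await Client.GetAsync(url);
        return (int)response.StatusCode;
    }
}

// Hypothetical orchestration: every await is on a task created by the context.
public class FetchOrchestration : TaskOrchestration<int, string>
{
    public override async Task<int> RunTask(OrchestrationContext context, string url)
    {
        // Not OK here: await Task.Delay(...) or await httpClient.GetAsync(...).
        // Those tasks are not created by the context and can leave the
        // instance stuck in the Running state.

        // OK: a durable timer created by the context.
        await context.CreateTimer(context.CurrentUtcDateTime.AddSeconds(30), "delay");

        // OK: the network request runs in an activity scheduled via the context.
        return await context.ScheduleTask<int>(typeof(FetchUrlActivity), url);
    }
}
```

Both types would still need to be registered with your TaskHubWorker before the orchestration can run.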
Understood! Your response makes sense. We'll follow your guidance and keep an eye out for improvements in the future.
Feel free to close this issue. Thanks for the quick turnaround.
Hi,
We recently wrote an orchestration that await'd tasks outside of the orchestration context's interface. A simple example that can reproduce this issue is the following:
We noticed that execution was returned to the TaskOrchestrationExecutor before the orchestration ended, which appended an OrchestratorCompleted event to the system but left the instance in a Running state. I am interpreting the behavior as 'any await operation on a function not wrapped by a task or sub-orchestration can leave an orchestration in a perpetual running state'. I understand an orchestration can't wait for days when worker hosts are bounced regularly, but this also strikes me as surprising behavior for our use case: an idempotent orchestration to guarantee a handful of network requests succeed. Is my understanding correct? And if so, is this a bug or expected behavior? Static analysis/linter rules could mitigate confusion for future adopters if this is the case.
We are using Microsoft.DurableTask.SqlServer version=1.1.1
The state of the dt.* tables after running one of these workflows is: