microsoft / durabletask-dotnet

Out-of-process .NET SDK for the Durable Task Framework
MIT License

Known bug: Utilizing custom middleware on orchestrators may generate non-determinism exceptions #158

Open davidmrdavid opened 1 year ago

davidmrdavid commented 1 year ago

Bug Description:

Applying custom middleware to an orchestrator's invocation pipeline may result in non-determinism exceptions. In essence, this is because custom middleware is being interpreted by the DurableTask framework as being part of the orchestrator code, which needs to abide by specific coding constraints to prevent non-determinism errors. When middleware logic does not abide by these constraints, the DurableTask framework will flag the orchestrator as non-deterministic and fail the invocation.

This issue was originally reported here: https://github.com/microsoft/durabletask-dotnet/issues/153

Diagnosis

If all of the following conditions are met, you may be affected by this bug:

(1) Your orchestrators are failing with exceptions prefixed with "Non-Deterministic workflow detected:".

(2) Your application injects custom middleware during function invocations. Example scenario: you're using the Azure App Configuration middleware.

(3) Your orchestrator definitions are deterministic.

(4) Removing the middleware prevents the non-determinism exceptions.

Workaround:

While we work to fix this bug, there are two main workarounds you can consider:

(1) Skip your custom middleware logic when it is used to invoke an orchestrator. You may detect that an orchestrator is being invoked by re-using this helper method.

(2) Do not use custom middleware in your function invocations. We realize this is not an ideal solution.
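
As a rough illustration of workaround (1), the sketch below registers a custom middleware only for invocations that are not orchestrators. It is not the helper method referenced above: `MyCustomMiddleware` and `IsOrchestrator` are hypothetical names, and the check assumes that orchestrator functions expose an input binding whose type is `orchestrationTrigger`. Treat it as a starting point to adapt, not as the official fix.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Middleware;
using Microsoft.Extensions.Hosting;

// Hypothetical custom middleware, used here for illustration only.
public sealed class MyCustomMiddleware : IFunctionsWorkerMiddleware
{
    public async Task Invoke(FunctionContext context, FunctionExecutionDelegate next)
    {
        // Custom (possibly asynchronous) logic would go here.
        await next(context);
    }
}

public static class Program
{
    // Assumption: an orchestrator is any function that has an input binding
    // of type "orchestrationTrigger".
    private static bool IsOrchestrator(FunctionContext context) =>
        context.FunctionDefinition.InputBindings.Values.Any(binding =>
            string.Equals(binding.Type, "orchestrationTrigger", StringComparison.OrdinalIgnoreCase));

    public static void Main()
    {
        var host = new HostBuilder()
            .ConfigureFunctionsWorkerDefaults(app =>
            {
                // Run the custom middleware only when the invoked function
                // is not an orchestrator.
                app.UseWhen<MyCustomMiddleware>(context => !IsOrchestrator(context));
            })
            .Build();

        host.Run();
    }
}
```

If the middleware has to stay registered unconditionally, the same predicate could instead be checked at the top of `Invoke`, passing the call straight to `next(context)` for orchestrators before any custom logic runs.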

Long term fix

The specific long term fix is still being discussed. For now, we're tracking that work here: https://github.com/Azure/azure-functions-dotnet-worker/issues/1666

RobARichardson commented 8 months ago

For several weeks, my team & I have been troubleshooting sub-orchestrator function failures due to the following exception:

Error Details:

- FormattedMessage: The orchestrator function completed on a non-orchestrator thread!
- Exception Type: System.InvalidOperationException
- Message: An invalid asynchronous invocation was detected. This can be caused by awaiting non-durable tasks in an orchestrator function's implementation or by middleware that invokes asynchronous code.
- Problem ID: System.InvalidOperationException at Microsoft.Azure.Functions.Worker.Extensions.DurableTask.FunctionsOrchestrationContext.ThrowIfIllegalAccess
- Assembly: Microsoft.Azure.Functions.Worker.Extensions.DurableTask, Version=1.0.3.0, Culture=neutral, PublicKeyToken=014045d636e89289  
- CategoryName: Microsoft.Azure.Functions.Worker.Extensions.DurableTask.DurableTaskFunctionsMiddleware  

Our search for answers led us to this issue. Since we had developed custom function middleware, we tried removing it but it had no impact. Furthermore, we could not reproduce the issue locally - only in Azure. Yesterday, we turned our attention to what could be unique about our environment in Azure. My organization uses DataDog for Application Monitoring and the Azure Function App in question uses the DataDog AAS Extension. After removing the DataDog AAS Extension from the Function App, this exception has disappeared completely.

I'm wondering if the team working on durabletask-dotnet has any insight into what could be going on here and whether these two things could be related.

danniefraim commented 7 months ago

This was super interesting! I found this issue while investigating the same exception you're getting, and after just enabling Datadog monitoring for our application. It feels like this might be something that could be handled by Datadog as well - have you reported it to them, @RobARichardson?

ForteUnited commented 6 months ago

We've run into the same issue using Azure App Configuration and wiring up the App Configuration SDK for dynamic config changes using a sentinel value.

This code causes the issue with the Azure App Configuration NuGet package/SDK:

public static void Main()
{
    var host = new HostBuilder()
        .ConfigureAppConfiguration(builder =>
        {
            // Omitted the code added in the previous step.
            // ... ...
        })
        .ConfigureServices(services =>
        {
            // Make Azure App Configuration services available through dependency injection.
            services.AddAzureAppConfiguration();
        })
        .ConfigureFunctionsWorkerDefaults(app =>
        {
            // Use Azure App Configuration middleware for data refresh.
            app.UseAzureAppConfiguration();
        })
        .Build();

    host.Run();
}

Taken from this MS article -> https://learn.microsoft.com/en-us/azure/azure-app-configuration/enable-dynamic-configuration-azure-functions-csharp?tabs=isolated-process#reload-data-from-app-configuration

jkdmyrs commented 5 months ago

I am also having issues with the Azure App Configuration middleware and durable tasks. It appears to impact sub-orchestrations, as mentioned above.

smackodale commented 1 month ago

We are having the same issue with Azure App Config. This is our first attempt at Durable Functions, and unfortunately it falls over at what some would consider an essential part of configuration management within an enterprise platform. Is there any update on this: a fix, a timeframe for a fix, or at minimum a workaround?

The following 2 bugs are also related:

davidmrdavid commented 1 month ago

Hi @smackodale: does the workaround described at the beginning of this thread not work for you? If it doesn't, could you please open a new issue describing your particular problem?