microsoft / durabletask-mssql

Microsoft SQL storage provider for Durable Functions and the Durable Task Framework
MIT License

Kubernetes Quickstart Error - Docker Desktop (Windows) #133

Open · conreaux opened this issue 1 year ago

conreaux commented 1 year ago

While working through the Kubernetes Quickstart, I encountered the following error when deploying to a local k8s cluster using Docker Desktop for Windows. I had seen a Stack Overflow thread about a possibly missing WEBSITE_HOSTNAME environment variable, but setting it to localhost:80 in mssql-secrets.yml did not resolve the issue.

PS C:\Users\conreaux\source\repos\durabletask-mssql\test\PerformanceTests> func kubernetes deploy --name $deploymentName --image-name "$repo/mssql-durable-functions:latest" --secret-name "mssql-secrets" --max-replicas 5
secret/func-keys-kube-secret-durabletask-mssql-app created
serviceaccount/durabletask-mssql-app-function-keys-identity-svc-act created
role.rbac.authorization.k8s.io/functions-keys-manager-role created
rolebinding.rbac.authorization.k8s.io/durabletask-mssql-app-function-keys-identity-svc-act-functions-keys-manager-rolebinding created
service/durabletask-mssql-app-http created
deployment.apps/durabletask-mssql-app-http created
deployment.apps/durabletask-mssql-app created
scaledobject.keda.sh/durabletask-mssql-app created
Waiting for deployment "durabletask-mssql-app-http" rollout to finish: 0 of 1 updated replicas are available...
deployment "durabletask-mssql-app-http" successfully rolled out
        PurgeOrchestrationData - [httpTrigger]
Invalid URI: The hostname could not be parsed.

The durabletask-mssql-app-http-* pod starts, and I'm able to POST to the StartManySequences endpoint, but I then get the errors below. I assume something wasn't initialized correctly because of the earlier error. Any suggestions?

2022-11-18 07:32:46 Hosting environment: Production
2022-11-18 07:32:46 Content root path: /azure-functions-host
2022-11-18 07:32:46 Now listening on: http://[::]:80
2022-11-18 07:32:46 Application started. Press Ctrl+C to shut down.
2022-11-18 07:35:37 warn: Function.StartManySequences.User[0]
2022-11-18 07:35:37       Scheduling 100 orchestration(s) with a prefix of '20221118-013537'...
2022-11-18 07:35:38 warn: Function.StartManySequences.User[0]
2022-11-18 07:35:38       All 100 orchestrations were scheduled successfully!
2022-11-18 07:35:38 fail: Function.HelloCities[3]
2022-11-18 07:35:38       Executed 'HelloCities' (Failed, Id=7b6b25c1-9abb-462b-a529-5badb17ef1b5, Duration=17ms)
2022-11-18 07:35:38       System.InvalidOperationException: Unable to load metadata for function 'HelloCities'.
2022-11-18 07:35:38          at Microsoft.Azure.WebJobs.Script.WebHost.Diagnostics.FunctionInstanceLogger.StartFunction(FunctionInstanceLogEntry item) in /src/azure-functions-host/src/WebJobs.Script.WebHost/Diagnostics/FunctionInstanceLogger.cs:line 75
2022-11-18 07:35:38          at Microsoft.Azure.WebJobs.Script.WebHost.Diagnostics.FunctionInstanceLogger.AddAsync(FunctionInstanceLogEntry item, CancellationToken cancellationToken) in /src/azure-functions-host/src/WebJobs.Script.WebHost/Diagnostics/FunctionInstanceLogger.cs:line 53
2022-11-18 07:35:38          at Microsoft.Azure.WebJobs.Host.Loggers.CompositeFunctionEventCollector.AddAsync(FunctionInstanceLogEntry item, CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Loggers\CompositeFunctionEventCollector.cs:line 23
2022-11-18 07:35:38          at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.TryExecuteAsync(IFunctionInstance functionInstance, CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 106

cgillum commented 1 year ago

The error message seems to be coming from the constructor of System.Uri. Can you try changing your WEBSITE_HOSTNAME environment variable from localhost:80 to http://localhost:80?
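For illustration only: the sketch below uses Python's urllib.parse as a stand-in for System.Uri (the two parsers differ in detail, and this does not reproduce the .NET exception). It shows why the scheme matters: without "http://", a generic URI parser does not recognize "localhost" as a host at all. The helper name describe is hypothetical.

```python
from urllib.parse import urlsplit


def describe(value: str) -> tuple:
    """Hypothetical helper: split a WEBSITE_HOSTNAME-style value with a generic URI parser."""
    parts = urlsplit(value)
    # hostname/port are None when no "//host:port" authority component was found.
    return parts.hostname, parts.port


for candidate in ("localhost:80", "http://localhost:80"):
    print(candidate, "->", describe(candidate))
# localhost:80 -> (None, None)
# http://localhost:80 -> ('localhost', 80)
```

As the next comment shows, adding the scheme did not make this particular error go away, so a missing scheme does not appear to have been the cause in this case.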

conreaux commented 1 year ago

Same error. Here's the output of printenv:

PS C:\Users\conreaux\source\repos\durabletask-mssql\test\PerformanceTests> kubectl exec durabletask-mssql-app-http-5c74887ffb-gbt57 -- printenv
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=durabletask-mssql-app-http-5c74887ffb-gbt57
AzureWebJobsKubernetesSecretName=secrets/func-keys-kube-secret-durabletask-mssql-app
WEBSITE_HOSTNAME=http://localhost:80
AzureFunctionsJobHost__functions__4=StartManySequences
AzureFunctionsJobHost__functions__1=StartLongHaul
AzureFunctionsJobHost__functions__2=StartManyEntities
AzureFunctionsJobHost__functions__3=StartManyMixedOrchestrations
AzureWebJobsSecretStorageType=kubernetes
SQLDB_Connection=Server=mssqlinst.mssql.svc.cluster.local;Database=DurableDB;User ID=sa;Password=Pass@word1;Persist Security Info=False;TrustServerCertificate=True;Encrypt=True;
AzureFunctionsJobHost__functions__0=PurgeOrchestrationData
KUBERNETES_SERVICE_PORT=443
KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
DURABLETASK_MSSQL_APP_HTTP_SERVICE_PORT=80
DURABLETASK_MSSQL_APP_HTTP_SERVICE_HOST=10.109.48.116
DURABLETASK_MSSQL_APP_HTTP_PORT=tcp://10.109.48.116:80
DURABLETASK_MSSQL_APP_HTTP_PORT_80_TCP=tcp://10.109.48.116:80
DURABLETASK_MSSQL_APP_HTTP_PORT_80_TCP_PORT=80
KUBERNETES_SERVICE_HOST=10.96.0.1
KUBERNETES_PORT=tcp://10.96.0.1:443
KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
DURABLETASK_MSSQL_APP_HTTP_PORT_80_TCP_PROTO=tcp
DURABLETASK_MSSQL_APP_HTTP_PORT_80_TCP_ADDR=10.109.48.116
ASPNETCORE_URLS=http://+:80
DOTNET_RUNNING_IN_CONTAINER=true
AzureWebJobsScriptRoot=/home/site/wwwroot
HOME=/home
FUNCTIONS_WORKER_RUNTIME=dotnet
DOTNET_USE_POLLING_FILE_WATCHER=true
HOST_VERSION=4.14.0
ASPNETCORE_CONTENTROOT=/azure-functions-host
AzureFunctionsJobHost__Logging__Console__IsEnabled=true

cgillum commented 1 year ago

Oh, I was focused on this error message:

Invalid URI: The hostname could not be parsed.

But after re-reading your original post, I'm realizing that the more important error is this one:

System.InvalidOperationException: Unable to load metadata for function 'HelloCities'.

My understanding from speaking with other members of the Functions team is that the Functions runtime has an issue where the split between HTTP and non-HTTP functions doesn't work correctly in some cases, causing orchestrations to be scheduled on the wrong pod. I've seen this issue occasionally but not consistently (others have reported that it's 100% reproducible).
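To make the intended split concrete: the deploy output at the top of this issue created two deployments, durabletask-mssql-app-http and durabletask-mssql-app. The sketch below partitions a function-to-trigger map into the two allow lists such a split implies. It is an illustration, not how func kubernetes deploy is implemented; the HTTP function names come from the printenv output in this thread, while the trigger type given to HelloCities and the SayHello activity are hypothetical.

```python
from typing import Dict, List, Tuple

# Function name -> trigger type. HelloCities appears in the pod logs, but its
# trigger type here is an assumption; "SayHello" is a hypothetical activity.
FUNCTIONS: Dict[str, str] = {
    "PurgeOrchestrationData": "httpTrigger",
    "StartLongHaul": "httpTrigger",
    "StartManyEntities": "httpTrigger",
    "StartManyMixedOrchestrations": "httpTrigger",
    "StartManySequences": "httpTrigger",
    "HelloCities": "orchestrationTrigger",  # assumed trigger type
    "SayHello": "activityTrigger",  # hypothetical activity name
}


def split_allow_lists(functions: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Partition function names into (HTTP, non-HTTP) allow lists, sorted by name."""
    http = sorted(n for n, t in functions.items() if t == "httpTrigger")
    non_http = sorted(n for n, t in functions.items() if t != "httpTrigger")
    return http, non_http


http_list, non_http_list = split_allow_lists(FUNCTIONS)
print("http:", http_list)
print("non-http:", non_http_list)
```

Under the explanation above, an orchestration such as HelloCities that gets scheduled on the HTTP pod would be running against a list that does not contain it.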

I think one way to fix this is to ensure that the durabletask-mssql-app-http-* pod has the orchestration and activity trigger functions enabled. I believe the AzureFunctionsJobHost__functions__N environment variables control this, though I'm not completely confident about that. If you have the patience to do so, it might be worth experimenting with.

Ultimately, it seems like the func kubernetes deploy mechanism is no longer working reliably. I'll probably need to look into changing the quick start instructions to do something more manual until the underlying Functions runtime issue gets resolved (and I don't know if there's any ETA on that).

conreaux commented 1 year ago

I was able to root cause the URI exception and created an issue in the azure-functions-core-tools repo here.

For the other exception, it appears that the runtime doesn't respect the allow list controlled by the AzureFunctionsJobHost__functions__N environment variables and will still attempt to schedule orchestrations on the HTTP pod.
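The allow list can be read back from a pod's environment to confirm what it contains. A minimal sketch, assuming the output of the kubectl exec ... -- printenv command shown earlier in this thread is available as a string; allow_list_from_printenv is a hypothetical helper, not part of func or kubectl. Because printenv lists the entries out of index order (4, 1, 2, 3, 0 above), the sketch sorts them numerically by N.

```python
import re
from typing import List, Tuple

# Matches lines such as "AzureFunctionsJobHost__functions__4=StartManySequences".
_ALLOW_LIST_VAR = re.compile(r"^AzureFunctionsJobHost__functions__(\d+)=(.*)$")


def allow_list_from_printenv(output: str) -> List[str]:
    """Hypothetical helper: return the function allow list, ordered by index N."""
    entries: List[Tuple[int, str]] = []
    for line in output.splitlines():
        match = _ALLOW_LIST_VAR.match(line.strip())
        if match:
            entries.append((int(match.group(1)), match.group(2)))
    # printenv does not emit the variables in index order, so sort numerically by N.
    return [name for _, name in sorted(entries)]


# Excerpt of the HTTP pod's printenv output from this thread.
sample = """\
WEBSITE_HOSTNAME=http://localhost:80
AzureFunctionsJobHost__functions__4=StartManySequences
AzureFunctionsJobHost__functions__1=StartLongHaul
AzureFunctionsJobHost__functions__2=StartManyEntities
AzureFunctionsJobHost__functions__3=StartManyMixedOrchestrations
AzureFunctionsJobHost__functions__0=PurgeOrchestrationData
"""

allowed = allow_list_from_printenv(sample)
print(allowed)
print("HelloCities" in allowed)  # False
```

Against the HTTP pod's environment above, HelloCities is not on the list, which is consistent with the runtime trying to run a function that the pod's allow list does not include.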

By using func kubernetes deploy with the --dry-run option, I was able to dump the deployment to a YAML file. It does appear that func kubernetes deploy is working as intended since it produces the expected list of AzureFunctionsJobHost__functions__N variables for the HTTP and non-HTTP deployments.

I also confirmed that manually adding AzureFunctionsJobHost__functions__N variables to the deployment YAML to enable the orchestration/activity functions on the HTTP pod does eliminate the error. However, I don't believe this is a desirable solution.
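The manual workaround described above amounts to appending the orchestration and activity function names to the HTTP deployment's allow list. A minimal sketch that prints the extra env entries to paste into the --dry-run YAML, assuming the standard Kubernetes container env format (name/value pairs); extended_env is a hypothetical helper, and SayHello is a placeholder activity name. The numbering continues from the existing 0..4 entries seen in the printenv output.

```python
from typing import Dict, List


def extended_env(existing: List[str], extra: List[str]) -> List[Dict[str, str]]:
    """Hypothetical helper: env entries for `existing` followed by new names from `extra`."""
    names = list(existing)
    for name in extra:
        if name not in names:  # skip duplicates so each function is listed once
            names.append(name)
    # Number the entries 0..N-1, continuing the existing sequence.
    return [
        {"name": f"AzureFunctionsJobHost__functions__{i}", "value": name}
        for i, name in enumerate(names)
    ]


# Current HTTP allow list from the printenv output in this thread.
http_existing = [
    "PurgeOrchestrationData",
    "StartLongHaul",
    "StartManyEntities",
    "StartManyMixedOrchestrations",
    "StartManySequences",
]

# "HelloCities" comes from the logs; "SayHello" is a hypothetical activity name.
for entry in extended_env(http_existing, ["HelloCities", "SayHello"]):
    print(f"- name: {entry['name']}")
    print(f"  value: {entry['value']}")
```

This only automates the workaround; it does not address the underlying runtime behavior.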

Another workaround is to split the HTTP and non-HTTP functions into separate projects, each with its own Dockerfile, and therefore distinct images for the client and the server. For this to work, I also needed to set the ExternalClient property to true on the DurableClient attribute in the client function.