temporalio / sdk-dotnet

Temporal .NET SDK
MIT License

[Bug] Worker does not retry if connection fails and kills the whole process #290

Closed petrkoutnycz closed 2 months ago

petrkoutnycz commented 2 months ago

What are you really trying to do?

Trying to set up Temporal in our application.

Describe the bug

When connecting to the Temporal service fails, the worker service goes down without retrying.

Also, since .NET 6 changed how background service failures are handled (see here), the failure takes down the whole app with it. This is unacceptable for us, as we need the worker to run as part of the app alongside other background services.
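For context, the .NET 6 change referred to here is that an unhandled exception escaping a `BackgroundService` now stops the host by default. The host has an option to restore the older behavior. A minimal sketch, assuming a generic-host application on .NET 7 or later (the worker registration itself is elided), might look like this:

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// Since .NET 6 the default is StopHost: an unhandled exception escaping a
// BackgroundService shuts down the whole host. Ignore logs the exception
// and leaves the other hosted services running.
builder.Services.Configure<HostOptions>(options =>
    options.BackgroundServiceExceptionBehavior = BackgroundServiceExceptionBehavior.Ignore);

// ... register the Temporal worker and the other background services here ...

await builder.Build().RunAsync();
```

Note that this setting applies to every background service in the host, and it does not restart anything: the failed worker stays stopped while the rest of the app keeps running.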

Attaching an example of the erroneous output. Even though it is a permission problem in this case, the whole app goes down. It could be any connection-related problem, because of how exceptions from the bridge client are handled.

image

Minimal Reproduction

Basically any sample app reproduces this when the connection settings are wrong. Tested with .NET 8.

Environment/Versions

Irrelevant.

Additional context

Not yet.

cretz commented 2 months ago

We intentionally fail if the worker cannot start. We do retry if there is a connectivity issue during operation, but even then, for errors we deem non-retryable, we fail after a minute, which fails the service. Users have requested that the worker fail if it cannot start, which seems reasonable. Are you saying the worker should never fail and should just keep retrying? Many users want their process to fail on this kind of failure for visibility.

Also, in the worst case, https://github.com/temporalio/sdk-dotnet/blob/main/src/Temporalio.Extensions.Hosting/TemporalWorkerService.cs is a very simple helper around our normal workers, if you need to write your own alternative.

petrkoutnycz commented 2 months ago

I am closing this, as I am free to write my own worker service with whatever reliability behavior I want. If I find the free time, I will add it to the samples repo.
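A minimal sketch of what such a custom worker service could look like is below. `RetryingWorkerService`, its delay values, and the wiring shown in the trailing comment are hypothetical names and choices for illustration, not part of the SDK. The service simply re-runs a "connect and run the worker" delegate with exponential backoff until the host shuts down.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

// Hypothetical helper, not part of the Temporal SDK: keeps re-running a
// "connect and run worker" delegate until the host shuts down.
public sealed class RetryingWorkerService : BackgroundService
{
    private static readonly TimeSpan InitialDelay = TimeSpan.FromSeconds(1);
    private static readonly TimeSpan MaxDelay = TimeSpan.FromMinutes(1);

    private readonly Func<CancellationToken, Task> runWorkerAsync;
    private readonly ILogger<RetryingWorkerService> logger;

    public RetryingWorkerService(
        Func<CancellationToken, Task> runWorkerAsync,
        ILogger<RetryingWorkerService> logger)
    {
        this.runWorkerAsync = runWorkerAsync;
        this.logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        var delay = InitialDelay;
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                await runWorkerAsync(stoppingToken);
                return; // The delegate finished without error; nothing to retry.
            }
            catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
            {
                return; // Normal host shutdown.
            }
            catch (Exception e)
            {
                // Swallow the failure so it never reaches the host.
                logger.LogError(e, "Worker failed; retrying in {Delay}", delay);
            }

            try
            {
                await Task.Delay(delay, stoppingToken);
            }
            catch (OperationCanceledException)
            {
                return; // Shutdown requested while waiting to retry.
            }

            // Double the delay up to the cap. It is never reset in this sketch.
            delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, MaxDelay.Ticks));
        }
    }
}

// Possible wiring (assumes the Temporalio package; the address, task queue,
// and MyWorkflow are placeholders):
//
// builder.Services.AddHostedService(sp => new RetryingWorkerService(
//     async ct =>
//     {
//         var client = await TemporalClient.ConnectAsync(new("localhost:7233"));
//         using var worker = new TemporalWorker(
//             client,
//             new TemporalWorkerOptions("my-task-queue").AddWorkflow<MyWorkflow>());
//         await worker.ExecuteAsync(ct);
//     },
//     sp.GetRequiredService<ILogger<RetryingWorkerService>>()));
```

The trade-off raised earlier in the thread still applies: a service that retries forever hides startup failures from whatever monitors the process, so logging each failure (or capping the number of attempts) is worth considering.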