microsoft / durabletask-go

The Durable Task Framework is a lightweight, embeddable engine for writing durable, fault-tolerant business logic (orchestrations) as ordinary code.
Apache License 2.0
178 stars, 25 forks

add retries to GetWorkItems stream connection #72

Closed famarting closed 2 weeks ago

famarting commented 3 weeks ago

Fixes https://github.com/microsoft/durabletask-go/issues/70

This PR adds infinite retries (with exponential backoff capped at 15s) to the GetWorkItems call in the client worker. IMO infinite retries make sense here: the function does not block the caller, and its signature gives no way to report errors asynchronously, so the only control the user has over StartWorkItemListener is the context passed as a parameter.

Retrying forever also makes sense because users expect the work item listener to be reliable, so that it always receives and processes work.
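
For readers who want the shape of the idea without opening the diff, here is a minimal sketch of a retry-forever loop with capped exponential backoff that stops only when the context is cancelled. It is not the code in this PR: `retryForever`, `nextBackoff`, and `runDemo` are hypothetical names, the 10ms initial delay is an assumption, and the `connect` callback merely stands in for opening the GetWorkItems stream. Only the 15s cap comes from the description above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// nextBackoff doubles the current delay and caps it at max.
// Hypothetical helper, not part of durabletask-go.
func nextBackoff(cur, max time.Duration) time.Duration {
	next := cur * 2
	if next > max {
		return max
	}
	return next
}

// retryForever calls connect until it succeeds or ctx is cancelled.
// There is no attempt limit: cancelling ctx is the only way to stop it,
// mirroring the reasoning in the PR description.
func retryForever(ctx context.Context, initial, max time.Duration, connect func(context.Context) error) error {
	delay := initial
	for {
		// Stop before attempting if the caller has already cancelled.
		if err := ctx.Err(); err != nil {
			return err
		}
		if err := connect(ctx); err == nil {
			return nil
		}
		// Wait for the backoff delay, or return early on cancellation.
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
		}
		delay = nextBackoff(delay, max)
	}
}

// runDemo simulates a connection that fails three times and then succeeds.
// It returns the number of attempts made and the final error.
func runDemo() (int, error) {
	attempts := 0
	connect := func(ctx context.Context) error {
		attempts++
		if attempts < 4 {
			return errors.New("stream unavailable") // simulated failure
		}
		return nil
	}
	// 10ms initial delay is an assumption; 15s cap is from the PR description.
	err := retryForever(context.Background(), 10*time.Millisecond, 15*time.Second, connect)
	return attempts, err
}

func main() {
	attempts, err := runDemo()
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Because the loop never gives up on its own, the caller's context is the only off switch, which matches the control the user already has over StartWorkItemListener.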

famarting commented 3 weeks ago

@microsoft-github-policy-service agree company="Diagrid"

famarting commented 2 weeks ago

Yes, I've been testing it for days now:

== APP - wf2 == INFO: 2024/06/20 14:47:24 successfully reconnected work item listener stream...
== APP - wf2 == ERROR: 2024/06/20 14:49:25 background processor received stream error: rpc error: code = Unavailable desc = error reading from server: EOF
== APP - wf2 == WARNING: 2024/06/20 14:49:25 received grpc error code Unavailable
== APP - wf2 == INFO: 2024/06/20 14:49:25 reconnecting work item listener stream
== APP - wf2 == INFO: 2024/06/20 14:49:25 successfully reconnected work item listener stream...
== APP - wf2 == ERROR: 2024/06/20 14:51:25 background processor received stream error: rpc error: code = Unavailable desc = error reading from server: EOF
== APP - wf2 == WARNING: 2024/06/20 14:51:25 received grpc error code Unavailable
== APP - wf2 == INFO: 2024/06/20 14:51:25 reconnecting work item listener stream
== APP - wf2 == INFO: 2024/06/20 14:51:37 successfully reconnected work item listener stream...