microsoft / durabletask-go

The Durable Task Framework is a lightweight, embeddable engine for writing durable, fault-tolerant business logic (orchestrations) as ordinary code.
Apache License 2.0

Support heartbeats from app code for work-item renewal #34

Open cgillum opened 11 months ago

cgillum commented 11 months ago

Problem

Each of the supported backends currently uses a lock timeout to detect when a remote app worker may have crashed or otherwise become unresponsive. However, this simple timeout mechanism can't distinguish between an app that has gone away and a task that is simply taking a long time to complete.

For example, if the lock expiration timeout is 1 minute but a particular activity task takes 5 minutes to complete, the lock on that work-item will expire before the activity completes, and the activity may be rescheduled unnecessarily.

Proposal - heartbeats

To solve this problem, we propose adding a "heartbeat" callback that activity implementations can use to signal that they're still actively processing a particular work-item. This would be a gRPC API that SDKs can call periodically to renew the lock expiration time for an activity work-item.

As a secondary feature, the heartbeat could be used to get the status of the parent orchestration. If the parent orchestration has been terminated, the activity could then choose to cooperatively terminate itself (details TBD on how this would work for each language SDK).