Issue with distributed traces and multiple instances of the worker

microsoft / durabletask-go

The Durable Task Framework is a lightweight, embeddable engine for writing durable, fault-tolerant business logic (orchestrations) as ordinary code.

Apache License 2.0


I noticed some strange behavior when I run multiple instances of the worker (say 3), all pointing to the same database. I'm currently using the Postgres implementation with some changes.

Here's the screenshot: (image: strange-traces)

You can see that the trace contains several orchestration:SimpleOrchestration spans. The same thing happens with activities.

I also see several of these logs.

{"time":"2024-01-13T14:55:54.491837674+08:00","level":"ERROR","msg":"orchestration-processor: failed to complete work item: instance 'db1659b0-1528-4042-a500-0cb3822f2cad' no longer exists or was locked by a different worker"}
{"time":"2024-01-13T14:55:54.497473338+08:00","level":"ERROR","msg":"orchestration-processor: failed to abandon work item: lock on work-item was lost"}

I think this happens because the other workers are still processing the same work item after one of them has already transitioned or completed it.

Issue with distributed traces and multiple instances of the worker #59