microsoft / durabletask-go

The Durable Task Framework is a lightweight, embeddable engine for writing durable, fault-tolerant business logic (orchestrations) as ordinary code.
Apache License 2.0
198 stars 25 forks source link

Fix server stuck issue when test timeout exception #61

Closed kaibocai closed 3 months ago

kaibocai commented 8 months ago

Resolve https://github.com/microsoft/durabletask-go/issues/58.

Thanks @ItalyPaleAle for the fix!

ItalyPaleAle commented 6 months ago

@cgillum are you good with this PR? I think it fixes some bugs that could make dtf-go hang

cgillum commented 6 months ago

@ItalyPaleAle I'm quite uncomfortable with all the changes related to shutdown, which aren't even relevant for the original issue. I need more time to understand all the implications. If we can reduce the scope of this PR to just fixing the hangs, then I'd be okay with merging it.

ItalyPaleAle commented 6 months ago

I had to go back and look at the PR again since it's been a while. I don't remember the reason for changes behind the shutdown, but looking at it, it seems there's code that closes channels that would otherwise be hanging, so it seems to prevent leaking goroutines

yaron2 commented 3 months ago

We've tested this code internally and can confirm it improves reliability greatly.

yaron2 commented 3 months ago

cc @cgillum @ItalyPaleAle

famarting commented 3 months ago

looks good to me!

Making the calls ExecuteOrchestrator and ExecuteActivity fail when the GetWorkItems function exits increases reliability greatly. Because it allows to signal the backend of an error so it can call AbandonActivityWorkItem, and that way depending on the backend implementation the activity will be retried as soon as a the GetWorkItems stream is recreated.