temporalio / sdk-python

Temporal Python SDK
MIT License
473 stars 77 forks source link

Safe Eviction #499

Closed cretz closed 8 months ago

cretz commented 8 months ago

What was changed

See #494. Today, if you evict a workflow that has incomplete coroutines, we simply delete it from the workflow map. This means Python will garbage collect these coroutines at some point in the future. Garbage collection of coroutines involves throwing a GeneratorExit within them, which can cause the coroutine to wake up on any thread to handle this GeneratorExit. Therefore, finally may execute in another thread which may be running another workflow at the time. This is very bad and despite all attempts, we cannot reasonably intercept Python's coroutine garbage collection or the GeneratorExit behavior here.

So we have refactored the eviction process to cancel all outstanding tasks and to ignore or raise during any side effects attempted (e.g. commands). This is similar to other SDKs that have to tear down coroutines. However there are cases where a user may have done something invalid and the cancel may not complete the coroutine. This will log an error, hang the eviction, and will forever use up that task slot. It will also prevent worker shutdown. We can discuss ways to improve this if needed.

This supersedes #325/#341 which we previously thought was good enough to handle GeneratorExit.

What changed:

Checklist

  1. Closes #494