sematic-ai / sematic

An open-source ML pipeline development platform

Make runner reentrant #1037

Closed augray closed 1 year ago

augray commented 1 year ago

The cloud runner is sometimes restarted unexpectedly by K8s. Prior to this PR, this was handled by launching an entirely new pipeline run, with a new id, that picked up where the old one left off. This PR changes the behavior so that the restarted runner instead re-creates its internal state and continues the same run from where it left off. Unlike the previous behavior, this should be transparent to end-users.
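
As a rough illustration of the idea (not the code in this PR), the sketch below shows a runner that rebuilds its state from persisted step records when it starts under an existing run id. All names here (`StepRecord`, `InMemoryStore`, `ReentrantRunner`, `max_steps`) are hypothetical and are not part of the Sematic API; the in-memory store stands in for whatever durable storage outlives the runner pod.

```python
import copy
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class StepState(Enum):
    CREATED = "CREATED"
    DONE = "DONE"


@dataclass
class StepRecord:
    """Hypothetical persisted record for one pipeline step."""
    step_id: str
    state: StepState = StepState.CREATED


class InMemoryStore:
    """Stand-in for durable storage that survives a runner restart."""

    def __init__(self) -> None:
        self._runs: Dict[str, List[StepRecord]] = {}

    def save(self, run_id: str, steps: List[StepRecord]) -> None:
        # Copy so the stored state is independent of the runner's objects.
        self._runs[run_id] = copy.deepcopy(steps)

    def load(self, run_id: str) -> Optional[List[StepRecord]]:
        steps = self._runs.get(run_id)
        return copy.deepcopy(steps) if steps is not None else None


class ReentrantRunner:
    """Hypothetical runner that resumes under the same run id."""

    def __init__(self, run_id: str, step_ids: List[str], store: InMemoryStore) -> None:
        self.run_id = run_id
        self.store = store
        existing = store.load(run_id)
        if existing is None:
            # First start: create fresh records and persist them.
            self.steps = [StepRecord(step_id) for step_id in step_ids]
            self.store.save(run_id, self.steps)
            self.resumed = False
        else:
            # Restart: re-create internal state from what was persisted.
            self.steps = existing
            self.resumed = True

    def run(self, max_steps: Optional[int] = None) -> List[str]:
        """Execute pending steps in order; max_steps emulates an eviction."""
        executed: List[str] = []
        for step in self.steps:
            if step.state == StepState.DONE:
                continue  # Finished before the restart; do not redo it.
            if max_steps is not None and len(executed) >= max_steps:
                break  # Simulated pod deletion partway through the run.
            step.state = StepState.DONE
            self.store.save(self.run_id, self.steps)  # Persist after each step.
            executed.append(step.step_id)
        return executed
```

In this sketch, a second `ReentrantRunner` constructed with the same `run_id` and store skips the steps already marked `DONE` and runs only the remainder, so the run id seen by the user never changes.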

Testing

Disabled the signal handlers in the StateMachineRunner so that it wouldn't interpret kubectl delete pod ... as a cancellation, then performed the following tests, using kubectl delete pod on the runner pod to emulate evictions: