posit-dev / positron

Positron, a next-generation data science IDE
https://positron.posit.co
Other
2.82k stars 90 forks source link

Ensure session exit counts as shutdown #5343

Closed jmcphers closed 1 week ago

jmcphers commented 1 week ago

This is a speculative change that attempts to address an issue seen sporadically in CI. Specifically, one or both of these:

https://github.com/posit-dev/positron/issues/5285 https://github.com/posit-dev/positron/issues/5334

I have not been able to reproduce either locally. However, checking the Playwright traces, it looks as though some parts of the system know that the kernel has shut down, while others are still waiting. For example, you can see here that the console knows that the shutdown is finished, but another part of the system thinks it's still shutting down.

image

This is probably related to some state management warts around runtime exits, which have two different components: an Exited status event that merely marks the runtime as Exited, and an onDidEndSession event that carries additional metadata around the session exit, such as its exit code, any errors or messages associated with the exit, etc. These events are currently not guaranteed to arrive in any particular order, and some parts of the system only pay attention to one of them depending on whether they need the additional metadata.

My theory for these specific bugs is that we are getting into a race condition w/r/t the ordering of these events when waiting for shutdown to finish; the speculative fix is to ensure that -- for the purposes of waiting for a session shutdown to complete -- a onDidEndSession is as good as an Exited status.

It would be good to refactor the way exits are handled in the system so that it is not necessary to check both events here, but that's a larger change than I want to make speculatively.

Here's a smoke test run that shows the tests mentioned in the above two issues are passing with this change: https://github.com/posit-dev/positron/actions/runs/11807687400/job/32896351530 (not a guarantee that the change is correct since the failures are intermittent)