Closed kevin-bates closed 1 year ago
Good point @kiersten-stokes. I'm also going to remove the call to get_container_status()
within the namespace deletion block because it really reflects the state of the pod and not the namespace and was unconditionally invoked previously. As a result, we should hold off on this merge until I test this and push the commit.
This pull request attempts to address the delay seen when shutting down k8s-managed kernels. I believe the issue is because the pod (and namespace if not BYO namespace) is not terminated when the kernel process has stopped. Previously, the pod/namespace is not deleted until the cleanup phase, but these changes move the container resource termination to when the listener is shut down. This allows the polling to check that the "kernel" has terminated to leverage the container status, which, previously, because it had yet to occur, was causing the fallback termination behavior to occur.
I suspect this was not a problem in older versions of kubernetes because (just a hunch) when the container's process terminated the container terminated. However, I suspect k8s has "grown" enough where interaction with the resource manager is necessary and has introduced some delay.
There are several areas of race condition with starting/stopping pods from other applications, so I think this improves on those, but may not necessarily be the final solution.