jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/
Other
615 stars 221 forks source link

Fix shutdown of CRD process proxy (and subclasses) #1234

Closed kevin-bates closed 1 year ago

kevin-bates commented 1 year ago

@lresende alerted me to an issue when restarting kernels managed by the SparkOperatorProcessProxy. Looking into the issue, there were several issues encountered that were preventing the graceful termination and, thus, successful restarts.

While troubleshooting the issue and given that regular KubernetesProcessProxy objects behaved properly, it was found that the override methods in both CustomResourceProcessProxy and SparkOperatorProcessProxy were essentially unnecessary with the exception of the API used to delete the CRD.

This PR refactors some of the KubernetesProcessProxy class hierarchy methods surrounding container status and termination.

While looking into this, it was also discovered that the EG's PATH was not available to the launch logic that takes place within the environment of the to-be-launched kernel. This was another side-effect of #1164 that was similarly addressed in #1230. In this case, we leverage the new "extra envs" addition recently added to the helm files via #1231 to inherit EG's PATH.

In addition, I noticed we weren't setting labels on the spark executor objects as we do in our other spark kernels, so I updated the spark operator launch script to set the same labels so administrators can include those objects in any "application analysis" that may be performed.

lresende commented 1 year ago

I have been testing this and all looks good to me. Thank you @kevin-bates for all the help here.