jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

EG is not terminating correctly via SIGTERM #701

Closed vikasgarg1996 closed 5 years ago

vikasgarg1996 commented 5 years ago

I want to add handling for the "SIGKILL" signal to Jupyter Enterprise Gateway. My plan is to clean up and then kill the current process when "SIGKILL" is received.

Does JEG create child processes in any scenario? If so, I would need to kill them as well.

kevin-bates commented 5 years ago

Hi. SIGKILL cannot be caught. EG already does the "right thing" (i.e., shuts down any running kernels) on SIGTERM. Why is that not sufficient?
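To illustrate the point that SIGKILL cannot be caught, here is a minimal sketch using only Python's standard `signal` module. The helper name `can_install_handler` is hypothetical (it is not part of EG), and the sketch assumes a POSIX system and that it runs in the main thread, since Python only allows signal handlers to be installed there.

```python
import signal


def can_install_handler(signum):
    """Return True if a Python-level handler can be registered for signum.

    Hypothetical helper for illustration only; not part of Enterprise Gateway.
    """
    try:
        previous = signal.signal(signum, lambda s, f: None)
    except (OSError, ValueError):
        # The OS refuses handlers for SIGKILL, which surfaces as OSError.
        # ValueError is raised when called outside the main thread.
        return False
    # Restore whatever was installed before so the probe has no lasting effect.
    signal.signal(signum, previous if previous is not None else signal.SIG_DFL)
    return True


if __name__ == "__main__":
    print("SIGTERM catchable:", can_install_handler(signal.SIGTERM))
    print("SIGKILL catchable:", can_install_handler(signal.SIGKILL))
```

Because the operating system rejects any handler for SIGKILL, cleanup logic has to be attached to a signal that can be handled, such as SIGTERM.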

vikasgarg1996 commented 5 years ago

Yes, it does shut down any running kernels, but the Enterprise Gateway process itself is not killed. Is something in the SIGTERM handling preventing the process from exiting, or is it supposed to behave that way?

kevin-bates commented 5 years ago

@vikasgarg1996 - I think I might be seeing the same thing you're seeing. In Python 2, the gateway server shuts down. I'm not certain whether this is related to Python 3 or is a conda env thing, but I suspect we should be calling self.stop in the SIGTERM handler. This will stop both the io_loop and the HTTP server.

I also believe there might be a side effect from PR #686 here. I think I'm seeing the (Python-only) launchers orphaned when using this form of termination. When terminating kernels via the client API, everything is okay. As a result, I'd like to change the heading on this issue, since it's more about termination than SIGKILL.

Before proceeding, could you please describe your env relative to python and EG versions, conda, etc.?

vikasgarg1996 commented 5 years ago

I am using a Python 3 conda environment (conda version 4.6.14) with JEG version 1.2.0. I tried changing the call to "self.stop()", but the behavior is the same and the gateway server does not shut down.

Sure, you can change the heading.

kevin-bates commented 5 years ago

Thanks for the information.

I see Notebook calls the following in its SIGTERM handler instead of calling self.io_loop.stop() directly. Would you mind trying the same? I ran with this yesterday and it seemed to help (but so did calling self.stop() directly).

self.io_loop.add_callback_from_signal(self.io_loop.stop)

Which Python version are you using?

There has been some history with signal management within conda but the issues I found were more in the 4.3 timeframe.

kevin-bates commented 5 years ago

Here's the PR in which Notebook introduced the change to use add_callback_from_signal: https://github.com/jupyter/notebook/pull/2752. I suspect the behavior changed with Python 3, since EG on Python 2 will terminate and both envs use Tornado 5.1.
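The idea behind that change, handing the stop request to the event loop rather than stopping it directly inside the signal handler, can be sketched as follows. This is not EG's or Notebook's code. To stay dependency-free it uses the standard library's asyncio loop, with `loop.call_soon_threadsafe(loop.stop)` standing in for the Tornado call quoted in this thread. The function name `run_until_sigterm`, the self-sent signal, and the `events` list are assumptions made purely for demonstration; the sketch also assumes a POSIX system and the main thread.

```python
import asyncio
import os
import signal


def run_until_sigterm():
    """Run an event loop until SIGTERM arrives, then return what happened.

    Illustrative only: a real server would start its HTTP listener and
    shut down kernels instead of recording strings in a list.
    """
    loop = asyncio.new_event_loop()
    events = []

    def handle_sigterm(signum, frame):
        # Do not stop the loop directly from the signal handler. Queue the
        # stop on the loop; call_soon_threadsafe also wakes the loop if it
        # is currently blocked waiting for I/O.
        events.append("signal received")
        loop.call_soon_threadsafe(loop.stop)

    previous = signal.signal(signal.SIGTERM, handle_sigterm)
    # Demonstration only: send SIGTERM to this process shortly after start.
    loop.call_later(0.1, os.kill, os.getpid(), signal.SIGTERM)
    try:
        loop.run_forever()
        events.append("loop stopped")
    finally:
        # Restore the previous SIGTERM disposition and release the loop.
        signal.signal(
            signal.SIGTERM,
            previous if previous is not None else signal.SIG_DFL,
        )
        loop.close()
    return events


if __name__ == "__main__":
    print(run_until_sigterm())
```

Once `run_forever()` returns, control is back in ordinary code, which is where any remaining cleanup can run before the process exits.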

vikasgarg1996 commented 5 years ago

Thanks. "self.io_loop.add_callback_from_signal(self.io_loop.stop)" works properly: it kills any running kernels and shuts down the Enterprise Gateway server as well.

Since newer JEG versions have already dropped Python 2 support, I have not checked this with Python 2.

I have created a pull request for this. Could you please take a look? https://github.com/jupyter/enterprise_gateway/pull/703