Hi @pierocor, @nbarlowATI, and @tam203!

You had an issue with JupyterHub shutting down when a user server was stopped. JupyterHub had received a SIGTERM signal, although why that happened still isn't clear to me. The SIGTERM followed a call to .terminate() on a process created with Python's multiprocessing library.

I think I've found a one-line fix for this issue: use .kill() instead of .terminate().

I don't yet understand why JupyterHub receives a SIGTERM in the first place when we call .terminate(), and I think that is the root cause of the issue. With this fix, I believe we avoid the problem because we stop sending SIGTERM to our separate processes. I also believe we don't run into a similar problem with SIGKILL, because SIGKILL isn't passed to the process for interpretation: it is associated with forceful termination of the process rather than a "please terminate" request.

Using .kill(), the redirection webserver for the user server is killed while JupyterHub keeps running, so this issue seems resolved.
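To illustrate the difference between the two calls, here is a minimal standalone sketch (independent of JupyterHub) showing which signal each one delivers. On POSIX, a process killed by a signal gets a negative exit code equal to the signal number:

```python
import multiprocessing
import time


def worker():
    # Sleep long enough that the parent decides how this process ends.
    time.sleep(30)


if __name__ == "__main__":
    p = multiprocessing.Process(target=worker)
    p.start()
    p.terminate()  # delivers SIGTERM, which a process could catch or ignore
    p.join()
    print(p.exitcode)  # -15 on POSIX, i.e. terminated by SIGTERM

    p = multiprocessing.Process(target=worker)
    p.start()
    p.kill()  # delivers SIGKILL, which cannot be caught (Python 3.7+)
    p.join()
    print(p.exitcode)  # -9 on POSIX, i.e. terminated by SIGKILL
```

So switching from .terminate() to .kill() changes the signal the child receives from SIGTERM to SIGKILL; neither call should signal the parent.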
During debugging I used the JupyterHub config file below to reproduce the issue with less overhead.
"""
This jupyterhub config file have been used to debug the starting and stopping of
separate processes for each user server. These processes were running a
primitive webserver responding with 302 redirects to the actual server.
There have been an issue where calling terminate() on the started process as
part of stopping the spawned user server led to JupyterHub receiving a SIGTERM
and shutting down. Why JupyterHub stops wasn't clear, but the goal was to just
stop the process running a primitive webserver responding with 302 redirects.
This config contains a dummy setup with logic stripped from redirector.py to
test a user server start/stop cycle, where stop has led to the unwanted
termination of JupyterHub itself.
The test cycle I've used was like this.
# start jupyterhub
jupyterhub --config test_redirector_jupyterhub_config.py
# visit jupyterhub and start a server
# (ignore redirection loop)
# visit jupyterhub again and stop the server
# observe jupyterhub receiving a SIGTERM and no longer responding
# cleanup after crash
pkill -9 jupyterhub
While a lot of unknown remains, I concluded that using .kill() instead of
.terminate() on the created process avoided JupyterHub observing a SIGTERM
signal and shutting down, which was the unwanted behavior.
"""
import asyncio
import http.server
import multiprocessing
import socketserver
from jupyterhub.spawner import Spawner
class DummyRedirectSpawner(Spawner):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.redirect_server = None

    async def start(self):
        url = "https://google.se"
        route = RedirectServer.get_existing_redirect(url)
        if route:
            self.log.info("Existing route to compute instance found.")
        else:
            self.log.info("Creating route to compute instance.")
            self.redirect_server = RedirectServer(url)
            self.redirect_server.start()
            await asyncio.sleep(1)
            route = self.redirect_server.route
            self.log.info("Route to compute instance created.")
        return route

    async def stop(self):
        if self.redirect_server:
            self.log.info(f"Stopping the redirect server route: {self.redirect_server.route}.")
            self.redirect_server.stop()
            self.redirect_server = None

    async def poll(self):
        # None signals "still running"; an integer exit status signals "stopped".
        if self.redirect_server:
            return None
        else:
            return 0
def redirect_handler_factory(url):
    class RedirectHandler(http.server.SimpleHTTPRequestHandler):
        def do_GET(self):
            self.send_response(302)
            self.send_header('Location', url)
            self.end_headers()

    return RedirectHandler
def _create_server(url, port):
    with socketserver.TCPServer(("", port), redirect_handler_factory(url)) as httpd:
        print("serving at port", port)
        httpd.serve_forever()
class RedirectServer:
    _start_port = 9001
    _redirects = {}

    @classmethod
    def _get_free_port(cls):
        port = cls._start_port
        taken = list(cls._redirects.values())
        while port in taken:
            port += 1
        return port

    @classmethod
    def get_existing_redirect(cls, url):
        port = cls._redirects.get(url)
        return ("0.0.0.0", port) if port else None

    def __init__(self, redirect_to_url):
        self.url = redirect_to_url
        super().__init__()

    def start(self):
        self.port = RedirectServer._get_free_port()
        self.server_process = multiprocessing.Process(
            target=_create_server,
            name=f"redirector-port-{self.port}",
            args=[self.url, self.port],
            daemon=True,
        )
        self.server_process.start()
        RedirectServer._redirects[self.url] = self.port

    def stop(self):
        del RedirectServer._redirects[self.url]
        # Using .kill() instead of .terminate(), as the latter apparently leads
        # to JupyterHub observing a SIGTERM signal which causes it to shut down.
        self.server_process.kill()

    @property
    def route(self):
        return ("0.0.0.0", self.port)
c.JupyterHub.authenticator_class = "dummy"
c.JupyterHub.spawner_class = DummyRedirectSpawner
c.JupyterHub.redirect_to_server = False
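To gather more evidence on the remaining unknown (why the Hub process sees a SIGTERM at all), one could temporarily install a logging SIGTERM handler, for example near the top of the config file. This is a hypothetical debugging sketch, not part of the fix: it logs the signal and the stack at delivery instead of exiting, and JupyterHub may later install its own handlers on top of it.

```python
import signal
import traceback


def log_sigterm(signum, frame):
    # Log the signal and where the process was when it arrived, but do not
    # exit, so the shutdown can be observed rather than suffered.
    print(f"Received signal {signum} (SIGTERM); stack at delivery:")
    traceback.print_stack(frame)


signal.signal(signal.SIGTERM, log_sigterm)
```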