jupyterhub / batchspawner

Custom Spawner for Jupyterhub to start servers in batch scheduled systems
BSD 3-Clause "New" or "Revised" License
187 stars 132 forks source link

JupyterHub checks wrong server url after restart #236

Closed tryauuum closed 2 years ago

tryauuum commented 2 years ago

Bug description

I use a subclass of BatchSpawner (created for Univa Grid Engine), JupyterHub 2.0.0. If I restart JupyterHub it uses wrong url to check if user's server is still running. It concludes that server is not running and deletes cluster job.

Expected behaviour

Cluster jobs should survive JupyterHub restart

Actual behaviour

logs contain lines like these (note port 0) Verifying that tryauuum is running at http://my-hostname:0/user/tryauuum/ Of course server is not reachable via this url. After some time JupyterHub runs qdel to delete cluster job.


The bug is probably with my custom class

I see that after restart there still is a correct port and IP in sqlite database. I want to ask, where exactly in code server IP and port are read from servers table of sqlite database? do I have to implement it manually? All I see in code in terms of restoring IP and port from database is this iteration over list of spawners here https://github.com/jupyterhub/jupyterhub/blob/aa0ce1c88a8e34da1779cb13d3f36f81a6a24df7/jupyterhub/app.py#L2517-L2535

welcome[bot] commented 2 years ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively. welcome You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

tryauuum commented 2 years ago

I've finally managed to find the place where the server IP and port are set to incorrect values (lines are from jupyterhub/spawner.py)

The logic is following

(Pdb) l .
836             )
837             env['JUPYTERHUB_BASE_URL'] = self.hub.base_url[:-4]
838
839             if self.server:
840                 base_url = self.server.base_url
841  ->             if self.ip or self.port:
842                     self.server.ip = self.ip
843                     self.server.port = self.port
844                 env['JUPYTERHUB_SERVICE_PREFIX'] = self.server.base_url
845             else:
846                 # this should only occur in mock/testing scenarios
(Pdb) print('listen_ip:', self.ip, 'listen_port:', self.port)
listen_ip: 0.0.0.0 listen_port: 0
(Pdb) any([self.ip, self.port])
True
(Pdb) print(self.server.ip, self.server.port)
compute-225.example.com 38765
(Pdb)

This change was introduced in https://github.com/jupyterhub/jupyterhub/pull/3381

cmd-ntrf commented 2 years ago

I have just experienced the exact same thing with SlurmSpawner, and based on your second comment it looks like it's a bug in JupyterHub. We should probably create an issue in jupyterhub.

manics commented 2 years ago

Thanks for digging in to this I can transfer this issue back to JupyterHub if it's a bug instead of creating a new one- is it definitely a bug though? What do you think the behaviour should be? Since get_env() returns JUPYTERHUB_SERVICE_PREFIX does BatchSpawner actually need a separate version of get_env() that it can call before calling qstat?

manics commented 2 years ago

@minrk Would you mind sharing your thoughts on where this should be fixed?

tryauuum commented 2 years ago

I think get_env() shouldn't modify internal state at all. It seems wrong that function created for returning a dictionary of environmental variables changes internal state of a class instance.

I didn't understand your comment about JUPYTERHUB_SERVICE_PREFIX, the function will be able to set it even if lines that change self.server.ip and self.server.port are removed.

Could you please transfer the issue back to JupyterHub, I will try to write a fix. Hopefully I will understand how the testing suite works.

minrk commented 2 years ago

I think you're right, get_env probably shouldn't modify the state. It also shouldn't be called except during Spawner.start(), so I'm not entirely sure how/why this is coming up, but that much is easy to fix.

minrk commented 2 years ago

Should be fixed by https://github.com/jupyterhub/jupyterhub/pull/3859