Closed by spalkovits 5 years ago
Hi,
Looking at this I don't see anything immediately wrong or any immediate cause. I do see this:
In the hub log:
[C 2019-09-20 11:17:26.777 JupyterHub app:2448] Received signal SIGINT, initiating shutdown...
...
[I 2019-09-20 11:17:26.817 JupyterHub batchspawner:342] Stopping server job 15
Then in singleuser-server log:
slurmstepd-bionic: error: JOB 15 ON bionic CANCELLED AT 2019-09-20T11:17:26
So I guess the reason the server is stopping is that you interrupted the job, which makes the hub cancel the starting server. Did you do this yourself because the server didn't start? If not, that's the first place to look.
If the single-user server isn't starting fast enough...
in my own server logs, I see this:
...JupyterHub log:174] 200 POST /hub/api/batchspawner (darstr1@10.10.100.65) 11.58ms
which is how I know it communicates with the hub. I don't see the single-user server communicating with the hub, so could that be it? Does hub_connect_url work to connect back?
One thing I do for debugging is put env in the batch script, so I can check all the variables that get passed and see if something is wrong there. Some of the JUPYTERHUB_* ones might give you a clue if it can connect back or not, or if something else isn't getting through or is wrong.
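If a full env dump is too noisy, the same idea can be sketched in Python: print only the JUPYTERHUB_* variables from inside the job. This is a minimal sketch, not part of batchspawner. The helper names jupyterhub_env and masked are made up for this example, and hiding every variable with TOKEN or SECRET in its name is only an assumption about what should stay out of a shared log.

```python
import os


def jupyterhub_env(environ=None):
    """Return only the JUPYTERHUB_* variables, sorted by name."""
    environ = os.environ if environ is None else environ
    return {k: environ[k] for k in sorted(environ) if k.startswith("JUPYTERHUB_")}


def masked(name, value):
    """Hide values that look like credentials (assumption: TOKEN/SECRET in the name)."""
    if "TOKEN" in name or "SECRET" in name:
        return "<hidden>"
    return value


if __name__ == "__main__":
    # Run inside the batch job; the output lands in the job's log file.
    for name, value in jupyterhub_env().items():
        print(f"{name}={masked(name, value)}")
```

Called from the batch script just before the single-user command, this shows at a glance whether variables such as JUPYTERHUB_API_URL arrived in the job and where they point.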
Wait... which version are you using, latest master or last release? The last release is quite old and might have some issues with the latest hub; it's been so long I don't remember. Everything in my previous message only applies to the latest master.
Hello, yes, the Hub was intentionally shut down after a while. The versions I used were the following: for the Hub, the latest release via conda from the conda-forge channel, and for the spawners, the latest releases via pip. I think I will first try hub_connect_url and see what comes out. If anyone has more suggestions I would be happy to try them. Thanks @rkdarst!
Hello,
Setting hub_connect_url does not help. After a while I get a timeout. It seems that the server is not reachable? Here is the log:
[I 2019-09-22 14:56:30.803 JupyterHub app:2120] Using Authenticator: jupyterhub.auth.PAMAuthenticator-1.0.0
[I 2019-09-22 14:56:30.804 JupyterHub app:2120] Using Spawner: wrapspawner.wrapspawner.ProfilesSpawner
[I 2019-09-22 14:56:30.810 JupyterHub app:1257] Loading cookie_secret from /home/palkovits/jupyterhub_cookie_secret
[I 2019-09-22 14:56:30.832 JupyterHub proxy:460] Generating new CONFIGPROXY_AUTH_TOKEN
[W 2019-09-22 14:56:30.834 JupyterHub app:1532] No admin users, admin interface will be unavailable.
[W 2019-09-22 14:56:30.835 JupyterHub app:1534] Add any administrative users to `c.Authenticator.admin_users` in config.
[I 2019-09-22 14:56:30.835 JupyterHub app:1563] Not using whitelist. Any authenticated user will be allowed.
[I 2019-09-22 14:56:30.883 JupyterHub app:2337] Hub API listening on http://127.0.0.1:8081/hub/
[W 2019-09-22 14:56:30.884 JupyterHub proxy:642] Running JupyterHub without SSL. I hope there is SSL termination happening somewhere else...
[I 2019-09-22 14:56:30.885 JupyterHub proxy:645] Starting proxy @ http://:8000
[I 2019-09-22 14:56:31.504 JupyterHub proxy:319] Checking routes
[I 2019-09-22 14:56:31.505 JupyterHub proxy:399] Adding default route for Hub: / => http://127.0.0.1:8081
[I 2019-09-22 14:56:31.513 JupyterHub app:2422] JupyterHub is now running at http://:8000
[I 2019-09-22 14:56:34.098 JupyterHub log:174] 200 GET /hub/home (palkovits@::ffff:10.0.2.2) 86.91ms
[I 2019-09-22 14:56:35.688 JupyterHub log:174] 200 GET /hub/spawn/palkovits (palkovits@::ffff:10.0.2.2) 24.19ms
[I 2019-09-22 14:56:38.807 JupyterHub batchspawner:188] Spawner submitting job using sudo -E -u palkovits sbatch --parsable
[I 2019-09-22 14:56:38.807 JupyterHub batchspawner:189] Spawner submitted script:
#!/bin/bash
#SBATCH --partition=compute
#SBATCH --time=8:00:00
#SBATCH --output=/home/palkovits/jupyterhub_slurmspawner_%j.log
#SBATCH --job-name=jupyterhub-spawner
#SBATCH --cpus-per-task=1
#SBATCH --workdir=/home/palkovits
#SBATCH --mem=800
export PATH=/opt/conda/bin:$PATH
jupyterhub-singleuser --port=53169
[I 2019-09-22 14:56:38.834 JupyterHub batchspawner:192] Job submitted. cmd: sudo -E -u palkovits sbatch --parsable output: 17
[I 2019-09-22 14:56:39.897 JupyterHub batchspawner:330] Notebook server job 17 started at bionic:53169
[I 2019-09-22 14:56:40.203 JupyterHub log:174] 200 GET /hub/api (@127.0.0.1) 0.94ms
[I 2019-09-22 14:56:40.235 JupyterHub log:174] 200 POST /hub/api/users/palkovits/activity (palkovits@127.0.0.1) 24.62ms
[W 2019-09-22 14:56:48.751 JupyterHub base:932] User palkovits is slow to become responsive (timeout=10)
[I 2019-09-22 14:56:48.784 JupyterHub log:174] 302 POST /hub/spawn/palkovits -> /hub/spawn-pending/palkovits (palkovits@::ffff:10.0.2.2) 10068.71ms
[I 2019-09-22 14:56:48.798 JupyterHub pages:303] palkovits is pending spawn
[I 2019-09-22 14:56:48.808 JupyterHub log:174] 200 GET /hub/spawn-pending/palkovits (palkovits@::ffff:10.0.2.2) 18.55ms
[I 2019-09-22 15:01:31.521 JupyterHub proxy:319] Checking routes
[I 2019-09-22 15:01:40.933 JupyterHub log:174] 200 POST /hub/api/users/palkovits/activity (palkovits@127.0.0.1) 14.01ms
[I 2019-09-22 15:06:11.162 JupyterHub log:174] 200 POST /hub/api/users/palkovits/activity (palkovits@127.0.0.1) 17.56ms
[I 2019-09-22 15:06:31.515 JupyterHub proxy:319] Checking routes
[I 2019-09-22 15:11:13.126 JupyterHub log:174] 200 POST /hub/api/users/palkovits/activity (palkovits@127.0.0.1) 17.98ms
[I 2019-09-22 15:11:31.518 JupyterHub proxy:319] Checking routes
[I 2019-09-22 15:15:51.924 JupyterHub log:174] 200 POST /hub/api/users/palkovits/activity (palkovits@127.0.0.1) 16.99ms
[W 2019-09-22 15:16:28.000 JupyterHub user:678] palkovits's server never showed up at http://bionic:53169/user/palkovits/ after 1200 seconds. Giving up
[I 2019-09-22 15:16:28.029 JupyterHub batchspawner:342] Stopping server job 17
[I 2019-09-22 15:16:28.030 JupyterHub batchspawner:233] Cancelling job 17: sudo -E -u palkovits scancel 17
[E 2019-09-22 15:16:30.144 JupyterHub gen:599] Exception in Future <Task finished coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /opt/conda/lib/python3.7/site-packages/jupyterhub/handlers/base.py:800> exception=TimeoutError("Server at http://bionic:53169/user/palkovits/ didn't respond in 1200 seconds")> after timeout
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 593, in error_callback
future.result()
File "/opt/conda/lib/python3.7/site-packages/jupyterhub/handlers/base.py", line 807, in finish_user_spawn
await spawn_future
File "/opt/conda/lib/python3.7/site-packages/jupyterhub/user.py", line 654, in spawn
await self._wait_up(spawner)
File "/opt/conda/lib/python3.7/site-packages/jupyterhub/user.py", line 701, in _wait_up
raise e
File "/opt/conda/lib/python3.7/site-packages/jupyterhub/user.py", line 669, in _wait_up
http=True, timeout=spawner.http_timeout, ssl_context=ssl_context
File "/opt/conda/lib/python3.7/site-packages/jupyterhub/utils.py", line 234, in wait_for_http_server
timeout=timeout,
File "/opt/conda/lib/python3.7/site-packages/jupyterhub/utils.py", line 177, in exponential_backoff
raise TimeoutError(fail_message)
TimeoutError: Server at http://bionic:53169/user/palkovits/ didn't respond in 1200 seconds
[I 2019-09-22 15:16:30.165 JupyterHub log:174] 200 GET /hub/api/users/palkovits/server/progress (palkovits@::ffff:10.0.2.2) 1181208.19ms
[I 2019-09-22 15:16:31.519 JupyterHub proxy:319] Checking routes
[C 2019-09-22 15:18:16.413 JupyterHub app:2448] Received signal SIGINT, initiating shutdown...
[I 2019-09-22 15:18:16.417 JupyterHub app:2155] Cleaning up single-user servers...
[I 2019-09-22 15:18:16.420 JupyterHub proxy:705] Cleaning up proxy[2032]...
[I 2019-09-22 15:18:16.426 JupyterHub app:2187] ...done
Regards...
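To narrow down whether "not reachable" is a network problem or a server that never started listening, a plain TCP check from the hub machine while the job is running can help. The following is a minimal sketch using only the Python standard library. The function name can_connect is made up for this example, and the host and port in the comment are simply the ones from the log above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example, run on the hub machine while job 17 is still running:
#   can_connect("bionic", 53169)
```

A False result while the job is running means either that nothing is listening on that port or that the hub cannot reach the node. A True result points away from the network and toward the single-user server not answering HTTP at the expected URL.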
Just wondering how you were able to solve it... always useful for the future.
Sorry, I just hit the wrong button. My fault. It's still not working.
Try the latest batchspawner + wrapspawner from git. At least the latest wrapspawner is needed. The last batchspawner release is probably out of date too, enough that I wouldn't trust it, and I don't remember what may have changed. Releases effectively don't happen yet.
The new batchspawner should use a singleuser command "batchspawner-singleuser" instead of "jupyterhub-singleuser" (this should be automatic). Just a way to see if newer batchspawner is doing what it should, but it didn't exist at the time of the last release.
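The hub log prints the submitted script after "Spawner submitted script:", so the check can also be done mechanically. Below is a minimal sketch; the function name singleuser_command is made up for this example, and it assumes the single-user command is the first word on its own line of the script, as in the script logged earlier in this thread.

```python
def singleuser_command(script_text):
    """Return the first *-singleuser command found in a batch script, or None."""
    for line in script_text.splitlines():
        stripped = line.strip()
        # Skip blank lines, the shebang and #SBATCH directives.
        if not stripped or stripped.startswith("#"):
            continue
        first_word = stripped.split()[0]
        if first_word.endswith("-singleuser"):
            return first_word
    return None
```

For the script logged with the old release this returns "jupyterhub-singleuser". If it still returns that after upgrading, the newer batchspawner is probably not the one being picked up.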
Hello, the latest version from git worked. My VM now behaves as it should. On my production system I also had to add the solution from this issue: https://github.com/jupyterhub/jupyterhub/issues/774#issuecomment-249420931. Thank you for now; I will close this for the time being.
Hello,
I have been reading the issues here for quite some time but cannot find a solution to my problem.
I want to use the SlurmSpawner with a JupyterHub. For testing purposes I installed a clean Ubuntu 18.04 in VirtualBox. JupyterHub (and the Notebook and JupyterLab) are installed via conda (Miniconda installed to /opt/conda) from the conda-forge channel. BatchSpawner and ProfilesSpawner are installed via pip from /opt/conda/bin.
Slurm seems to work properly. At least the jobs get queued and run. My jupyter_config.py looks like this:
The slurm.conf looks like this:
In the JupyterHub log I find nothing:
Neither do I in the log of the job:
Right now I run JupyterHub interactively and not as a service.
Any feedback would be appreciated.
Regards!