jupyterhub / batchspawner

Custom Spawner for JupyterHub to start servers in batch-scheduled systems
BSD 3-Clause "New" or "Revised" License

Spawn fails in conjunction with named servers #168

Closed: Hoeze closed this issue 4 years ago

Hoeze commented 5 years ago

Are named servers supported by batchspawner?

For some reason, I cannot spawn any named servers. With the latest git commit of batchspawner, every spawn attempt fails with a timeout.

Logs from JupyterHub:


Nov 14 17:46:08 <jupyterhub_host> sudo[19063]:     root : TTY=unknown ; PWD=/ ; USER=<username> ; COMMAND=/bin/squeue -h -j 157299 -o %T %B
Nov 14 17:46:09 <jupyterhub_host> jupyterhub[18312]: [W 2019-11-14 17:46:09.606 JupyterHub batchspawner:412] Notebook server job 157299 at <compute_node>:0 possibly failed to terminate
Nov 14 17:46:09 <jupyterhub_host> jupyterhub[18312]: [E 2019-11-14 17:46:09.623 JupyterHub gen:974] Exception in Future <Task finished coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at 
Nov 14 17:46:09 <jupyterhub_host> jupyterhub[18312]: Traceback (most recent call last):
Nov 14 17:46:09 <jupyterhub_host> jupyterhub[18312]: File "/opt/modules/anaconda/lib/python3.7/site-packages/tornado/gen.py", line 970, in error_callback
Nov 14 17:46:09 <jupyterhub_host> jupyterhub[18312]: future.result()
Nov 14 17:46:09 <jupyterhub_host> jupyterhub[18312]: File "/opt/modules/anaconda/lib/python3.7/site-packages/jupyterhub/handlers/base.py", line 807, in finish_user_spawn
Nov 14 17:46:09 <jupyterhub_host> jupyterhub[18312]: await spawn_future
Nov 14 17:46:09 <jupyterhub_host> jupyterhub[18312]: File "/opt/modules/anaconda/lib/python3.7/site-packages/jupyterhub/user.py", line 642, in spawn
Nov 14 17:46:09 <jupyterhub_host> jupyterhub[18312]: raise e
Nov 14 17:46:09 <jupyterhub_host> jupyterhub[18312]: File "/opt/modules/anaconda/lib/python3.7/site-packages/jupyterhub/user.py", line 546, in spawn
Nov 14 17:46:09 <jupyterhub_host> jupyterhub[18312]: url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
Nov 14 17:46:09 <jupyterhub_host> jupyterhub[18312]: tornado.util.TimeoutError: Timeout

Logs from the notebook job:

+ trap 'echo SIGTERM received' TERM
+ export XDG_RUNTIME_DIR=
+ XDG_RUNTIME_DIR=
+ export SHELL=/bin/bash
+ SHELL=/bin/bash
+ export BASH=/bin/bash
+ BASH=/bin/bash
+ export PATH=/opt/modules/anaconda/bin:/sbin:/bin:/usr/sbin:/usr/bin
+ PATH=/opt/modules/anaconda/bin:/sbin:/bin:/usr/sbin:/usr/bin
+ sort
+ env
BASH=/bin/bash
_=/bin/env
ENVIRONMENT=BATCH
HOME=/data/ag/home/<username>
HOSTNAME=<compute_node>
JPY_API_TOKEN=<token>
JUPYTERHUB_ACTIVITY_URL=http://<jupyterhub_host>:8687/jupyter/hub/api/users/<username>/activity
JUPYTERHUB_API_TOKEN=<token>
JUPYTERHUB_API_URL=http://<jupyterhub_host>:8687/jupyter/hub/api
JUPYTERHUB_BASE_URL=/jupyter/
JUPYTERHUB_CLIENT_ID=jupyterhub-user-<username>-vep_parser
JUPYTERHUB_HOST=
JUPYTERHUB_OAUTH_CALLBACK_URL=/jupyter/user/<username>/vep_parser/oauth_callback
JUPYTERHUB_SERVER_NAME=vep_parser
JUPYTERHUB_SERVICE_PREFIX=/jupyter/user/<username>/vep_parser/
JUPYTERHUB_USER=<username>
KRB5CCNAME=/tmp/krb5cc_5534_157299_CO31sM
LANG=en_US.UTF-8
OMP_NUM_THREADS=8
PATH=/opt/modules/anaconda/bin:/sbin:/bin:/usr/sbin:/usr/bin
PWD=/data/ag/home/<username>
SHELL=/bin/bash
SHLVL=1
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint
SLURM_CLUSTER_NAME=ag
SLURM_CPUS_ON_NODE=8
SLURM_CPUS_PER_TASK=8
SLURMD_NODENAME=<compute_node>
SLURM_EXPORT_ENV=PATH,LANG,JUPYTERHUB_API_TOKEN,JPY_API_TOKEN,JUPYTERHUB_CLIENT_ID,JUPYTERHUB_HOST,JUPYTERHUB_OAUTH_CALLBACK_URL,JUPYTERHUB_USER,JUPYTERHUB_SERVER_NAME,JUPYTERHUB_API_URL,JUPYTERHUB_ACTIVITY_URL,JUPYTERHUB_BASE_URL,JUPYTERHUB_SERVICE_PREFIX,USER,HOME,SHELL
SLURM_GET_USER_ENV=1
SLURM_GTIDS=0
SLURM_JOB_CPUS_PER_NODE=8
SLURM_JOB_GID=501918
SLURM_JOB_ID=157299
SLURM_JOBID=157299
SLURM_JOB_NAME=spawner-jupyterhub
SLURM_JOB_NODELIST=<compute_node>
SLURM_JOB_NUM_NODES=1
SLURM_JOB_PARTITION=slurm-ag
SLURM_JOB_QOS=normal
SLURM_JOB_UID=5534
SLURM_JOB_USER=<username>
SLURM_LOCALID=0
SLURM_MEM_PER_NODE=16000
SLURM_NNODES=1
SLURM_NODE_ALIASES=(null)
SLURM_NODEID=0
SLURM_NODELIST=<compute_node>
SLURM_PRIO_PROCESS=0
SLURM_PROCID=0
SLURM_SPANK_AUKS=done
SLURM_SUBMIT_DIR=/
SLURM_SUBMIT_HOST=<jupyterhub_host>
SLURM_TASK_PID=8177
SLURM_TASKS_PER_NODE=1
SLURM_TOPOLOGY_ADDR=<compute_node>
SLURM_TOPOLOGY_ADDR_PATTERN=node
SLURM_WORKING_CLUSTER=ag:192.168.23.244:6817:8448
TEMP_DIR=/scratch/tmp/<username>
TEMP=/scratch/tmp/<username>
TMPDIR=/scratch/tmp/<username>
TMP=/scratch/tmp/<username>
USER=<username>
XDG_RUNTIME_DIR=
+ which jupyterhub-singleuser
/opt/modules/anaconda/bin/jupyterhub-singleuser
+ batchspawner-singleuser jupyterhub-singleuser --ip=0.0.0.0 --NotebookApp.default_url=/lab
[I 2019-11-14 17:45:01.392 SingleUserNotebookApp manager:46] [nb_conda_kernels] enabled, 66 kernels found
[I 2019-11-14 17:45:03.137 SingleUserNotebookApp extension:155] JupyterLab extension loaded from /opt/modules/anaconda/lib/python3.7/site-packages/jupyterlab
[I 2019-11-14 17:45:03.137 SingleUserNotebookApp extension:156] JupyterLab application directory is /opt/modules/anaconda/share/jupyter/lab
[I 2019-11-14 17:45:03.269 SingleUserNotebookApp __init__:31] [Jupytext Server Extension] Deriving a JupytextContentsManager from LargeFileManager
[I 2019-11-14 17:45:03.277 SingleUserNotebookApp singleuser:561] Starting jupyterhub-singleuser server version 1.0.0
[I 2019-11-14 17:45:03.302 SingleUserNotebookApp notebookapp:1772] Serving notebooks from local directory: /data/ag/home/<username>
[I 2019-11-14 17:45:03.303 SingleUserNotebookApp notebookapp:1772] The Jupyter Notebook is running at:
[I 2019-11-14 17:45:03.303 SingleUserNotebookApp notebookapp:1772] http://(<compute_node> or 127.0.0.1):47769/jupyter/user/<username>/vep_parser/
[I 2019-11-14 17:45:03.303 SingleUserNotebookApp notebookapp:1773] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 2019-11-14 17:45:03.320 SingleUserNotebookApp singleuser:542] Updating Hub with activity every 300 seconds
slurmstepd: error: *** JOB 157299 ON <compute_node> CANCELLED AT 2019-11-14T17:45:57 ***
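
One detail worth comparing across the two logs: the Hub warning lists the server at `<compute_node>:0`, while the job log shows the notebook listening on port 47769. The sketch below pulls both ports out of the log lines quoted above so the mismatch is easy to spot. The helper names `hub_recorded_port` and `server_listening_port` are hypothetical and not part of batchspawner or JupyterHub, and the regular expressions only assume the log formats shown in this thread. A mismatch may suggest that the port was never reported back to the Hub for the named server, but that is an inference from the logs, not something confirmed here.

```python
import re

# Log lines copied verbatim from the JupyterHub log and the notebook job log above.
HUB_LOG = (
    "[W 2019-11-14 17:46:09.606 JupyterHub batchspawner:412] "
    "Notebook server job 157299 at <compute_node>:0 possibly failed to terminate"
)
JOB_LOG = (
    "[I 2019-11-14 17:45:03.303 SingleUserNotebookApp notebookapp:1772] "
    "http://(<compute_node> or 127.0.0.1):47769/jupyter/user/<username>/vep_parser/"
)


def hub_recorded_port(line):
    """Hypothetical helper: port the Hub had on record for the job, or None."""
    match = re.search(r"Notebook server job \d+ at \S+:(\d+)", line)
    return int(match.group(1)) if match else None


def server_listening_port(line):
    """Hypothetical helper: port the single-user server says it listens on, or None."""
    match = re.search(r"http://\([^)]*\):(\d+)/", line)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    recorded = hub_recorded_port(HUB_LOG)
    actual = server_listening_port(JOB_LOG)
    print(f"Hub recorded port: {recorded}, server listening on: {actual}")
    if recorded != actual:
        print("Ports differ: the Hub may never have learned the real port.")
```

Run against the lines above, the two helpers return 0 and 47769 respectively.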

rcthomas commented 5 years ago

Almost. See #167

Hoeze commented 4 years ago

Solved now