jupyterhub / batchspawner

Custom Spawner for Jupyterhub to start servers in batch scheduled systems
BSD 3-Clause "New" or "Revised" License

404 errors when combined with wrapspawner #129

Closed loadnabox closed 5 years ago

loadnabox commented 5 years ago

Apologies if this is the wrong place. I'm struggling through the documentation and have hit a brick wall; this was my best guess for where to ask for help.

Scenario:

Problem: I got batchspawner working on its own first. After later adding the wrapspawner options, I now get the errors below:

Slurm output:

[I 2019-02-01 13:23:06.093 BatchSingleUserNotebookApp manager:46] [nb_conda_kernels] enabled, 0 kernels found
[I 2019-02-01 13:23:06.919 BatchSingleUserNotebookApp extension:168] JupyterLab extension loaded from /packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterlab
[I 2019-02-01 13:23:06.919 BatchSingleUserNotebookApp extension:169] JupyterLab application directory is /packages/7x/anaconda3/2018.12-jh/share/jupyter/lab
[W 2019-02-01 13:23:06.931 BatchSingleUserNotebookApp auth:303] Failed to check authorization: [404] Not Found
[W 2019-02-01 13:23:06.931 BatchSingleUserNotebookApp auth:304] {"status": 404, "message": "Not Found"}
Traceback (most recent call last):
  File "/packages/7x/anaconda3/2018.12-jh/bin/batchspawner-singleuser", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/packages/7x/build/batchspawner/0.9.0dev0/batchspawner/scripts/batchspawner-singleuser", line 6, in <module>
    main()
  File "/packages/7x/build/batchspawner/0.9.0dev0/batchspawner/batchspawner/singleuser.py", line 18, in main
    return BatchSingleUserNotebookApp.launch_instance(argv)
  File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyter_core/application.py", line 266, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/packages/7x/build/batchspawner/0.9.0dev0/batchspawner/batchspawner/singleuser.py", line 14, in start
    json={'port' : self.port})
  File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/services/auth.py", line 305, in _api_request
    raise HTTPError(500, "Failed to check authorization")
tornado.web.HTTPError: HTTP 500: Internal Server Error (Failed to check authorization)

From JupyterHub side:

[I 2019-02-01 13:23:01.411 JupyterHub batchspawner:243] Spawner submitted script:
    #!/bin/bash
    #SBATCH -q debug
    #SBATCH -p debug
    #SBATCH -t 0-12:00:00
    #SBATCH -N 1
    #SBATCH -n 1
    #SBATCH -o /home/USER/jupyterhub.%j.out
    #SBATCH -e /home/USER/jupyterhub.%j.err
    #SBATCH --export ALL
    ###SBATCH -w cg1-6
    source /etc/profile
    unset XDG_RUNTIME_DIR
    module load anaconda3/.2018.12-jh
    batchspawner-singleuser --ip="0.0.0.0" --notebook-dir="~"

[I 2019-02-01 13:23:01.483 JupyterHub batchspawner:246] Job submitted. cmd: sudo -E -u USER sbatch --parsable output: 859661
[D 2019-02-01 13:23:01.484 JupyterHub batchspawner:269] Spawner querying job: sudo -E -u USER squeue -h -j 859661 -o '%T %B'
[D 2019-02-01 13:23:01.512 JupyterHub batchspawner:369] Job 859661 still pending
[D 2019-02-01 13:23:02.013 JupyterHub batchspawner:269] Spawner querying job: sudo -E -u USER squeue -h -j 859661 -o '%T %B'
[D 2019-02-01 13:23:02.044 JupyterHub batchspawner:369] Job 859661 still pending
[D 2019-02-01 13:23:02.547 JupyterHub batchspawner:269] Spawner querying job: sudo -E -u USER squeue -h -j 859661 -o '%T %B'
[W 2019-02-01 13:23:07.008 JupyterHub log:158] 404 POST /hub/api/batchspawner (USER@10.126.16.15) 1.09ms
[W 2019-02-01 13:23:11.100 JupyterHub base:714] User USER is slow to start (timeout=10)
[I 2019-02-01 13:23:11.178 JupyterHub log:158] 302 POST /hub/spawn?next=%2Fhub%2Fuser%2FUSER%2F -> /hub/user/USER/ (USER@10.126.17.240) 10146.75ms
[D 2019-02-01 13:23:11.280 JupyterHub base:1008] Waiting for USER pending spawn
[I 2019-02-01 13:23:21.281 JupyterHub base:1012] Pending spawn for USER didn't finish in 10.0 seconds
[I 2019-02-01 13:23:21.281 JupyterHub base:1018] USER is pending spawn
[I 2019-02-01 13:23:21.289 JupyterHub log:158] 200 GET /hub/user/USER/ (USER@10.126.17.240) 10079.46ms
[D 2019-02-01 13:23:21.344 JupyterHub log:158] 200 GET /hub/static/css/style.min.css?v=dd1df30ccc6c4d3e9705d78012d25b57 (@10.126.17.240) 2.31ms
[W 2019-02-01 13:24:01.483 JupyterHub user:471] USER's server failed to start in 60 seconds, giving up
[D 2019-02-01 13:24:01.484 JupyterHub batchspawner:269] Spawner querying job: sudo -E -u USER squeue -h -j 859661 -o '%T %B'
[D 2019-02-01 13:24:01.552 JupyterHub user:578] Deleting oauth client jupyterhub-user-USER
[E 2019-02-01 13:24:01.685 JupyterHub gen:974] Exception in Future <Task finished coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/handlers/base.py:619> exception=TimeoutError('Timeout')> after timeout
    Traceback (most recent call last):
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/tornado/gen.py", line 970, in error_callback
        future.result()
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/handlers/base.py", line 626, in finish_user_spawn
        await spawn_future
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/user.py", line 489, in spawn
        raise e
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/user.py", line 409, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
    tornado.util.TimeoutError: Timeout

[E 2019-02-01 13:24:01.699 JupyterHub gen:974] Exception in Future <Task finished coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/handlers/base.py:619> exception=TimeoutError('Timeout')> after timeout
    Traceback (most recent call last):
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/tornado/gen.py", line 970, in error_callback
        future.result()
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/tornado/gen.py", line 970, in error_callback
        future.result()
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/handlers/base.py", line 626, in finish_user_spawn
        await spawn_future
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/user.py", line 489, in spawn
        raise e
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/user.py", line 409, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
    tornado.util.TimeoutError: Timeout

Config file:

## Load Batchspawner which enables integration with SLURM
c.JupyterHub.spawner_class = 'wrapspawner.ProfilesSpawner'
c.Spawner.http_timeout = 120

#------------------------------------------------------------------------------
# BatchSpawnerBase configuration
#    These are simply setting parameters used in the job script template below
#------------------------------------------------------------------------------
#c.BatchSpawnerBase.req_nprocs = '4'
#c.BatchSpawnerBase.req_queue = 'debug'
#c.BatchSpawnerBase.req_runtime = '0-8:00:00'
#c.BatchSpawnerBase.req_memory = '4gb'
c.Spawner.notebook_dir = '~'

#------------------------------------------------------------------------------
# SlurmSpawner configuration
#------------------------------------------------------------------------------
c.SlurmSpawner.batch_script = '''#!/bin/bash
#SBATCH -q {queue}
#SBATCH -p debug
#SBATCH -t {runtime}
#SBATCH -N 1
#SBATCH -n {nprocs}
#SBATCH -o {homedir}/jupyterhub.%j.out
#SBATCH -e {homedir}/jupyterhub.%j.err
#SBATCH --export ALL
###SBATCH -w cg1-6
source /etc/profile
unset XDG_RUNTIME_DIR
module load anaconda3/.2018.12-jh
{cmd}
'''

##  SSL Certificate locations
c.JupyterHub.ssl_cert = '/etc/pki/CA/certs/jupyter.crt'
c.JupyterHub.ssl_key = '/etc/pki/CA/private/jupyter.key'

##  URL for Jupyterhub to bind to
#c.JupyterHub.bind_url = 'https://jupyterhub.localdomain.com:443'

c.JupyterHub.ip = 'jupyterhub.localdomain.com'
c.JupyterHub.port = 443
c.JupyterHub.hub_ip = 'jupyterhub.localdomain.com'

##  Set authentication options
#  prevents JupyterHub from creating local users
c.LocalAuthenticator.create_system_users = False

#  Set admin users (admin users can run jobs and/or manage other users' notebook servers)
c.Authenticator.admin_users = {'USER1', 'USER2'}

# Set the Jupyterhub log file location
c.JupyterHub.extra_log_file = '/var/log/jupyterhub.log'

# Set the log level by value or name.
c.JupyterHub.log_level = 'DEBUG'

#  Spawner Profiles
c.ProfilesSpawner.profiles = [
  ( "Local Server", 'local', 'jupyterhub.spawner.LocalProcessSpawner', {'ip':'0.0.0.0'} ),
  ('clustername - 1 core, 4.5GB, 12 hours', 'clustername1c12h', 'batchspawner.SlurmSpawner',
    dict(req_nprocs='1', req_queue='debug', req_runtime='0-12:00:00')),
  ('clustername - 4 cores, 18GB, 8 Hours', 'clustername4c8h', 'batchspawner.SlurmSpawner',
    dict(req_nprocs='4', req_queue='debug', req_runtime='0-08:00:00')),
  ('clustername - 14 cores, 63GB, 4 hours', 'clustername14c4h', 'batchspawner.SlurmSpawner',
    dict(req_nprocs='14', req_queue='debug', req_runtime='0-04:00:00')),
  ('clustername - 28 cores, 128GB 1 hour', 'clustername28c1h', 'batchspawner.SlurmSpawner',
    dict(req_nprocs='28', req_queue='debug', req_runtime='0-01:00:00')),
]

TYIA, help is greatly appreciated

YFLOPS commented 5 years ago

I have the exact same issue. Any luck getting this figured out?

Does it have anything to do with the 10s timeout for the spawner? I'm not sure how it maintains the tree information internally.

[W 2019-02-01 13:23:11.100 JupyterHub base:714] User USER is slow to start (timeout=10)

YFLOPS commented 5 years ago

More information:

Same issue with the batchspawner master branch (0.9dev) on both JupyterHub 0.9.4 and 0.8.1.

I got my system working using JupyterHub 0.8.1 and batchspawner tag 0.8.1.

Hoeze commented 5 years ago

I got the same problem. What would be the correct API call? Any ideas how to fix this?

Hoeze commented 5 years ago

I found the root cause of this: the API handler never gets loaded because "batchspawner.api" was never imported.

The best solution to this is to add the following line in your jupyterhub_config.py:

c.JupyterHub.extra_handlers = [(r"/api/batchspawner", 'batchspawner.api.BatchSpawnerAPIHandler')]

See also https://github.com/jupyterhub/batchspawner/issues/126
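To check whether a failed spawn matches this root cause, it can help to scan the Hub log for the rejected callback. The sketch below is only an illustration: the helper name find_rejected_callbacks and the regular expression are hypothetical, and the log format is assumed to match the Hub output posted earlier in this thread.

```python
import re

# Assumed log format, taken from the Hub output earlier in this thread, e.g.:
# [W 2019-02-01 13:23:07.008 JupyterHub log:158] 404 POST /hub/api/batchspawner (USER@10.126.16.15) 1.09ms
CALLBACK_404 = re.compile(r"\]\s+404\s+POST\s+(\S*/api/batchspawner)\b")


def find_rejected_callbacks(log_lines):
    """Return the log lines where the Hub answered 404 to the batchspawner callback.

    Hypothetical helper for illustration; it is not part of batchspawner.
    """
    return [line.rstrip("\n") for line in log_lines if CALLBACK_404.search(line)]


sample = [
    "[D 2019-02-01 13:23:02.044 JupyterHub batchspawner:369] Job 859661 still pending",
    "[W 2019-02-01 13:23:07.008 JupyterHub log:158] 404 POST /hub/api/batchspawner (USER@10.126.16.15) 1.09ms",
    "[W 2019-02-01 13:23:11.100 JupyterHub base:714] User USER is slow to start (timeout=10)",
]

hits = find_rejected_callbacks(sample)
for line in hits:
    print(line)
```

If a line like this shows up right after the job starts, the single-user server reached the Hub but no handler was registered for the callback URL, which is consistent with the explanation above.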

rkdarst commented 5 years ago

I think this is covered in the current README now, with a solution of adding import batchspawner to the config, which is a bit more generic and keeps working even if the API handling gets changed. Please let us know if more is needed.
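For readers landing here, a minimal sketch of what the import-based fix might look like in jupyterhub_config.py, assuming the same ProfilesSpawner setup as in the original post. This is a config fragment, not a complete file, and the comment about what the import does reflects the explanation given in this thread rather than a guarantee about every batchspawner version.

```python
# jupyterhub_config.py (fragment; `c` is the config object JupyterHub provides)

# The module is not referenced again below. Per this thread, the import is
# what causes the batchspawner API handler to be loaded, even when the
# configured spawner class is wrapspawner's ProfilesSpawner.
import batchspawner  # noqa: F401

c.JupyterHub.spawner_class = 'wrapspawner.ProfilesSpawner'
```

With the import in place, the explicit c.JupyterHub.extra_handlers line from the earlier comment should not be needed.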