jupyterhub / batchspawner

Custom Spawner for Jupyterhub to start servers in batch scheduled systems
BSD 3-Clause "New" or "Revised" License

slurm full path needed #216

Closed hoba87 closed 1 year ago

hoba87 commented 3 years ago

Bug description

Without specifying the full path to the Slurm executables (such as sbatch) in batchspawner.py, they are not found, even though I have added the Slurm bin directory to the JupyterHub service environment.

Expected behaviour

JupyterHub uses its environment to find slurm executables.

Actual behaviour

Full paths to the Slurm executables are needed.

How to reproduce

  1. Go to JupyterHub - Single User instance -> Control Panel -> Start my server / new server (using slurm)
  2. See error "sbatch not found"

Your personal set up

CentOS 8.2, Python 3.7.9, JupyterHub 1.4.0, batchspawner 1.1.1

Configuration

```
[Unit]
Description=Jupyterhub
After=network-online.target

[Service]
User=root
Environment="PATH=/opt/openmpi/openmpi-4.0.5/bin:/opt/pmix/pmix-3.1.5/bin:/opt/python/python-3.7.9/bin:/usr/share/Modules/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/slurm/slurm-20.02.5/bin:"
Environment="LD_LIBRARY_PATH=/opt/openmpi/openmpi-4.0.5/lib:/opt/pmix/pmix-3.1.5/lib:/opt/python/python-3.7.9/lib:/usr/lib64:"
Environment="PYTHONPATH=/opt/python/python-3.7.9/lib:"
Environment="OPENBLAS_NUM_THREADS=1"
Environment="MKL_NUM_THREADS=1"
Environment="OMP_NUM_THREADS=1"
ExecStart=/opt/python/python-3.7.9/bin/jupyterhub -f /etc/jupyterhub/jupyterhub_config.py
WorkingDirectory=/etc/jupyterhub

[Install]
WantedBy=multi-user.target
```
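For completeness, the environment the Hub service actually receives can be inspected with systemctl (the unit name `jupyterhub` is an assumption here):

```
systemctl show jupyterhub --property=Environment
```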
```
c.JupyterHub.active_server_limit = 10
c.JupyterHub.allow_named_servers = True

c.JupyterHub.hub_ip = '192.168.0.254'
import batchspawner

c.Spawner.http_timeout = 120

c.BatchSpawnerBase.req_nprocs = '1'
c.BatchSpawnerBase.req_queue = 'batch'
c.BatchSpawnerBase.req_host = 'lk-pma-cluster-head'
c.BatchSpawnerBase.req_runtime = '12:00:00'
c.BatchSpawnerBase.req_memory = '6gb'

c.SlurmSpawner.batch_script = '''#!/bin/bash
#SBATCH --output={homedir}/jupyterhub_slurmspawner_%j.log
#SBATCH --job-name=jupyter-spawner
#SBATCH --chdir={homedir}
#SBATCH --export={keepvars}
#SBATCH --get-user-env=L
#SBATCH --partition={partition}
#SBATCH --time={runtime}
#SBATCH --mem={memory}
#SBATCH --cpus-per-task={nprocs}
#SBATCH --gres={gres}
#SBATCH {options}
module load openmpi/4.0.5
module load python/3.7.9
module load libs/python-3.7.9
module load cuda/10.1
module load gmsh/4.6.0
module load octopus/10.1

set -euo pipefail
trap 'echo SIGTERM received' TERM
which jupyterhub-singleuser
/opt/slurm/slurm-20.02.5/bin/srun {cmd}
echo "jupyterhub-singleuser ended gracefully"
'''

c.JupyterHub.spawner_class = 'wrapspawner.ProfilesSpawner'
#------------------------------------------------------------------------------
# ProfilesSpawner configuration
#------------------------------------------------------------------------------
# List of profiles to offer for selection. Signature is:
#   List(Tuple( Unicode, Unicode, Type(Spawner), Dict ))
# corresponding to profile display name, unique key, Spawner class,
# dictionary of spawner config options.
#
# The first three values will be exposed in the input_template as {display},
# {key}, and {type}
c.ProfilesSpawner.ip = '0.0.0.0'
c.ProfilesSpawner.profiles = [
   ( "Head Node", 'local', 'jupyterhub.spawner.LocalProcessSpawner', {'ip':'0.0.0.0'}),
   ( "Compute Node, 1core, 6GB, 12 hours", 'compute-c1_r6_t0.5', 'batchspawner.SlurmSpawner', dict(req_partition='batch', req_gres='')),
   ( "Compute Node, 1cores, 15GB, 7 days, V100 GPU", 'compute-c1_r15_g-v100_t7', 'batchspawner.SlurmSpawner', dict(req_partition='batch', req_nprocs='1', req_memory='15gb', req_runtime='168:00:00', req_gres='gpu:v100:1')),
]

c.JupyterHub.ssl_cert = '/etc/ssl/certs/jupyterhub-cert.pem'

c.JupyterHub.ssl_key = '/etc/ssl/keys/jupyterhub-key.pem'

c.Spawner.env_keep = ['PATH', 'LD_LIBRARY_PATH', 'PYTHONPATH', 'VIRTUAL_ENV', 'LANG', 'LC_ALL', 'MKL_NUM_THREADS', 'OMP_NUM_THREADS', 'OPENBLAS_NUM_THREADS', 'BASH_FUNC__moduleraw', 'BASH_FUNC_switchml', 'BASH_FUNC_module', 'MODULESHOME', 'MODULEPATH', 'MODULES_CMD']
c.Spawner.environment = {}

c.SingleUserNotebookApp.shutdown_no_activity_timeout = 7*24*60*60
c.NotebookApp.shutdown_no_activity_timeout = 7*24*60*60
c.MappingKernelManager.cull_idle_timeout = 7*24*60*60 # 1 week
c.MappingKernelManager.cull_interval = 24*60*60
```
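As a stopgap, the spawner commands can also be pointed at absolute paths instead of relying on PATH. This is only a sketch: the Slurm prefix is taken from the unit file above, and the argument strings mirror what I believe the batchspawner defaults for `batch_submit_cmd`, `batch_query_cmd`, and `batch_cancel_cmd` are, so please double-check them against your installed version:

```
# Sketch: hard-code absolute paths to the Slurm binaries so no PATH lookup is needed.
# The prefix below is assumed from the systemd unit above; adjust to your site.
slurm_bin = '/opt/slurm/slurm-20.02.5/bin'
c.SlurmSpawner.batch_submit_cmd = f'{slurm_bin}/sbatch --parsable'
c.SlurmSpawner.batch_query_cmd = f"{slurm_bin}/squeue -h -j {{job_id}} -o '%T %B'"
c.SlurmSpawner.batch_cancel_cmd = f'{slurm_bin}/scancel {{job_id}}'
```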

welcome[bot] commented 3 years ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

jbaksta commented 3 years ago

You may have figured this out by now, but even though you seem to be running the service as root, the calls to Slurm are probably made through sudo, which may enforce a secure_path that does not include your Slurm installation. So I'd look at your /etc/sudoers file and likely set secure_path there.
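For reference, that would mean adding the Slurm bin directory to secure_path in /etc/sudoers (a hypothetical example; edit with visudo and adjust the path to your site):

```
Defaults secure_path = /sbin:/bin:/usr/sbin:/usr/bin:/opt/slurm/slurm-20.02.5/bin
```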

hoba87 commented 1 year ago

That was indeed the case. Adjusting the secure_path solved this for me.