dithwick opened 2 years ago
Bug description
batchspawner-singleuser starts correctly on the compute node and reports back to the hub that it is running. JupyterHub shows:
Server ready at [/user/username/]
and redirects to /user/username/lab?, but that page then shows a 503 error to the user.
There are some complexities with the networking (described below) that are likely related to the problem I'm seeing, and it could also be a configuration issue. However, given that I'm seeing an exception, I thought it best to report it as a bug.
Expected behaviour
The user is presented with their Jupyter session running on the compute node.
Actual behaviour
The user is presented with a 503 error. The singleuser process on the compute node generates an uncaught exception traceback (see logs) but continues to run until the job is cancelled.
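Because the proxy routes this user's requests to `http://[compute:ipv6:address]:40713` (see the logs below), one quick check is whether that port is reachable from the hub host at all, and over which address family. The sketch below is a hypothetical diagnostic, not part of the original setup: `probe` is an illustrative name, and it simply attempts a TCP connection to every address DNS returns for the given host.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try a TCP connection to host:port over every address DNS returns for host.

    Returns a dict mapping each resolved address to (address family label, outcome).
    """
    results = {}
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            results[sockaddr[0]] = (label, "ok")
        except OSError as exc:
            # Refused, unreachable and timed-out connections all end up here.
            results[sockaddr[0]] = (label, f"failed: {exc}")
    return results
```

Run on the hub host as, for example, `print(probe("node01", 40713))`, where `node01` is a placeholder compute-node name and the port comes from the `PROXY WEB` log line. With dual-stack DNS, the IPv6 entry would be expected to report `ok` and the IPv4 entry to fail.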
How to reproduce
Your personal set up
Networking is dual stack (IPv4/IPv6), with some complexities:
User access to the hub is over both IPv4 and IPv6.
Hub connectivity to PBS Pro is over IPv4.
Communication between the hub and the compute nodes must be over IPv6, as only IPv6 is routable between the JupyterHub server and the compute nodes. However, DNS has both IPv4 and IPv6 entries for the compute nodes, and when JupyterHub resolved a compute node's address from the `exec_host` regular-expression match, it got the IPv4 address. To work around this I wrote a wrapper script around `qstat` that replaces the hostname in the qstat output with the node's IPv6 address. There is probably a better way of doing this, but it seems to work up until the problem reported here.
OS: JupyterHub is running on CentOS 7.9; the compute nodes are running RHEL 8.5.
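The wrapper script itself is not included in this report. A minimal sketch of the same idea in Python might look like the following. It assumes PBS Pro's `qstat -fx` output format (an `exec_host = <host>/<vnode spec>` line) and that the hub host can resolve AAAA records for the compute nodes; `rewrite_exec_host`, `resolve_ipv6` and `main` are hypothetical names, not the reporter's actual script.

```python
import re
import socket
import subprocess
import sys

# Matches a PBS Pro "exec_host = <host>/<vnode spec>" line; group 2 is the host name.
EXEC_HOST_RE = re.compile(r"^([ \t]*exec_host = )([^/\s]+)", re.MULTILINE)


def resolve_ipv6(hostname):
    """Return the first IPv6 address DNS reports for hostname (an AAAA lookup)."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET6, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return infos[0][4][0]


def rewrite_exec_host(qstat_output, resolver=resolve_ipv6):
    """Replace the host name on each exec_host line with the address from resolver()."""
    return EXEC_HOST_RE.sub(lambda m: m.group(1) + resolver(m.group(2)), qstat_output)


def main(argv):
    """Run qstat for the job id(s) in argv and print its output with exec_host rewritten."""
    # Assumption: mirrors batchspawner's default PBS query command, `qstat -fx {job_id}`.
    result = subprocess.run(["qstat", "-fx", *argv], capture_output=True, text=True)
    sys.stdout.write(rewrite_exec_host(result.stdout))
    sys.stderr.write(result.stderr)
    return result.returncode
```

Installed as `/opt/jupyterhub/pbs/bin/__qstat`, the script would end with `sys.exit(main(sys.argv[1:]))`. Because the rewritten line in this sketch keeps the `/0*4`-style vnode suffix, the `state_exechost_re` capture group would need to stop at the `/` (for example `exec_host = ([0-9a-fA-F:]+)/`). A catch-all `(.+)` would include the suffix in the captured address.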
Versions:
Python: python 3.9.7
JupyterHub: jupyterhub 2.2.2
BatchSpawner: batchspawner 1.1.0
wrapspawner 1.0.1
jupyter-saml2authenticator 0.1.0.dev0
Full environment
```
# packages in environment at /opt/jupyterhub/miniconda/2022-04-10:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
alembic 1.7.7 pyhd8ed1ab_0 conda-forge
async_generator 1.10 py_0 conda-forge
attrs 21.4.0 pyhd8ed1ab_0 conda-forge
batchspawner 1.1.0 pypi_0 pypi
blinker 1.4 py_1 conda-forge
brotlipy 0.7.0 py39h27cfd23_1003
c-ares 1.18.1 h7f8727e_0
ca-certificates 2021.10.8 ha878542_0 conda-forge
certifi 2021.10.8 py39hf3d152e_2 conda-forge
certipy 0.1.3 py_0 conda-forge
cffi 1.15.0 py39hd667e15_1
charset-normalizer 2.0.4 pyhd3eb1b0_0
conda 4.12.0 py39hf3d152e_0 conda-forge
conda-content-trust 0.1.1 pyhd3eb1b0_0
conda-package-handling 1.7.3 py39h27cfd23_1
configurable-http-proxy 4.0.1 node14_0
cryptography 36.0.0 py39h9ce1e76_0
defusedxml 0.7.1 pypi_0 pypi
elementpath 2.5.0 pypi_0 pypi
entrypoints 0.4 pyhd8ed1ab_0 conda-forge
icu 68.1 h2531618_0
idna 3.3 pyhd3eb1b0_0
importlib-metadata 4.11.3 py39hf3d152e_1 conda-forge
importlib_resources 5.6.0 pyhd8ed1ab_0 conda-forge
jinja2 3.1.1 pyhd8ed1ab_0 conda-forge
jsonschema 4.4.0 pyhd8ed1ab_0 conda-forge
jupyter-saml2authenticator 0.1.0.dev0 pypi_0 pypi
jupyter_telemetry 0.1.0 pyhd8ed1ab_1 conda-forge
jupyterhub 2.2.2 pyhd8ed1ab_1 conda-forge
jupyterhub-base 2.2.2 pyhd8ed1ab_1 conda-forge
krb5 1.19.2 hac12032_0
ld_impl_linux-64 2.35.1 h7274673_9
libcurl 7.80.0 h0b77cf5_0
libedit 3.1.20210910 h7f8727e_0
libev 4.33 h516909a_1 conda-forge
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5101ec6_17
libgomp 9.3.0 h5101ec6_17
libnghttp2 1.46.0 hce63b2e_0
libssh2 1.9.0 h1ba5d50_1
libstdcxx-ng 9.3.0 hd4cf53a_17
libuv 1.40.0 h7b6447c_0
mako 1.2.0 pyhd8ed1ab_1 conda-forge
markupsafe 2.0.1 py39h3811e60_0 conda-forge
ncurses 6.3 h7f8727e_2
nodejs 14.8.0 hda19d22_0
oauthlib 3.2.0 pyhd8ed1ab_0 conda-forge
openssl 1.1.1n h7f8727e_0
packaging 21.3 pyhd8ed1ab_0 conda-forge
pamela 1.0.0 py_0 conda-forge
pip 21.2.4 py39h06a4308_0
prometheus_client 0.14.1 pyhd8ed1ab_0 conda-forge
psutil 5.8.0 py39h27cfd23_1
pycosat 0.6.3 py39h27cfd23_0
pycparser 2.21 pyhd3eb1b0_0
pycurl 7.44.1 py39h8f2d780_1
pyjwt 2.3.0 pyhd8ed1ab_1 conda-forge
pyopenssl 21.0.0 pyhd3eb1b0_1
pyparsing 3.0.7 pyhd8ed1ab_0 conda-forge
pyrsistent 0.18.0 py39heee7806_0
pysaml2 7.1.2 pypi_0 pypi
pysocks 1.7.1 py39h06a4308_0
python 3.9.7 h12debd9_1
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-json-logger 2.0.1 pyh9f0ad1d_0 conda-forge
python_abi 3.9 2_cp39 conda-forge
pytz 2022.1 pypi_0 pypi
readline 8.1.2 h7f8727e_1
requests 2.27.1 pyhd3eb1b0_0
ruamel.yaml 0.16.12 py39h27cfd23_1
ruamel.yaml.clib 0.2.6 py39h7f8727e_0
ruamel_yaml 0.15.100 py39h27cfd23_0
setuptools 58.0.4 py39h06a4308_0
six 1.16.0 pyhd3eb1b0_0
sqlalchemy 1.3.23 py39h3811e60_0 conda-forge
sqlite 3.37.0 hc218d9a_0
tk 8.6.11 h1ccaba5_0
tornado 6.1 py39h3811e60_1 conda-forge
tqdm 4.62.3 pyhd3eb1b0_1
traitlets 5.1.1 pyhd8ed1ab_0 conda-forge
tzdata 2021e hda174b7_0
urllib3 1.26.7 pyhd3eb1b0_0
wheel 0.37.1 pyhd3eb1b0_0
wrapspawner 1.0.1 pypi_0 pypi
xmlschema 1.10.0 pypi_0 pypi
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h7b6447c_0
zipp 3.8.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.11 h7f8727e_4
```
Configuration
```python
c.Application.log_level = 'DEBUG'
c.JupyterHub.cookie_secret_file = '/var/jupyterhub/jupyterhub_cookie_secret'
c.JupyterHub.data_files_path = '/opt/jupyterhub/miniconda/2022-04-10/share/jupyterhub'
c.JupyterHub.db_url = 'sqlite:////var/jupyterhub/jupyterhub.sqlite'
c.ConfigurableHTTPProxy.debug = True
c.JupyterHub.hub_connect_ip = 'hub:ipv6:address'
c.JupyterHub.hub_ip = 'hub:ipv6:address'
c.JupyterHub.ip = ''
c.JupyterHub.log_level = 'DEBUG'
c.ConfigurableHTTPProxy.pid_file = '/var/jupyterhub/jupyterhub-proxy.pid'
c.Spawner.debug = True

c = get_config()

c.JupyterHub.spawner_class = "wrapspawner.ProfilesSpawner"
c.Spawner.http_timeout = 6000
c.Spawner.start_timeout = 6000

import batchspawner
import batchspawner.api

c.PBSSpawner.batch_script = '''
#!/bin/sh
#PBS -l walltime={runtime}
#PBS -l select=1:ncpus={nprocs}:mem={memory}:ngpus={ngpus}:mpiprocs=1:ompthreads={nprocs}
#PBS -N jupyterhub
#PBS -v {keepvars}
#PBS -q {queue}
export RELEASE=2022-04-10
export PATH=/apps/jupyterhub/$RELEASE/bin:/bin:/usr/bin:/sbin:/usr/sbin:/opt/pbs/default/bin:/usr/local/bin
printenv
set -x
echo "{cmd}" > $HOME/.jupyterhub-lab.stdout 2>&1
cd $HOME
{cmd} --ip="::"
echo $?
'''
c.BatchSpawnerBase.req_nprocs = '4'
c.BatchSpawnerBase.req_runtime = '25:0:0'
c.BatchSpawnerBase.req_memory = '4gb'
c.BatchSpawnerBase.req_ngpus = '0'
c.BatchSpawnerBase.req_queue = 'interactive'
c.BatchSpawnerBase.batch_submit_cmd = '/opt/jupyterhub/pbs/bin/__qsub'
c.BatchSpawnerBase.batch_cancel_cmd = '/opt/jupyterhub/pbs/bin/__qdel {job_id}'
c.BatchSpawnerBase.batch_query_cmd = '/opt/jupyterhub/pbs/bin/__qstat {job_id}'
c.BatchSpawnerBase.state_exechost_re = 'exec_host = (.+)'
c.BatchSpawnerBase.state_pending_re = 'job_state = [QH]'
c.BatchSpawnerBase.state_running_re = 'job_state = R'

c.ProfilesSpawner.profiles = [
    (
        "1 cores 8GB 8 hours",
        'c1-1c-8g-8h',
        'batchspawner.PBSSpawner',
        {"req_nprocs": "1", "req_queue": "interactive", "req_runtime": "08:00:00", "req_memory": "8gb", "req_ngpus": "0"},
    ),
    (
        "4 cores, 32GB, 8 hours",
        "c1-4c-16g-8h",
        "batchspawner.PBSSpawner",
        {"req_nprocs": "4", "req_queue": "interactive", "req_runtime": "08:00:00", "req_memory": "32gb", "req_ngpus": "0"},
    ),
    (
        "8 cores, 64GB, 8 hours",
        "c1-8c-64g-8h",
        "batchspawner.PBSSpawner",
        {"req_nprocs": "8", "req_queue": "interactive", "req_runtime": "08:00:00", "req_memory": "64gb", "req_ngpus": "0"},
    ),
    (
        "2 cores, 16GB, 8 hours, 1 GPU",
        "c1-2c-16g-8h-1gpu",
        "batchspawner.PBSSpawner",
        {"req_nprocs": "2", "req_queue": "gpu", "req_runtime": "08:00:00", "req_memory": "16gb", "req_ngpus": "1"},
    ),
]

from jupyter_saml2authenticator import Saml2Authenticator
from traitlets import Bool


class CheckingSaml2Authenticator(Saml2Authenticator):
    def validate_username(self, username):
        """Check if the user exists on the system"""
        if not super().validate_username(username):
            return False
        import pwd
        try:
            pwd.getpwnam(username)
        except KeyError:
            return False
        else:
            return True

    # For some reason this is not configurable in the parent class
    delete_invalid_users = Bool(
        default_value=True,
        config=True,
        help="Whether to delete users (from JupyterHub DB) who no longer validate",
    )


c.JupyterHub.authenticator_class = CheckingSaml2Authenticator
c.CheckingSaml2Authenticator.saml2_metadata_url = 'REDACTED'
c.CheckingSaml2Authenticator.saml2_entity_id = 'REDACTED'
c.CheckingSaml2Authenticator.saml2_attribute_username = 'name'
c.CheckingSaml2Authenticator.delete_invalid_users = True
c.CheckingSaml2Authenticator.login_service = 'ID'
```
Logs
From the hub logs (I've redacted some items; hopefully not so much that it's impossible to work out what is going on):
```
[I 2022-04-10 14:01:45.365 JupyterHub log:189] 302 GET /hub/api/oauth2/authorize?client_id=jupyterhub-user-username&redirect_uri=%2Fuser%2Fusername%2Foauth_callback&response_type=code&state=[secret] -> /user/username/oauth_callback?code=[secret]&state=[secret] (username@::ffff:user.ipv4.address) 54.29ms
14:01:45.938 [ConfigProxy] debug: PROXY WEB /user/username/oauth_callback?code=redacted to http://[compute:ipv6:address]:40713
[D 2022-04-10 14:01:45.961 JupyterHub scopes:301] Authenticated with token