jupyterhub / batchspawner

Custom Spawner for Jupyterhub to start servers in batch scheduled systems
BSD 3-Clause "New" or "Revised" License
190 stars 134 forks source link

Uncaught exception when JupyterHub tries to communicate with SingleUserLabApp on compute node #242

Open dithwick opened 2 years ago

dithwick commented 2 years ago

Bug description

batchspawner-singleuser starts correctly on the compute node and communicates back to the hub to indicate this. JupyterHub shows:

Server ready at [/user/username/]

and redirects to /user/username/lab?

but this then shows a

503 : Service Unavailable
Your server appears to be down. Try restarting it [from the hub]

to the user.

There are some complexities with the networking (described below) that are likely related to the problem I'm seeing and it could also be a configuration issue. However given I'm seeing an exception I thought it best to report it as a bug.

Expected behaviour

The user is presented the JupyterHub session from the compute node.

Actual behaviour

The user is presented with a 503 error. The singleuser process on the compute node generates an uncaught exception traceback (see logs) but continues to run until the job is cancelled.

How to reproduce

  1. Login to JupyterHub.
  2. Select a job profile and click start.
  3. Job waits in queue, JupyterHub shows job status to user.
  4. Job starts, JupyterHub shows "waiting to connect"
  5. JupyterHub shows "Server ready...."
  6. Error shown

Your personal set up

Full environment ``` # packages in environment at /opt/jupyterhub/miniconda/2022-04-10: # # Name Version Build Channel _libgcc_mutex 0.1 main _openmp_mutex 4.5 1_gnu alembic 1.7.7 pyhd8ed1ab_0 conda-forge async_generator 1.10 py_0 conda-forge attrs 21.4.0 pyhd8ed1ab_0 conda-forge batchspawner 1.1.0 pypi_0 pypi blinker 1.4 py_1 conda-forge brotlipy 0.7.0 py39h27cfd23_1003 c-ares 1.18.1 h7f8727e_0 ca-certificates 2021.10.8 ha878542_0 conda-forge certifi 2021.10.8 py39hf3d152e_2 conda-forge certipy 0.1.3 py_0 conda-forge cffi 1.15.0 py39hd667e15_1 charset-normalizer 2.0.4 pyhd3eb1b0_0 conda 4.12.0 py39hf3d152e_0 conda-forge conda-content-trust 0.1.1 pyhd3eb1b0_0 conda-package-handling 1.7.3 py39h27cfd23_1 configurable-http-proxy 4.0.1 node14_0 cryptography 36.0.0 py39h9ce1e76_0 defusedxml 0.7.1 pypi_0 pypi elementpath 2.5.0 pypi_0 pypi entrypoints 0.4 pyhd8ed1ab_0 conda-forge icu 68.1 h2531618_0 idna 3.3 pyhd3eb1b0_0 importlib-metadata 4.11.3 py39hf3d152e_1 conda-forge importlib_resources 5.6.0 pyhd8ed1ab_0 conda-forge jinja2 3.1.1 pyhd8ed1ab_0 conda-forge jsonschema 4.4.0 pyhd8ed1ab_0 conda-forge jupyter-saml2authenticator 0.1.0.dev0 pypi_0 pypi jupyter_telemetry 0.1.0 pyhd8ed1ab_1 conda-forge jupyterhub 2.2.2 pyhd8ed1ab_1 conda-forge jupyterhub-base 2.2.2 pyhd8ed1ab_1 conda-forge krb5 1.19.2 hac12032_0 ld_impl_linux-64 2.35.1 h7274673_9 libcurl 7.80.0 h0b77cf5_0 libedit 3.1.20210910 h7f8727e_0 libev 4.33 h516909a_1 conda-forge libffi 3.3 he6710b0_2 libgcc-ng 9.3.0 h5101ec6_17 libgomp 9.3.0 h5101ec6_17 libnghttp2 1.46.0 hce63b2e_0 libssh2 1.9.0 h1ba5d50_1 libstdcxx-ng 9.3.0 hd4cf53a_17 libuv 1.40.0 h7b6447c_0 mako 1.2.0 pyhd8ed1ab_1 conda-forge markupsafe 2.0.1 py39h3811e60_0 conda-forge ncurses 6.3 h7f8727e_2 nodejs 14.8.0 hda19d22_0 oauthlib 3.2.0 pyhd8ed1ab_0 conda-forge openssl 1.1.1n h7f8727e_0 packaging 21.3 pyhd8ed1ab_0 conda-forge pamela 1.0.0 py_0 conda-forge pip 21.2.4 py39h06a4308_0 prometheus_client 0.14.1 pyhd8ed1ab_0 conda-forge psutil 5.8.0 py39h27cfd23_1 pycosat 0.6.3 py39h27cfd23_0 pycparser 2.21 pyhd3eb1b0_0 pycurl 7.44.1 py39h8f2d780_1 pyjwt 2.3.0 pyhd8ed1ab_1 conda-forge pyopenssl 21.0.0 pyhd3eb1b0_1 pyparsing 3.0.7 pyhd8ed1ab_0 conda-forge pyrsistent 0.18.0 py39heee7806_0 pysaml2 7.1.2 pypi_0 pypi pysocks 1.7.1 py39h06a4308_0 python 3.9.7 h12debd9_1 python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge python-json-logger 2.0.1 pyh9f0ad1d_0 conda-forge python_abi 3.9 2_cp39 conda-forge pytz 2022.1 pypi_0 pypi readline 8.1.2 h7f8727e_1 requests 2.27.1 pyhd3eb1b0_0 ruamel.yaml 0.16.12 py39h27cfd23_1 ruamel.yaml.clib 0.2.6 py39h7f8727e_0 ruamel_yaml 0.15.100 py39h27cfd23_0 setuptools 58.0.4 py39h06a4308_0 six 1.16.0 pyhd3eb1b0_0 sqlalchemy 1.3.23 py39h3811e60_0 conda-forge sqlite 3.37.0 hc218d9a_0 tk 8.6.11 h1ccaba5_0 tornado 6.1 py39h3811e60_1 conda-forge tqdm 4.62.3 pyhd3eb1b0_1 traitlets 5.1.1 pyhd8ed1ab_0 conda-forge tzdata 2021e hda174b7_0 urllib3 1.26.7 pyhd3eb1b0_0 wheel 0.37.1 pyhd3eb1b0_0 wrapspawner 1.0.1 pypi_0 pypi xmlschema 1.10.0 pypi_0 pypi xz 5.2.5 h7b6447c_0 yaml 0.2.5 h7b6447c_0 zipp 3.8.0 pyhd8ed1ab_0 conda-forge zlib 1.2.11 h7f8727e_4 ```
Configuration ```python c.Application.log_level = 'DEBUG' c.JupyterHub.cookie_secret_file = '/var/jupyterhub/jupyterhub_cookie_secret' c.JupyterHub.data_files_path = '/opt/jupyterhub/miniconda/2022-04-10/share/jupyterhub' c.JupyterHub.db_url = 'sqlite:////var/jupyterhub/jupyterhub.sqlite' c.ConfigurableHTTPProxy.debug = True c.JupyterHub.hub_connect_ip = 'hub:ipv6:address' c.JupyterHub.hub_ip = 'hub:ipv6:address' c.JupyterHub.ip = '' c.JupyterHub.log_level = 'DEBUG' c.ConfigurableHTTPProxy.pid_file = '/var/jupyterhub/jupyterhub-proxy.pid' c.Spawner.debug = True c = get_config() c.JupyterHub.spawner_class = "wrapspawner.ProfilesSpawner" c.Spawner.http_timeout = 6000 c.Spawner.start_timeout = 6000 import batchspawner import batchspawner.api c.PBSSpawner.batch_script = ''' #!/bin/sh #PBS -l walltime={runtime} #PBS -l select=1:ncpus={nprocs}:mem={memory}:ngpus={ngpus}:mpiprocs=1:ompthreads={nprocs} #PBS -N jupyterhub #PBS -v {keepvars} #PBS -q {queue} export RELEASE=2022-04-10 export PATH=/apps/jupyterhub/$RELEASE/bin:/bin:/usr/bin:/sbin:/usr/sbin:/opt/pbs/default/bin:/usr/local/bin printenv set -x echo "{cmd}" > $HOME/.jupyterhub-lab.stdout 2>&1 cd $HOME {cmd} --ip="::" echo $? ''' c.BatchSpawnerBase.req_nprocs = '4' c.BatchSpawnerBase.req_runtime = '25:0:0' c.BatchSpawnerBase.req_memory = '4gb' c.BatchSpawnerBase.req_ngpus = '0' c.BatchSpawnerBase.req_queue = 'interactive' c.BatchSpawnerBase.batch_submit_cmd = '/opt/jupyterhub/pbs/bin/__qsub' c.BatchSpawnerBase.batch_cancel_cmd = '/opt/jupyterhub/pbs/bin/__qdel {job_id}' c.BatchSpawnerBase.batch_query_cmd = '/opt/jupyterhub/pbs/bin/__qstat {job_id}' c.BatchSpawnerBase.state_exechost_re = 'exec_host = (.+)' c.BatchSpawnerBase.state_pending_re = 'job_state = [QH]' c.BatchSpawnerBase.state_running_re = 'job_state = R' c.ProfilesSpawner.profiles = [ ( "1 cores 8GB 8 hours", 'c1-1c-8g-8h', 'batchspawner.PBSSpawner', { "req_nprocs" : "1", "req_queue":"interactive", "req_runtime":"08:00:00", "req_memory":"8gb", "req_ngpus":"0" } ), ( "4 cores, 32GB, 8 hours", "c1-4c-16g-8h", "batchspawner.PBSSpawner", { "req_nprocs" : "4", "req_queue":"interactive", "req_runtime":"08:00:00", "req_memory":"32gb", "req_ngpus":"0" } ), ( "8 cores, 64GB, 8 hours", "c1-8c-64g-8h", "batchspawner.PBSSpawner", { "req_nprocs" : "8", "req_queue":"interactive", "req_runtime":"08:00:00", "req_memory":"64gb", "req_ngpus":"0" } ), ( "2 cores, 16GB, 8 hours, 1 GPU", "c1-2c-16g-8h-1gpu", "batchspawner.PBSSpawner", { "req_nprocs" : "2", "req_queue":"gpu", "req_runtime":"08:00:00", "req_memory":"16gb", "req_ngpus":"1" } ), ] from jupyter_saml2authenticator import Saml2Authenticator from traitlets import Bool class CheckingSaml2Authenticator(Saml2Authenticator): def validate_username(self, username): if not super().validate_username(username): return False """Check if the user exists on the system""" import pwd try: pwd.getpwnam(username) except KeyError: return False else: return True # For some reason this is not configurable in the parent class delete_invalid_users = Bool( default_value = True, config = True, help="Whether to delete users (from JupyterHub DB) who no longer validate", ) c.JupyterHub.authenticator_class = CheckingSaml2Authenticator c.CheckingSaml2Authenticator.saml2_metadata_url = 'REDACTED' c.CheckingSaml2Authenticator.saml2_entity_id = 'REDACTED' c.CheckingSaml2Authenticator.saml2_attribute_username = 'name' c.CheckingSaml2Authenticator.delete_invalid_users = True c.CheckingSaml2Authenticator.login_service = 'ID' ```
Logs From the hub logs (I've redacted some items - hopefully not so much it's impossible to work out what is going on) ``` [I 2022-04-10 14:01:45.365 JupyterHub log:189] 302 GET /hub/api/oauth2/authorize?client_id=jupyterhub-user-username&redirect_uri=%2Fuser%2Fusername%2Foauth_callback&response_type=code&state=[secret] -> /user/username/oauth_callback?code=[secret]&state=[secret] (username@::ffff:user.ipv4.address) 54.29ms 14:01:45.938 [ConfigProxy] debug: PROXY WEB /user/username/oauth_callback?code=redacted to http://[compute:ipv6:address]:40713 [D 2022-04-10 14:01:45.961 JupyterHub scopes:301] Authenticated with token [D 2022-04-10 14:01:45.962 oauthlib.oauth2.rfc6749.endpoints.token token:112] Dispatching grant_type authorization_code request to . [D 2022-04-10 14:01:45.962 JupyterHub provider:53] authenticate_client [D 2022-04-10 14:01:45.974 oauthlib.oauth2.rfc6749.grant_types.authorization_code authorization_code:534] Using provided redirect_uri /user/username/oauth_callback [D 2022-04-10 14:01:45.975 JupyterHub provider:112] confirm_redirect_uri: client_id=jupyterhub-user-username, redirect_uri=/user/username/oauth_callback [D 2022-04-10 14:01:45.975 oauthlib.oauth2.rfc6749.grant_types.authorization_code authorization_code:302] Token request validation ok for . [D 2022-04-10 14:01:45.975 JupyterHub provider:335] Saving bearer token {'access_token': 'REDACTED', 'expires_in': 1209600, 'token_type': 'Bearer', 'scope': '', 'refresh_token': 'REDACTED'} [D 2022-04-10 14:01:46.000 JupyterHub provider:195] Deleting oauth code 4Ef... for jupyterhub-user-username [I 2022-04-10 14:01:46.020 JupyterHub log:189] 200 POST /hub/api/oauth2/token (username@compute:ipv6:address) 64.17ms [D 2022-04-10 14:01:46.036 JupyterHub base:281] Recording first activity for [D 2022-04-10 14:01:46.055 JupyterHub scopes:301] Authenticated with token [I 2022-04-10 14:01:46.066 JupyterHub log:189] 200 GET /hub/api/user (username@compute:ipv6:address) 32.94ms 14:01:46.637 [ConfigProxy] debug: PROXY WEB /user/username/lab? to http://[compute:ipv6:address]:40713 14:01:46.850 [ConfigProxy] error: 503 GET /user/username/lab? socket hang up [D 2022-04-10 14:01:46.857 JupyterHub pages:586] No template for 503 [I 2022-04-10 14:01:46.872 JupyterHub log:189] 200 GET /hub/error/503?url=%2Fuser%2Fusername%2Flab%3F (@hub:ipv6:address) 16.33ms ``` and from the compute node: ``` [D 2022-04-10 14:01:46.062 SingleUserLabApp auth:395] Received request from Hub user {'admin': True, 'groups': [], 'name': 'username', 'kind': 'user', 'session_id': 'redacted', 'scopes': ['access:servers!server=username/']} [I 2022-04-10 14:01:46.062 SingleUserLabApp auth:1132] Logged-in user {'admin': True, 'groups': [], 'name': 'username', 'kind': 'user', 'session_id': 'redacted', 'scopes': ['access:servers!server=username/']} [D 2022-04-10 14:01:46.062 SingleUserLabApp auth:841] Setting oauth cookie for ::ffff:user.ipv4.address: jupyterhub-user-username, {'path': '/user/username/', 'httponly': True, 'secure': True} [I 2022-04-10 14:01:46.063 SingleUserLabApp log:189] 302 GET /user/username/oauth_callback?code=[secret]&state=[secret] -> /user/username/lab? (@::ffff:user.ipv4.address) 119.27ms [E 2022-04-10 14:01:46.643 SingleUserLabApp http1connection:67] Uncaught exception Traceback (most recent call last): File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/http1connection.py", line 273, in _read_message delegate.finish() File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/httpserver.py", line 387, in finish self.delegate.finish() File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/routing.py", line 268, in finish self.delegate.finish() File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/web.py", line 2290, in finish self.execute() File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/web.py", line 2309, in execute self.handler = self.handler_class( File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/web.py", line 227, in __init__ self.clear() File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/web.py", line 328, in clear self.set_default_headers() File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyter_server/base/handlers.py", line 314, in set_default_headers elif self.token_authenticated and "Access-Control-Allow-Origin" not in self.settings.get( File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyter_server/base/handlers.py", line 159, in token_authenticated return self.login_handler.is_token_authenticated(self) File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/singleuser/mixins.py", line 105, in is_token_authenticated handler.get_current_user() File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 1051, in get_current_user self._hub_auth_user_cache = self.check_hub_user(user_model) File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 978, in check_hub_user if self.allow_all: File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 924, in allow_all self.hub_scopes is None File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 916, in hub_scopes return self.hub_auth.oauth_scopes or None File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/traitlets/traitlets.py", line 577, in __get__ return self.get(obj, cls) File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/traitlets/traitlets.py", line 540, in get default = obj.trait_defaults(self.name) File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/traitlets/traitlets.py", line 1580, in trait_defaults return self._get_trait_default_generator(names[0])(self) File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/traitlets/traitlets.py", line 977, in __call__ return self.func(*args, **kwargs) File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 357, in _default_scopes return set(json.loads(env_scopes)) File "/apps/jupyterhub/2022-04-10/lib/python3.9/json/__init__.py", line 346, in loads return _default_decoder.decode(s) File "/apps/jupyterhub/2022-04-10/lib/python3.9/json/decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/apps/jupyterhub/2022-04-10/lib/python3.9/json/decoder.py", line 355, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1) ```
welcome[bot] commented 2 years ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively. welcome You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada: