jupyterhub / configurable-http-proxy

node-http-proxy plus a REST API
BSD 3-Clause "New" or "Revised" License

Error when redirecting jupyterhub's stdout to a file on a full partition #537

Closed t20100 closed 4 months ago

t20100 commented 4 months ago

Bug description

When jupyterhub's stdout/stderr is redirected to a file (e.g., on /var/log), new spawns fail once the partition is full. From what I can see, the failure comes from the configurable-http-proxy logging system (see logs below).

Setting c.ConfigurableHTTPProxy.log_level = "error" in the jupyterhub config (to silence the info logs: Adding route /user/..., Route added /user/... and 201 POST /api/routes/user/...) makes spawning work under the same conditions.

How to reproduce

  1. Redirect jupyterhub stdout to a file on a (small) partition
  2. Fill this partition with random files
  3. Spawn a session from jupyterhub's spawning page
  4. Error: Spawn fails
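The underlying error can be observed without filling a real partition. The following is a minimal sketch, assuming a Linux system where the `/dev/full` device exists (every write to it fails with ENOSPC); the function name is made up for illustration and is not part of jupyterhub or configurable-http-proxy.

```python
import errno
import os


def errno_when_writing_to_full_device(path="/dev/full"):
    """Return the errno raised by a write to `path`.

    Returns None if the write succeeded or if `path` does not exist
    (for example on non-Linux systems).
    """
    if not os.path.exists(path):
        return None
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, b"log line\n")
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)
    return None


if __name__ == "__main__":
    code = errno_when_writing_to_full_device()
    if code is not None:
        # Print the symbolic name of the error code
        print(errno.errorcode[code])
```

This only shows the write failure that a process sees when its stdout points at a full device; it does not reproduce the spawn failure itself.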

Expected behaviour

Though it is definitely best to avoid filling the log partition, and this issue can be mitigated, it would be better if logging failures did not prevent the service from working.
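For comparison only, here is a sketch of how Python's standard `logging` module treats a failing stream: `StreamHandler.emit` catches the write error and hands it to `handleError`, so the logging call itself does not raise. This illustrates the Python side, not configurable-http-proxy (which is Node.js and uses winston); `FullStream` and `log_to_full_stream` are hypothetical names.

```python
import errno
import logging


class FullStream:
    """Fake stream that behaves like a file on a full partition."""

    def write(self, text):
        raise OSError(errno.ENOSPC, "No space left on device")

    def flush(self):
        pass


def log_to_full_stream():
    """Log one message to a failing stream and report that the call returned."""
    previous = logging.raiseExceptions
    logging.raiseExceptions = False  # do not print handler errors to stderr
    logger = logging.getLogger("full-partition-demo")
    logger.propagate = False  # keep the demo isolated from the root logger
    handler = logging.StreamHandler(FullStream())
    logger.addHandler(handler)
    try:
        # The handler's write fails, but the error is swallowed by handleError
        logger.error("this message cannot be written")
    finally:
        logger.removeHandler(handler)
        logging.raiseExceptions = previous
    return True


if __name__ == "__main__":
    log_to_full_stream()
    print("logging call returned without raising")
```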

Actual behaviour

When the logs are redirected to a full partition, newly spawned sessions do not succeed.

Your personal set up

Full environment

```
alembic==1.13.1
annotated-types==0.6.0
async-generator==1.10
attrs==23.2.0
batchspawner==1.3.0
certifi==2024.2.2
certipy==0.1.3
cffi==1.16.0
charset-normalizer==3.3.2
cryptography==42.0.5
greenlet==3.0.3
idna==3.6
importlib_metadata==7.0.2
importlib_resources==6.3.1
Jinja2==3.1.3
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter-telemetry==0.1.0
jupyterhub==4.1.5
jupyterhub-idle-culler==1.3.1
jupyterhub-moss==7.0.1
Mako==1.3.2
MarkupSafe==2.1.5
oauthenticator==16.2.1
oauthlib==3.2.2
packaging==24.0
pamela==1.1.0
pkg_resources==0.0.0
pkgutil_resolve_name==1.3.10
prometheus_client==0.20.0
pycparser==2.21
pydantic==2.6.4
pydantic_core==2.16.3
pyOpenSSL==24.1.0
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
referencing==0.34.0
requests==2.31.0
rpds-py==0.18.0
ruamel.yaml==0.18.6
ruamel.yaml.clib==0.2.8
six==1.16.0
SQLAlchemy==2.0.28
tornado==6.4
traitlets==5.14.2
typing_extensions==4.10.0
urllib3==2.2.1
zipp==3.18.1
```
Configuration

The jupyterhub server uses jupyterhub_moss's [MOSlurmSpawner](https://github.com/silx-kit/jupyterhub_moss/blob/d7fac917401b75fdaf7cd0cd67ede85299effc35/jupyterhub_moss/spawner.py#L44), which is based on batchspawner's `SlurmSpawner`. The authenticator is oauthenticator's `GenericOAuthenticator`.

```python
# jupyterhub_conf.py snippet
import logging

import batchspawner
from oauthenticator.generic import GenericOAuthenticator
import jupyterhub_moss

logging.raiseExceptions = False  # Silence exceptions occurring during log handling

c = get_config()
jupyterhub_moss.set_config(c)
...
c.JupyterHub.authenticator_class = GenericOAuthenticator
...
c.JupyterHub.cleanup_servers = False
c.JupyterHub.cleanup_proxy = False
```

Logs are redirected by systemd with (and rotated with logrotate):

```systemd
[Service]
...
StandardOutput=append:/var/log/jupyterhub/jupyterhub.log
```
Logs

```
16:01:51.189 [ConfigProxy] error: Uncaught Exception: ENOSPC: no space left on device, write
16:01:51.193 [ConfigProxy] error: Error: ENOSPC: no space left on device, write
    at writeSync (node:fs:933:3)
    at SyncWriteStream._write (node:internal/fs/sync_write_stream:27:5)
    at writeOrBuffer (node:internal/streams/writable:564:12)
    at _write (node:internal/streams/writable:493:10)
    at Writable.write (node:internal/streams/writable:502:10)
    at Console.log (/usr/lib/node_modules/configurable-http-proxy/node_modules/winston/lib/winston/transports/console.js:79:23)
    at Console._write (/usr/lib/node_modules/configurable-http-proxy/node_modules/winston-transport/modern.js:82:19)
    at doWrite (/usr/lib/node_modules/configurable-http-proxy/node_modules/readable-stream/lib/_stream_writable.js:390:139)
    at writeOrBuffer (/usr/lib/node_modules/configurable-http-proxy/node_modules/readable-stream/lib/_stream_writable.js:381:5)
    at Writable.write (/usr/lib/node_modules/configurable-http-proxy/node_modules/readable-stream/lib/_stream_writable.js:302:11)
```

Note: This is what is logged when freeing some space on the partition.

welcome[bot] commented 4 months ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

manics commented 4 months ago

> Logs are redirected by systemd with (and rotated with logrotate):

If you're redirecting stdout (or stderr), that's outside the control of JupyterHub. As far as JupyterHub is concerned, it's writing to stdout, so you're effectively saying JupyterHub and all its components should ignore a failure to write to stdout, which I don't think we want.

Since the redirect and log file creation/management are handled by a separate application (in this case systemd) that system should be responsible for deciding what to do when errors such as a full log partition occur, for instance by discarding the logs instead of blocking.
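One way to picture "discarding the logs instead of blocking" is a small relay process that sits between the service and the log file. This is only a sketch under assumptions: the script name `relay_logs.py`, the function `relay_logs`, and the pipeline in the comment are hypothetical, and nothing here is a systemd or JupyterHub feature.

```python
import errno
import sys


def relay_logs(source, sink):
    """Copy lines from `source` to `sink`, dropping lines while the sink is full.

    Returns the number of lines dropped because of ENOSPC.
    """
    dropped = 0
    for line in source:
        try:
            sink.write(line)
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise  # other I/O errors are still treated as real failures
            dropped += 1  # partition full: discard this line and keep reading
    return dropped


if __name__ == "__main__":
    # Hypothetical usage: some_service 2>&1 | python relay_logs.py /path/to/logfile
    # Unbuffered binary mode, so a failed write leaves nothing pending in a buffer.
    with open(sys.argv[1], "ab", buffering=0) as log_file:
        relay_logs(sys.stdin.buffer, log_file)
```

Because the relay keeps reading its input even when writes fail, the service on the other end of the pipe never sees the ENOSPC error. The trade-off is the one raised in this thread: log lines are silently lost while the partition is full.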

In practice I can't think of any situation where a full logs partition isn't an error: logs are critical for maintaining the security of a production system.

t20100 commented 4 months ago

OK, fair enough. We've fixed it anyway.

Thanks for the answer.