> Logs are redirected by systemd with `StandardOutput=append:/var/log/jupyterhub/jupyterhub.log` (and rotated with logrotate):
If you're redirecting stdout (or stderr), that's outside the control of JupyterHub. As far as JupyterHub is concerned it's writing to stdout, so you're effectively saying JupyterHub and all its components should ignore a failure to write to stdout, which I don't think we want.
Since the redirect and the log file creation/management are handled by a separate application (in this case systemd), that system should be responsible for deciding what to do when errors such as a full log partition occur, for instance by discarding the logs instead of blocking.
In practice I can't think of any situation where a full log partition isn't an error: logs are critical for maintaining the security of a production system.
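One systemd-side way to handle this is to let journald own the log storage instead of redirecting to a file with `append:`, and cap the journal's disk usage so a filling disk leads to rotation and dropped entries rather than blocked writes. A minimal sketch (the drop-in paths and size limits below are illustrative, not from this thread):

```systemd
# /etc/systemd/system/jupyterhub.service.d/logging.conf (illustrative drop-in)
[Service]
# Send stdout/stderr to journald instead of a file on /var/log
StandardOutput=journal
StandardError=journal
```

```systemd
# /etc/systemd/journald.conf.d/size.conf (illustrative caps)
[Journal]
# Bound journal disk usage so logging cannot fill the partition
SystemMaxUse=500M
SystemKeepFree=1G
```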
OK, fair enough. We've fixed it anyway.
Thanks for the answer.
### Bug description
When redirecting jupyterhub's stdout/stderr to a file (e.g., on `/var/log`), new spawns fail when the partition is full. From what I can see, it looks to come from the configurable-http-proxy logging system failing (see logs below).

Setting `c.ConfigurableHTTPProxy.log_level = "error"` in the jupyterhub config (to silence the `info` logs `Adding route /user/...`, `Route added /user/...` and `201 POST /api/routes/user/...`) makes it work under the same condition.
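For reference, a minimal sketch of that workaround as a config file; only the `log_level` line comes from this report, the rest is standard config-file boilerplate:

```python
# jupyterhub_config.py -- minimal sketch of the workaround
c = get_config()  # noqa -- injected by JupyterHub when it loads this file

# Silence configurable-http-proxy info logs (Adding route ..., Route added ...,
# 201 POST /api/routes/...) so the proxy writes almost nothing to stdout
c.ConfigurableHTTPProxy.log_level = "error"
```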
### How to reproduce

### Expected behaviour
Though it's definitely best to avoid filling the log partition, and this issue can be mitigated, logging failures ideally should not prevent the service from working.
### Actual behaviour
When the logs are redirected to a full partition, newly spawned sessions fail to start.
### Your personal set up
#### Full environment
```
alembic==1.13.1
annotated-types==0.6.0
async-generator==1.10
attrs==23.2.0
batchspawner==1.3.0
certifi==2024.2.2
certipy==0.1.3
cffi==1.16.0
charset-normalizer==3.3.2
cryptography==42.0.5
greenlet==3.0.3
idna==3.6
importlib_metadata==7.0.2
importlib_resources==6.3.1
Jinja2==3.1.3
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter-telemetry==0.1.0
jupyterhub==4.1.5
jupyterhub-idle-culler==1.3.1
jupyterhub-moss==7.0.1
Mako==1.3.2
MarkupSafe==2.1.5
oauthenticator==16.2.1
oauthlib==3.2.2
packaging==24.0
pamela==1.1.0
pkg_resources==0.0.0
pkgutil_resolve_name==1.3.10
prometheus_client==0.20.0
pycparser==2.21
pydantic==2.6.4
pydantic_core==2.16.3
pyOpenSSL==24.1.0
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
referencing==0.34.0
requests==2.31.0
rpds-py==0.18.0
ruamel.yaml==0.18.6
ruamel.yaml.clib==0.2.8
six==1.16.0
SQLAlchemy==2.0.28
tornado==6.4
traitlets==5.14.2
typing_extensions==4.10.0
urllib3==2.2.1
zipp==3.18.1
```

#### Configuration
The jupyterhub server uses jupyterhub_moss's [MOSlurmSpawner](https://github.com/silx-kit/jupyterhub_moss/blob/d7fac917401b75fdaf7cd0cd67ede85299effc35/jupyterhub_moss/spawner.py#L44), which is based on batchspawner's `SlurmSpawner`. The authenticator is oauthenticator's `GenericOAuthenticator`.

```python
# jupyterhub_conf.py snippet
import logging

import batchspawner
from oauthenticator.generic import GenericOAuthenticator
import jupyterhub_moss

logging.raiseExceptions = False  # Silence exceptions occurring during log handling

c = get_config()
jupyterhub_moss.set_config(c)
...
c.JupyterHub.authenticator_class = GenericOAuthenticator
...
c.JupyterHub.cleanup_servers = False
c.JupyterHub.cleanup_proxy = False
```

Logs are redirected by systemd with (and rotated with logrotate):

```systemd
[Service]
...
StandardOutput=append:/var/log/jupyterhub/jupyterhub.log
```
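For context on the `logging.raiseExceptions = False` line above: this stdlib flag controls whether errors raised while a handler emits a record are reported on stderr (the default) or silently dropped. A minimal sketch, not from the issue itself; the `FullDiskStream` class is a hypothetical stand-in for a full partition:

```python
import io
import logging

class FullDiskStream(io.TextIOBase):
    """Hypothetical stream whose writes always fail, as on a full partition."""
    def write(self, s):
        raise OSError(28, "No space left on device")

log = logging.getLogger("demo")
log.addHandler(logging.StreamHandler(FullDiskStream()))

logging.raiseExceptions = False
log.error("dropped silently: the handler error is swallowed")

logging.raiseExceptions = True
log.error("reported: '--- Logging error ---' plus a traceback on stderr")
```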
#### Logs

```
16:01:51.189 [ConfigProxy] error: Uncaught Exception: ENOSPC: no space left on device, write
16:01:51.193 [ConfigProxy] error: Error: ENOSPC: no space left on device, write
    at writeSync (node:fs:933:3)
    at SyncWriteStream._write (node:internal/fs/sync_write_stream:27:5)
    at writeOrBuffer (node:internal/streams/writable:564:12)
    at _write (node:internal/streams/writable:493:10)
    at Writable.write (node:internal/streams/writable:502:10)
    at Console.log (/usr/lib/node_modules/configurable-http-proxy/node_modules/winston/lib/winston/transports/console.js:79:23)
    at Console._write (/usr/lib/node_modules/configurable-http-proxy/node_modules/winston-transport/modern.js:82:19)
    at doWrite (/usr/lib/node_modules/configurable-http-proxy/node_modules/readable-stream/lib/_stream_writable.js:390:139)
    at writeOrBuffer (/usr/lib/node_modules/configurable-http-proxy/node_modules/readable-stream/lib/_stream_writable.js:381:5)
    at Writable.write (/usr/lib/node_modules/configurable-http-proxy/node_modules/readable-stream/lib/_stream_writable.js:302:11)
```

Note: this is what gets logged once some space is freed on the partition.