jupyterhub / the-littlest-jupyterhub

Simple JupyterHub distribution for 1-100 users on a single server
https://tljh.jupyter.org
BSD 3-Clause "New" or "Revised" License

User's JupyterLab server crashes #815

Open Agrigor opened 2 years ago

Agrigor commented 2 years ago

Bug description

Several times per day, the servers of some users crash with the following error in the journalctl log of jupyter-username:

```
Mai 04 08:39:25 jupyterhubvm systemd[1]: Started /bin/bash -c cd /home/jupyter-cyril && exec jupyterhub-singleuser --port=38773 --SingleUserNotebookApp.default_url=/lab.
Mai 04 08:39:26 jupyterhubvm bash[14907]: [I 2022-05-04 08:39:26.512 SingleUserNotebookApp notebookapp:1593] Authentication of /metrics is OFF, since other authentication is disabled.
Mai 04 08:39:27 jupyterhubvm bash[14907]: [I 2022-05-04 08:39:27.378 LabApp] JupyterLab extension loaded from /opt/tljh/user/lib/python3.9/site-packages/jupyterlab
Mai 04 08:39:27 jupyterhubvm bash[14907]: [I 2022-05-04 08:39:27.378 LabApp] JupyterLab application directory is /opt/tljh/user/share/jupyter/lab
Mai 04 08:39:27 jupyterhubvm bash[14907]: /opt/tljh/user/lib/python3.9/site-packages/jupyter_server_mathjax/app.py:40: FutureWarning: The alias `_()` will be deprecated. Use `_i18n()` instead.
Mai 04 08:39:27 jupyterhubvm bash[14907]:   help=_("""The MathJax.js configuration file that is to be used."""),
Mai 04 08:39:27 jupyterhubvm bash[14907]: [W 2022-05-04 08:39:27.510 SingleUserNotebookApp notebookapp:2034] Error loading server extension nbresuse
Mai 04 08:39:27 jupyterhubvm bash[14907]:     Traceback (most recent call last):
Mai 04 08:39:27 jupyterhubvm bash[14907]:       File "/opt/tljh/user/lib/python3.9/site-packages/notebook/notebookapp.py", line 2030, in init_server_extensions
Mai 04 08:39:27 jupyterhubvm bash[14907]:         func(self)
Mai 04 08:39:27 jupyterhubvm bash[14907]:       File "/opt/tljh/user/lib/python3.9/site-packages/nbresuse/__init__.py", line 49, in load_jupyter_server_extension
Mai 04 08:39:27 jupyterhubvm bash[14907]:         PrometheusHandler(PSUtilMetricsLoader(nbapp)), 1000
Mai 04 08:39:27 jupyterhubvm bash[14907]:       File "/opt/tljh/user/lib/python3.9/site-packages/nbresuse/prometheus.py", line 25, in __init__
Mai 04 08:39:27 jupyterhubvm bash[14907]:         gauge = Gauge(phrase, "counter for " + phrase.replace("_", " "), [])
Mai 04 08:39:27 jupyterhubvm bash[14907]:       File "/opt/tljh/user/lib/python3.9/site-packages/prometheus_client/metrics.py", line 355, in __init__
Mai 04 08:39:27 jupyterhubvm bash[14907]:         super(Gauge, self).__init__(
Mai 04 08:39:27 jupyterhubvm bash[14907]:       File "/opt/tljh/user/lib/python3.9/site-packages/prometheus_client/metrics.py", line 136, in __init__
Mai 04 08:39:27 jupyterhubvm bash[14907]:         registry.register(self)
Mai 04 08:39:27 jupyterhubvm bash[14907]:       File "/opt/tljh/user/lib/python3.9/site-packages/prometheus_client/registry.py", line 29, in register
Mai 04 08:39:27 jupyterhubvm bash[14907]:         raise ValueError(
Mai 04 08:39:27 jupyterhubvm bash[14907]:     ValueError: Duplicated timeseries in CollectorRegistry: {'total_memory_usage'}
```
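The traceback ends in prometheus_client's registry check, which refuses a second metric whose name is already registered. A minimal sketch that reproduces the same ValueError outside Jupyter, assuming prometheus_client is installed; the helper name `register_twice` is hypothetical, and the help string only mimics the one nbresuse builds in the traceback:

```python
from prometheus_client import CollectorRegistry, Gauge


def register_twice(name="total_memory_usage"):
    """Register two gauges with the same name in one registry.

    Returns the error message raised by the second registration,
    or None if no error occurs (hypothetical helper for illustration).
    """
    # A private registry keeps the demo away from the global REGISTRY.
    registry = CollectorRegistry()
    help_text = "counter for " + name.replace("_", " ")
    # First extension to load: registration succeeds.
    Gauge(name, help_text, [], registry=registry)
    try:
        # Second extension publishing the same metric name: rejected.
        Gauge(name, help_text, [], registry=registry)
    except ValueError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(register_twice())
```

Running it should print a message like the last line of the traceback, which suggests the crash log is triggered whenever two loaded extensions try to publish the same gauge name.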

Expected behaviour

No crash

Actual behaviour

The server crashes, and both the server and the kernel have to be restarted. Important: the crash is independent of RAM usage; it happens even after a fresh reboot.

How to reproduce

Hard to reproduce; the crash just happens after waiting for a while.

Your personal set up

Full environment

```
asn1crypto==0.24.0
attrs==17.4.0
Automat==0.6.0
bcrypt==3.2.0
blinker==1.4
cached-property==1.5.2
certifi==2018.1.18
cffi==1.15.0
chardet==3.0.4
charset-normalizer==2.0.9
click==6.7
cloud-init==22.1
colorama==0.3.7
command-not-found==0.3
configobj==5.0.6
constantly==15.1.0
cryptography==36.0.0
distro==1.6.0
distro-info===0.18ubuntu0.18.04.1
docker==5.0.3
docker-compose==1.29.2
dockerpty==0.4.1
docopt==0.6.2
httplib2==0.9.2
hyperlink==17.3.1
idna==2.6
incremental==16.10.1
iotop==0.6
Jinja2==2.10
jsonpatch==1.16
jsonpointer==1.10
jsonschema==2.6.0
keyring==10.6.0
keyrings.alt==3.0
language-selector==0.1
MarkupSafe==1.0
netifaces==0.10.4
numpy==1.19.5
oauthlib==2.0.6
PAM==0.4.2
paramiko==2.8.1
pexpect==4.2.1
pyasn1==0.4.2
pyasn1-modules==0.2.1
pycparser==2.21
pycrypto==2.6.1
PyGObject==3.26.1
PyJWT==1.5.3
PyNaCl==1.4.0
pyOpenSSL==17.5.0
pyserial==3.4
python-apt==1.6.5+ubuntu0.7
python-debian==0.1.32
python-dotenv==0.19.2
pyxdg==0.25
PyYAML==3.12
requests==2.26.0
requests-unixsocket==0.1.5
SecretStorage==2.3.1
semantic-version==2.8.5
service-identity==16.0.0
six==1.11.0
sos==4.3
ssh-import-id==5.7
systemd-python==234
texttable==1.6.4
Twisted==17.9.0
typing_extensions==4.0.1
ubuntu-advantage-tools==27.7
ufw==0.36
unattended-upgrades==0.1
urllib3==1.22
websocket-client==0.59.0
zope.interface==4.3.2
```
Configuration

```yaml
users:
  admin:
    - agrigor
user_environment:
  default_app: jupyterlab
services:
  configurator:
    enabled: false
auth:
  type: nativeauthenticator.NativeAuthenticator
  NativeAuthenticator:
    open_signup: true
```
Logs: there is no information in jupyterhub.service; the errors appear only in the jupyter-username unit, as shown above.

Update/correction: the Ubuntu version is 18.04, not 20.04 as written before. Sorry for that.

Agrigor commented 1 year ago

ping :)

minrk commented 1 year ago

@Agrigor sorry, it's been a busy couple of months! It looks like you have two packages publishing the same metrics: the older nbresuse, and the newer jupyter-resource-usage. I think if you remove nbresuse, you should get what you want.
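To confirm which of the two packages are actually present, the check has to run in the environment that starts the user servers. A minimal stdlib-only sketch, assuming it is run with that environment's interpreter (judging by the paths in the traceback, presumably the Python under /opt/tljh/user); the helper name `installed_versions` is hypothetical:

```python
from importlib import metadata

# Distribution names of the two packages that, per the diagnosis in this
# thread, publish the same metrics.
CONFLICTING = ("nbresuse", "jupyter-resource-usage")


def installed_versions(names=CONFLICTING):
    """Return {name: version} for each distribution in `names` that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed in this interpreter's environment
    return found


if __name__ == "__main__":
    found = installed_versions()
    if len(found) > 1:
        print("Both packages are installed:", found)
    else:
        print("Found:", found or "neither package")
```

If both names are reported, removing the older nbresuse from that same environment is the fix suggested in this thread; running the check again afterwards shows whether the uninstall hit the right interpreter.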

Agrigor commented 1 year ago

Hi @minrk, thanks for your answer! I uninstalled nbresuse, but unfortunately the crashes still happen all the time. Do you have any other idea how I can debug or fix this crash? Kind regards

sunway910 commented 1 year ago

> @Agrigor sorry, it's been a busy couple of months! It looks like you have two packages publishing the same metrics: the older nbresuse, and the newer jupyter-resource-usage. I think if you remove nbresuse, you should get what you want.

Thanks! After uninstalling nbresuse, starting JupyterLab no longer fails with ValueError: Duplicated timeseries in CollectorRegistry: {'total_memory_usage'}, so that solved my problem.