Open masaok opened 5 years ago
/metrics is an endpoint to track usage as reported by the started server. Typically, a service like Prometheus would scrape such endpoints for data, to be presented later by Grafana etc.
I'm not sure if anything inside a TLJH deployment should be scraping such an endpoint by default. Assuming that isn't the case, perhaps something we configure when we start our user servers makes some external scraping service attempt to scrape them?
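For context, a scraper like Prometheus just issues periodic GETs against such an endpoint and parses the plain-text exposition format. A minimal sketch of that, where the URL and token are placeholders rather than values from this deployment:

```python
# Illustrative only: what a scraper does with such an endpoint. It issues a
# periodic GET and parses the Prometheus text exposition format. The URL and
# token below are placeholders, not values taken from this deployment.
import requests

METRICS_URL = "https://hub.example.org/user/studentuser/metrics"  # placeholder
TOKEN = "<api-token>"  # single-user endpoints normally require authentication

resp = requests.get(
    METRICS_URL,
    headers={"Authorization": f"token {TOKEN}"},
    timeout=5,
)
resp.raise_for_status()

# One sample per line; lines starting with '#' are HELP/TYPE comments.
for line in resp.text.splitlines():
    if not line.startswith("#"):
        print(line)
```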
To understand the logs, this is what happens: a request to /user/<username>/... arrives while that user's server is not running, so it falls through to the hub, which redirects it (302) to /hub/user/<username>/... and then fails the request itself, with a 503 on older JupyterHub releases or a 424 on current ones.
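To make those two hops concrete, here is a rough sketch of the same flow using requests; the base URL, token, username, and kernel id are placeholders, and the exact error code depends on the JupyterHub version:

```python
import requests
from urllib.parse import urljoin

BASE = "https://hub.example.org"        # placeholder public URL of the hub
TOKEN = "<api-token>"                   # placeholder credential
headers = {"Authorization": f"token {TOKEN}"}

# First hop: the proxy has no route for the stopped single-user server, so the
# request falls through to the hub, which answers with a 302 to /hub/user/...
r1 = requests.get(
    f"{BASE}/user/<username>/api/kernels/<uuid>",
    headers=headers,
    allow_redirects=False,
    timeout=10,
)
print(r1.status_code, r1.headers.get("Location"))

# Second hop: the hub sees an API-looking path for a not-running server and
# fails the request (the "Failing suspected API request" warning in the logs).
if "Location" in r1.headers:
    r2 = requests.get(urljoin(r1.url, r1.headers["Location"]), headers=headers, timeout=10)
    print(r2.status_code)   # 424 on current JupyterHub, 503 on older releases
```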
I'm getting a lot of similar logs on JupyterHub 4.0.2 for the /jupyter/hub/user/<username>/api/kernels/<uuid> endpoint:
[I 2023-09-27 23:13:22.744 JupyterHub log:191] 302 GET /jupyter/user/<username>/api/kernels/<uuid>?1695849202650 -> /jupyter/hub/user/<username>/api/kernels/<uuid>?1695849202650 (@<ip>) 0.90ms
[W 2023-09-27 23:13:22.754 JupyterHub base:1444] Failing suspected API request to not-running server: /jupyter/hub/user/<username>/api/kernels/<uuid>
[W 2023-09-27 23:13:22.755 JupyterHub log:191] 424 GET /jupyter/hub/user/<username>/api/kernels/<uuid>?1695849202650 (<username>@<ip>) 3.76ms
I'm fairly certain the users don't run any scraper or anything like that, so where does this come from? Could it be that the user server was stopped due to inactivity and the client application keeps sending requests over and over, waiting for the user to start the server again?
@lahwaacz yes, the most likely explanation is that a user has a JupyterLab tab open after their server has been shut down. JupyterLab is known to be overzealous at polling the server, especially after it's become unavailable.
The /metrics endpoint is used by jupyter-resource-usage (it was deprecated quite some time ago in favor of /metrics/v1/...), so some packages might be out of date.
This particular poll is created here, with no max poll interval specified, so it uses the default of 30 seconds. That means that while a server is stopped, you will get these errors every 30 seconds from every browser tab left open pointing to a not-running server. It should only poll while the window is focused, though, so it's a bit odd that you're seeing so many: that would suggest there are focused JupyterLab windows in the foreground on computers with no running server behind them. If the focus check isn't happening, it may be because things are out of date.
jupyter-resource-usage could set a longer max poll interval so you see these less often, or it could add some logic to stop polling altogether after too many failures and wait for a manual restart.
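As an illustration of that suggestion (not JupyterLab's actual implementation, which lives in TypeScript), a backoff-and-give-up poller could look roughly like this:

```python
import time
import random

def poll_with_backoff(check, base=2.0, max_interval=300.0, max_failures=20):
    """Call `check()` repeatedly, backing off exponentially on failure and
    giving up after `max_failures` consecutive failures (illustrative only)."""
    interval = base
    failures = 0
    while True:
        try:
            check()
            failures = 0
            interval = base                      # healthy again: reset to fast polling
        except Exception:
            failures += 1
            if failures >= max_failures:
                print("server unreachable; stopping poll until a manual refresh")
                return
            interval = min(interval * 2, max_interval)  # exponential backoff, capped
        # jitter so many idle tabs don't hammer the hub in lockstep
        time.sleep(interval * random.uniform(0.8, 1.2))
```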
@minrk Can you point to some JupyterLab issues about this behavior?
https://github.com/jupyterlab/jupyterlab/issues/3929 is the main one
Can you share your user and hub environments? I don't think /hub/user/metrics will be hit at all if packages are up to date, so maybe updating will fix the polling problem as well.
@minrk I'm not getting hits on /hub/user/<user>/metrics, only on /jupyter/hub/user/<username>/api/kernels/<uuid>.
I run JupyterHub on Arch Linux, where the package names and versions are the following (most likely different from pip or other package managers):
jupyter-bash_kernel 0.9.0-2
jupyter-gnuplot_kernel 0.4.1-6
jupyter-lsp 2.2.0-2
jupyter-metakernel 0.30.1-1
jupyter-nbclient 0.8.0-1
jupyter-nbconvert 7.8.0-1
jupyter-nbformat 5.9.2-1
jupyter-nbgitpuller 1.2.0-2
jupyter-nbgrader-git v0.9.1.r1.g2ef44515-1
jupyter-notebook 7.0.4-1
jupyter-notebook-shim 0.2.3-2
jupyter-server 2.7.3-1
jupyterhub 4.0.2-1
jupyterhub-idle-culler 1.2.1-2
jupyterhub-systemdspawner 1.0.1-1
jupyterlab 4.0.6-1
jupyterlab-pygments 0.2.2-1
jupyterlab-pytutor 0.2.0-2
jupyterlab-rise 0.41.0-1
python-jupyter-client 8.3.1-1
python-jupyter-core 5.3.2-1
python-jupyter-events 0.7.0-1
python-jupyter-server-terminals 0.4.4-4
python-jupyter_telemetry 0.1.0-2
python-jupyterlab-server 2.25.0-1
Sorry, I mixed up your report and the OP's, which reported /metrics.
The kernels list is definitely JupyterLab, though I can't say for certain that it's not one of the extensions. It's definitely supposed to stop polling when the connection is lost, so I think this is a bug in JupyterLab where polling continues after the connection is lost. I can't say for sure where, but it's worth reporting to jupyterlab.
@minrk Thanks, I reported it in https://github.com/jupyterlab/jupyterlab/issues/15211. Sorry for mixing a possibly unrelated issue in here.
No worries, I think it's the same general issue, just the exact manifestation may change over time.
About 50 student users running on Ubuntu:

I'm getting tons of logs in /var/log/syslog that look like this. By "tons", I mean 17K+ 503 and 302 errors in a day.

However, when I log in via a browser and check /user/studentuser/metrics, it returns correctly, with a 200 in the logs.

I've tried this as an admin user and as a normal user, and everything looks normal in the logs, so I'm not sure what is causing this error when other users are using the site.
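One way to see whether the noise comes from a handful of abandoned browser tabs is to count the failing requests per user. A small sketch, assuming the JupyterHub access-log lines end up in /var/log/syslog in roughly the format shown above:

```python
import re
from collections import Counter

# Matches JupyterHub access-log lines like
# "... 424 GET /jupyter/hub/user/<username>/api/kernels/<uuid> ..."
pattern = re.compile(r"\s(302|424|503)\s+GET\s+\S*/user/([^/\s?]+)")

counts = Counter()
with open("/var/log/syslog", errors="replace") as f:
    for line in f:
        m = pattern.search(line)
        if m:
            status, user = m.groups()
            counts[(user, status)] += 1

# Users whose stopped servers are still being polled show up with large counts.
for (user, status), n in counts.most_common(20):
    print(f"{user:20s} {status} {n}")
```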