jupyterhub / configurable-http-proxy

node-http-proxy plus a REST API
BSD 3-Clause "New" or "Revised" License

Hundreds of users lead to running out of tens of thousands of ephemeral ports #557

Open consideRatio opened 7 hours ago

consideRatio commented 7 hours ago

The comments from https://github.com/jupyterhub/configurable-http-proxy/issues/388#issuecomment-2359217928 onwards give context on how a CHP pod can end up running out of ephemeral ports. A mitigation strategy is described in https://github.com/jupyterhub/configurable-http-proxy/issues/388#issuecomment-2362097477.

consideRatio commented 7 hours ago

Based on https://github.com/jupyterhub/configurable-http-proxy/issues/388#issuecomment-2359227825, I think this may not be an issue with CHP as much as with the software running in the user servers, which leads to a flood of connections being initiated via the UI.

consideRatio commented 7 hours ago

@felder this is a followup to https://github.com/jupyterhub/configurable-http-proxy/issues/388#issuecomment-2359416947. I inspected two active deployments with 222 and 146 currently active users respectively.

A hub where users access either /tree or /lab

From inspection, it seems this makes use of jupyter_server 2.12.1 and jupyterlab 4.0.9.

This is from a CHP pod for a hub that currently has 222 user pods running the image quay.io/2i2c/utoronto-image:2525722ac1d5, where users may be accessing /tree or /lab, and it's not clear how UI usage is distributed between those.

/srv/configurable-http-proxy $ netstat -natp | grep ESTABLISHED | grep 8081 | wc -l
80
/srv/configurable-http-proxy $ netstat -natp | grep ESTABLISHED | grep 8888 | wc -l
1416
/srv/configurable-http-proxy $ netstat -natp | grep ESTABLISHED | wc -l
1609
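
Note that `grep 8081` and `grep 8888` match those digits anywhere in the line, so a local port or PID that happens to contain them would be counted too. Below is a minimal sketch of a stricter count that looks only at the foreign address column. It assumes the usual Linux/BusyBox `netstat -natp` layout (protocol, queues, local address, foreign address, state, program); the function name and the sample addresses are made up for illustration.

```python
from collections import Counter


def established_by_remote_port(netstat_output: str) -> Counter:
    """Count ESTABLISHED TCP connections per foreign (remote) port."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, foreign, state, [pid/program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        remote_port = fields[4].rsplit(":", 1)[-1]
        counts[remote_port] += 1
    return counts


# Made-up sample in the assumed layout, not data from the deployments above.
# The first local port contains "8081" but must not count as a hub connection.
sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.0.0.5:48081          10.0.1.7:8888           ESTABLISHED 1/node
tcp        0      0 10.0.0.5:51234          10.0.2.9:8081           ESTABLISHED 1/node
tcp        0      0 10.0.0.5:51300          10.0.1.8:8888           TIME_WAIT   -
"""
counts = established_by_remote_port(sample)
print(counts["8888"], counts["8081"])
```

In a pod this could be fed from `netstat -natp` via a pipe or `subprocess`, if Python is available there, which is itself an assumption.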
pip list

```
Package Version
------- -------
absl-py 2.1.0
affine 2.4.0
aiohttp 3.9.5
aiosignal 1.3.1
alabaster 0.7.16
alembic 1.13.0
altair 5.2.0
annotated-types 0.7.0
anyio 4.1.0
archspec 0.2.2
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
arviz 0.18.0
astropy 5.3.4
astroquery 0.4.7
asttokens 2.4.1
astunparse 1.6.3
async-generator 1.10
async-lru 2.0.4
async-timeout 4.0.3
attrs 23.1.0
Babel 2.13.1
backports.tarfile 1.0.0
beautifulsoup4 4.12.2
bleach 6.1.0
blinker 1.7.0
blis 0.7.10
bokeh 3.3.2
boltons 23.0.0
Bottleneck 1.3.7
branca 0.7.2
Brotli 1.1.0
cached-property 1.5.2
cachetools 5.4.0
catalogue 2.0.10
certifi 2023.11.17
certipy 0.1.3
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
click-plugins 1.1.1
cligj 0.7.2
cloudpathlib 0.16.0
cloudpickle 3.0.0
colorama 0.4.6
comm 0.1.4
conda 23.11.0
conda-libmamba-solver 23.11.1
conda-package-handling 2.2.0
conda_package_streaming 0.9.0
confection 0.1.4
cons 0.4.6
contextily 1.4.0
contourpy 1.2.0
cryptography 41.0.7
cycler 0.12.1
cymem 2.0.8
Cython 3.0.6
cytoolz 0.12.2
dask 2023.12.0
datascience 0.17.6
debugpy 1.8.0
decorator 5.1.1
defusedxml 0.7.1
descartes 1.1.0
dill 0.3.7
distributed 2023.12.0
distro 1.8.0
dm-tree 0.1.8
docutils 0.21.2
entrypoints 0.4
esda 2.5.1
et-xmlfile 1.1.0
etuples 0.3.9
exceptiongroup 1.2.0
executing 2.0.1
fastjsonschema 2.19.0
fastprogress 1.0.3
fica 0.3.1
filelock 3.15.4
fiona 1.9.5
flatbuffers 24.3.25
folium 0.17.0
fonttools 4.46.0
fqdn 1.5.1
frozenlist 1.4.1
fsspec 2023.12.1
galpy 1.9.2
gast 0.6.0
GDAL 3.8.1
geographiclib 2.0
geopandas 0.14.4
geopy 2.4.1
giddy 2.3.5
git-credential-helpers 0.2
gitdb 4.0.11
github3.py 4.0.1
GitPython 3.1.40
gmpy2 2.1.2
google-auth 2.32.0
google-auth-oauthlib 1.2.1
google-pasta 0.2.0
graphviz 0.20.3
greenlet 3.0.1
grpcio 1.64.1
h5netcdf 1.3.0
h5py 3.10.0
html5lib 1.1
idna 3.6
imagecodecs 2023.9.18
imageio 2.31.5
imagesize 1.4.1
importlib-metadata 7.0.0
importlib-resources 6.1.1
ipykernel 6.26.0
ipylab 1.0.0
ipympl 0.9.3
ipython 8.18.1
ipython-genutils 0.2.0
ipywidgets 8.1.1
isoduration 20.11.0
jaraco.classes 3.4.0
jaraco.context 5.3.0
jaraco.functools 4.0.0
jax 0.4.30
jaxlib 0.4.30
jedi 0.19.1
jeepney 0.8.0
Jinja2 3.1.2
joblib 1.3.2
json5 0.9.14
jsonpatch 1.33
jsonpointer 2.4
jsonschema 4.20.0
jsonschema-specifications 2023.11.2
jupyter_client 7.4.9
jupyter-contrib-core 0.4.2
jupyter-contrib-nbextensions 0.7.0
jupyter_core 5.5.0
jupyter-events 0.9.0
jupyter-highlight-selected-word 0.2.0
jupyter-lsp 2.2.1
jupyter_nbextensions_configurator 0.6.4
jupyter-remote-desktop-proxy 1.2.1
jupyter-resource-usage 1.0.2
jupyter_server 2.12.1
jupyter-server-mathjax 0.2.6
jupyter_server_proxy 4.3.0
jupyter_server_terminals 0.4.4
jupyter-telemetry 0.1.0
jupyter-tree-download 1.0.1
jupyterhub 4.0.2
jupyterlab 4.0.9
jupyterlab_git 0.50.0
jupyterlab_pygments 0.3.0
jupyterlab_server 2.25.2
jupyterlab-widgets 3.0.9
jupyterthemes 0.20.0
jupytext 1.15.2
jwcrypto 1.5.6
kaleido 0.2.1
keras 2.15.0
keyring 25.2.1
kiwisolver 1.4.5
langcodes 3.4.0
language_data 1.2.0
lazy_loader 0.3
lesscpy 0.15.1
libclang 18.1.1
libmambapy 1.5.4
libpysal 4.9.2
llvmlite 0.40.1
locket 1.0.0
logical-unification 0.4.6
lxml 5.2.2
lz4 4.3.2
Mako 1.3.0
mamba 1.5.4
mapclassify 2.6.1
marisa-trie 1.1.0
Markdown 3.6
markdown-it-py 3.0.0
MarkupSafe 2.1.3
markus-jupyter-extension 0.1.4
matplotlib 3.8.2
matplotlib-inline 0.1.6
mdit-py-plugins 0.4.1
mdurl 0.1.2
menuinst 2.0.0
mercantile 1.2.1
miniKanren 1.0.3
mistune 3.0.2
ml-dtypes 0.3.2
more-itertools 10.3.0
mpmath 1.3.0
msgpack 1.0.7
multidict 6.0.5
multipledispatch 1.0.0
munkres 1.1.4
murmurhash 1.0.10
nbclassic 1.0.0
nbclient 0.8.0
nbconvert 7.12.0
nbdime 4.0.1
nbformat 5.9.2
nbgitpuller 1.2.1
nest-asyncio 1.5.8
networkx 3.2.1
nltk 3.8.1
notebook 6.5.7
notebook_shim 0.2.3
numba 0.57.1
numexpr 2.8.7
numpy 1.24.4
oauthlib 3.2.2
openpyxl 3.1.2
opt-einsum 3.3.0
otter-grader 5.5.0
overrides 7.4.0
packaging 23.2
pamela 1.1.0
pandas 2.1.3
pandocfilters 1.5.0
parso 0.8.3
partd 1.4.1
patsy 0.5.4
pexpect 4.8.0
pickleshare 0.7.5
Pillow 10.1.0
pip 24.0
pkgutil_resolve_name 1.3.10
platformdirs 4.1.0
plotly 5.22.0
pluggy 1.3.0
ply 3.11
preshed 3.0.9
prometheus-client 0.19.0
prompt-toolkit 3.0.41
protobuf 4.24.4
psutil 5.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyarrow 14.0.1
pyarrow-hotfix 0.6
pyasn1 0.6.0
pyasn1_modules 0.4.0
pycosat 0.6.6
pycparser 2.21
pycurl 7.45.1
pydantic 2.8.2
pydantic_core 2.20.1
pyerfa 2.0.1.4
Pygments 2.17.2
PyJWT 2.8.0
pymc 5.10.4
pyOpenSSL 23.3.0
pyparsing 3.1.1
pyproj 3.6.1
PySocks 1.7.1
pytensor 2.18.6
python-dateutil 2.8.2
python-json-logger 2.0.7
python-on-whales 0.71.0
pytz 2023.3.post1
pyvo 1.5.2
PyWavelets 1.4.1
PyYAML 6.0.1
pyzmq 25.1.2
quantecon 0.7.2
rasterio 1.3.9
redis 5.0.7
referencing 0.32.0
regex 2024.5.15
requests 2.31.0
requests-oauthlib 2.0.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.7.1
rise 5.7.1
rpds-py 0.13.2
rsa 4.9
Rtree 1.3.0
ruamel.yaml 0.18.5
ruamel.yaml.clib 0.2.7
scikit-image 0.22.0
scikit-learn 1.3.2
SciPy 1.11.4
seaborn 0.13.0
SecretStorage 3.3.3
Send2Trash 1.8.2
setuptools 68.2.2
shapely 2.0.4
shellingham 1.5.4
simpervisor 1.0.0
six 1.16.0
smart-open 6.4.0
smmap 5.0.0
sniffio 1.3.0
snowballstemmer 2.2.0
snuggs 1.4.7
sortedcontainers 2.4.0
soupsieve 2.5
spacy 3.7.4
spacy-legacy 3.0.12
spacy-loggers 1.0.5
Sphinx 7.4.4
sphinxcontrib-applehelp 1.0.8
sphinxcontrib-devhelp 1.0.6
sphinxcontrib-htmlhelp 2.0.5
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.7
sphinxcontrib-serializinghtml 1.1.10
splot 1.1.5.post1
spreg 1.5.0
SQLAlchemy 2.0.23
srsly 2.4.8
stack-data 0.6.2
statsmodels 0.14.0
sympy 1.12
tables 3.9.2
tblib 2.0.0
tenacity 8.5.0
tensorboard 2.15.2
tensorboard-data-server 0.7.2
tensorflow 2.15.1
tensorflow-estimator 2.15.0
tensorflow-io-gcs-filesystem 0.37.1
tensorflow-probability 0.23.0
termcolor 2.4.0
terminado 0.18.0
textblob 0.17.1
thinc 8.2.5
threadpoolctl 3.2.0
tifffile 2023.9.26
tinycss2 1.2.1
toml 0.10.2
tomli 2.0.1
toolz 0.12.0
tornado 6.3.3
tqdm 4.66.1
traitlets 5.14.0
truststore 0.8.0
typer 0.9.4
types-python-dateutil 2.8.19.14
typing_extensions 4.8.0
typing-utils 0.1.0
tzdata 2023.3
uri-template 1.3.0
uritemplate 4.1.1
urllib3 2.1.0
wasabi 1.1.2
wcwidth 0.2.12
weasel 0.3.4
webcolors 1.13
webencodings 0.5.1
websocket-client 1.7.0
websockify 0.12.0
Werkzeug 3.0.3
wheel 0.42.0
widgetsnbextension 4.0.9
wrapt 1.14.1
xarray 2024.6.0
xarray-einstats 0.7.0
xlrd 2.0.1
xyzservices 2023.10.1
yarl 1.9.4
zict 3.0.0
zipp 3.17.0
zstandard 0.22.0
```

A hub where users access /rstudio

This is from a CHP pod with a hub currently having 146 current user pods running the image quay.io/2i2c/utoronto-r-image:5e7aea3c30ff, where users are accessing /rstudio.

/srv/configurable-http-proxy $ netstat -natp | grep ESTABLISHED | grep 8081 | wc -l
5
/srv/configurable-http-proxy $ netstat -natp | grep ESTABLISHED | grep 8888 | wc -l
1164
/srv/configurable-http-proxy $ netstat -natp | grep ESTABLISHED| wc -l
1250
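
To compare the two hubs despite their different user counts, the raw counts can be normalized per active user. This is only arithmetic on the numbers reported in this comment; the dictionary labels and the helper name are my own.

```python
# Counts reported above: (active users, ESTABLISHED to :8081, to :8888, total)
deployments = {
    "/tree or /lab hub": (222, 80, 1416, 1609),
    "/rstudio hub": (146, 5, 1164, 1250),
}


def per_user(users: int, connections: int) -> float:
    """Average number of connections per active user."""
    return connections / users


for name, (users, hub_conns, server_conns, total) in deployments.items():
    print(
        f"{name}: {per_user(users, server_conns):.1f} to :8888, "
        f"{per_user(users, hub_conns):.2f} to :8081, "
        f"{per_user(users, total):.1f} total per user"
    )
```

Both hubs come out at roughly 6 to 8 connections to user servers per active user (6.4 and 8.0). The larger relative difference is in connections to the hub on port 8081: about 0.36 per user on the /tree or /lab hub versus about 0.03 on the /rstudio hub.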

From inspection, it seems this makes use of jupyter-server 1.24.0 together with rstudio stuff in the frontend.

pip list

```
Package Version
------- -------
aiohttp 3.9.3
aiosignal 1.3.1
alabaster 0.7.16
alembic 1.13.1
annotated-types 0.6.0
anyio 3.7.1
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
asttokens 2.4.1
astunparse 1.6.3
async-generator 1.10
async-lru 2.0.4
async-timeout 4.0.3
attrs 23.2.0
Babel 2.14.0
beautifulsoup4 4.12.3
bleach 6.1.0
certifi 2024.2.2
certipy 0.1.3
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
comm 0.2.2
cryptography 42.0.5
debugpy 1.8.1
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.8
docutils 0.20.1
entrypoints 0.4
exceptiongroup 1.2.0
executing 2.0.1
fastjsonschema 2.19.1
fica 0.3.1
fqdn 1.5.1
frozenlist 1.4.1
git-credential-helpers 0.2
github3.py 4.0.1
greenlet 3.0.3
h11 0.14.0
httpcore 1.0.4
httpx 0.27.0
idna 3.6
imagesize 1.4.1
ipykernel 6.29.3
ipylab 1.0.0
ipython 8.22.2
ipython-genutils 0.2.0
ipywidgets 8.1.2
isoduration 20.11.0
jedi 0.19.1
Jinja2 3.1.3
json5 0.9.22
jsonpointer 2.4
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
jupyter_client 7.4.9
jupyter_core 5.7.2
jupyter-events 0.9.1
jupyter-lsp 2.2.4
jupyter-resource-usage 0.7.2
jupyter-rsession-proxy 2.2.0
jupyter-server 1.24.0
jupyter_server_proxy 4.1.1
jupyter_server_terminals 0.5.3
jupyter-shiny-proxy 1.1
jupyter-telemetry 0.1.0
jupyterhub 4.0.2
jupyterlab 3.4.8
jupyterlab_pygments 0.3.0
jupyterlab_server 2.25.4
jupyterlab_widgets 3.0.10
jupytext 1.16.1
Mako 1.3.2
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib-inline 0.1.6
mdit-py-plugins 0.4.0
mdurl 0.1.2
mistune 3.0.2
multidict 6.0.5
nbclassic 0.5.6
nbclient 0.10.0
nbconvert 7.16.2
nbformat 5.10.2
nbgitpuller 1.2.0
nest-asyncio 1.6.0
notebook 6.5.6
notebook_shim 0.2.4
numpy 1.26.4
oauthlib 3.2.2
otter-grader 5.2.2
overrides 7.7.0
packaging 24.0
pamela 1.1.0
pandas 2.2.1
pandocfilters 1.5.1
parso 0.8.3
pexpect 4.9.0
pip 24.0
platformdirs 4.2.0
prometheus_client 0.20.0
prompt-toolkit 3.0.43
psutil 5.9.8
ptyprocess 0.7.0
pure-eval 0.2.2
pycparser 2.21
pydantic 2.6.4
pydantic_core 2.16.3
Pygments 2.17.2
PyJWT 2.8.0
pyOpenSSL 24.1.0
python-dateutil 2.9.0.post0
python-json-logger 2.0.7
python-on-whales 0.70.0
pytz 2024.1
PyYAML 6.0.1
pyzmq 24.0.1
referencing 0.33.0
requests 2.31.0
retrolab 0.3.21
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rpds-py 0.18.0
ruamel.yaml 0.18.6
ruamel.yaml.clib 0.2.8
Send2Trash 1.8.2
setuptools 59.6.0
simpervisor 1.0.0
six 1.16.0
sniffio 1.3.1
snowballstemmer 2.2.0
soupsieve 2.5
Sphinx 7.2.6
sphinxcontrib-applehelp 1.0.8
sphinxcontrib-devhelp 1.0.6
sphinxcontrib-htmlhelp 2.0.5
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.7
sphinxcontrib-serializinghtml 1.1.10
SQLAlchemy 2.0.28
stack-data 0.6.3
terminado 0.18.1
tinycss2 1.2.1
toml 0.10.2
tomli 2.0.1
tornado 6.4
tqdm 4.66.2
traitlets 5.14.2
typer 0.9.0
types-python-dateutil 2.8.19.20240311
typing_extensions 4.10.0
tzdata 2024.1
uri-template 1.3.0
uritemplate 4.1.1
urllib3 2.2.1
wcwidth 0.2.13
webcolors 1.13
webencodings 0.5.1
websocket-client 1.7.0
wheel 0.43.0
widgetsnbextension 4.0.10
wrapt 1.16.0
yarl 1.9.4
```

manics commented 6 hours ago

This doesn't rule out CHP. To do that you'd need to compare this with another proxy like Traefik. For example, if CHP isn't closing connections as fast as the browser, this could lead to too many ports in use.

Do the existing CHP tests cover HTTP persistent connections? https://en.m.wikipedia.org/wiki/HTTP_persistent_connection

felder commented 6 hours ago

One thing I'm noticing as I investigate is that user servers that use lab (as opposed to rsession-proxy or the like) interact with the hub pod a lot more often. Anytime I interact with the file browser, launcher, etc last_activity for the hub pod route in chp updates. This is not the case if /rstudio is designated as the default URL.

Additionally the ESTABLISHED connection count to hubip:8081 with a single user pod running lab (as opposed to rstudio) increments pretty steadily as I do things like kill the pod, kill the kernel, refresh the browser, etc.
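
One way to watch this is to compare `last_activity` across routes in the routing table that CHP returns from `GET /api/routes`. Below is a minimal sketch that sorts routes by most recent activity. It assumes the response is a JSON object mapping each route path to an entry with `target` and `last_activity` (an ISO 8601 timestamp); the sample payload, hostnames, and function name are made up for illustration.

```python
import json
from datetime import datetime


def routes_by_last_activity(routes_json: str) -> list[tuple[str, datetime]]:
    """Return (route path, last_activity) pairs, most recently active first."""
    routes = json.loads(routes_json)
    parsed = []
    for path, entry in routes.items():
        # Replace a trailing "Z" so fromisoformat also works before Python 3.11.
        stamp = entry["last_activity"].replace("Z", "+00:00")
        parsed.append((path, datetime.fromisoformat(stamp)))
    return sorted(parsed, key=lambda item: item[1], reverse=True)


# Hypothetical payload in the assumed shape, not captured from a real proxy.
sample = json.dumps(
    {
        "/": {"target": "http://hub:8081", "last_activity": "2024-09-19T23:48:49.793Z"},
        "/user/example": {
            "target": "http://10.0.0.1:8888",
            "last_activity": "2024-09-19T23:40:00.000Z",
        },
    }
)
ordered = routes_by_last_activity(sample)
for path, when in ordered:
    print(path, when.isoformat())
```

If the hub's default route keeps landing at the top of this list while a user only clicks around in lab, that would match the observation above.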

shaneknapp commented 6 hours ago

> This doesn't rule out CHP. To do that you'd need to compare this with another proxy like Traefik. For example, if CHP isn't closing connections as fast as the browser, this could lead to too many ports in use.

I believe this might be happening. If a user closes their laptop, or opens their notebook in a new browser (which happens more often than you'd imagine), we see a lot of spam (hundreds of 503s) in the proxy logs:

21:08:06.483 [ConfigProxy] error: 503 GET /user/<hub user>/api/events/subscribe connect ECONNREFUSED 10.28.21.53:8888
21:08:06.491 [ConfigProxy] error: 503 GET /user/<hub user>/api/events/subscribe connect ECONNREFUSED 10.28.21.53:8888
21:08:06.514 [ConfigProxy] error: 503 GET /user/<hub user>/api/events/subscribe connect ECONNREFUSED 10.28.21.53:8888
21:08:06.533 [ConfigProxy] error: 503 GET /user/<hub user>/api/events/subscribe connect ECONNREFUSED 10.28.21.53:8888
21:08:06.536 [ConfigProxy] error: 503 GET /user/<hub user>/api/events/subscribe connect ECONNREFUSED 10.28.21.53:8888
21:08:06.561 [ConfigProxy] info: Removing route /user/<hub user>
21:08:06.561 [ConfigProxy] info: 204 DELETE /api/routes/user/<hub user>
21:08:15.521 [ConfigProxy] info: Adding route /user/<hub user> -> http://10.28.26.176:8888
21:08:15.521 [ConfigProxy] info: Route added /user/<hub user> -> http://10.28.26.176:8888
21:08:15.521 [ConfigProxy] info: 201 POST /api/routes/user/<hub user>
21:08:18.845 [ConfigProxy] info: 200 GET /api/routes
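
To quantify this kind of spam over a longer log, the 503 lines can be grouped by request path, error code, and upstream target. Below is a minimal sketch that assumes every error line has the same shape as the ones above; the regular expression and function name are my own and not part of CHP.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above:
# <time> [ConfigProxy] error: 503 <METHOD> <path> connect <CODE> <ip:port>
ERROR_503 = re.compile(
    r"^\S+ \[ConfigProxy\] error: 503 (?P<method>\S+) (?P<path>.+?) "
    r"connect (?P<code>\S+) (?P<target>\S+)$"
)


def count_503s(log_text: str) -> Counter:
    """Count 503 proxy errors per (path, error code, upstream target)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_503.match(line.strip())
        if match:
            counts[(match["path"], match["code"], match["target"])] += 1
    return counts


sample = """\
21:08:06.483 [ConfigProxy] error: 503 GET /user/<hub user>/api/events/subscribe connect ECONNREFUSED 10.28.21.53:8888
21:08:06.491 [ConfigProxy] error: 503 GET /user/<hub user>/api/events/subscribe connect ECONNREFUSED 10.28.21.53:8888
21:08:06.561 [ConfigProxy] info: Removing route /user/<hub user>
"""
for key, count in count_503s(sample).most_common():
    print(count, *key)
```

The path group is matched non-greedily up to ` connect ` because the redacted user name in these logs contains a space.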
consideRatio commented 5 hours ago

Hmmm, so we have a spam of 503 GET /user/<hub user>/api/events/subscribe connect ECONNREFUSED 10.28.21.53:8888, where something (jupyterlab in browser?) tries to access a user server, but the proxying fails with connection refused - perhaps because the server is shutting down or similar.

After that, jupyterhub asks CHP to delete the route.

After that, I expect whatever got the 503 responses no longer gets them, because the proxy pod won't try to proxy to the removed route any more. Instead it will do something else, maybe fall back to the hub pod as the default route, which then gets spammed.

@shaneknapp I guess we could see such redirects with debug logging or similar. Or does CHP already log redirect responses, and we just aren't seeing any?

consideRatio commented 4 hours ago

I think /api/events/subscribe is associated with websockets; it is an endpoint added in jupyter_server 2.0.0a2. Is something in jupyterlab's browser-side code retrying excessively against it when it fails?
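
For reference, a client that spaces out its reconnect attempts typically uses exponential backoff with jitter. The following is a generic sketch of that technique, not JupyterLab's actual reconnect code; the function name, base delay, and cap are all hypothetical.

```python
import random


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0, rng=None):
    """Return one randomized wait time (in seconds) per retry attempt.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)],
    often called "full jitter".
    """
    if rng is None:
        rng = random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        delays.append(rng.uniform(0, ceiling))
    return delays


# Seeded only so the example is repeatable.
delays = backoff_delays(8, rng=random.Random(0))
print([round(d, 2) for d in delays])
```

With these hypothetical defaults the upper bound on the wait doubles from 1 s until it reaches the 60 s cap, which is a very different pattern from several retries within a few tens of milliseconds.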

From the logs I see five failed requests within about 53 ms, roughly one every 10 to 20 ms, which I guess means there is no delay between retry attempts.

21:08:06.483
21:08:06.491
21:08:06.514
21:08:06.533
21:08:06.536
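
The gaps between these timestamps can be checked with a few lines of Python. This is a small sketch; the function name is my own, and it assumes all timestamps fall on the same day.

```python
from datetime import datetime, timedelta

timestamps = [
    "21:08:06.483",
    "21:08:06.491",
    "21:08:06.514",
    "21:08:06.533",
    "21:08:06.536",
]


def gaps_ms(stamps: list[str]) -> list[int]:
    """Milliseconds between consecutive HH:MM:SS.mmm timestamps (same day assumed)."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(b - a) // timedelta(milliseconds=1) for a, b in zip(times, times[1:])]


gaps = gaps_ms(timestamps)
print(gaps, sum(gaps))  # [8, 23, 19, 3] 53
```

So the five requests span 53 ms in total, with gaps between 3 and 23 ms.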

@minrk I recall that you submitted a PR somewhere, sometime a while back, about excessive connections or retries. Was this to this endpoint?

felder commented 4 hours ago

So when running lab, when I do things like kill my pod or start up another connection from another tab or browser I tend to be able to get chp to emit 503 messages similar to:

23:48:41.600 [ConfigProxy] error: 503 GET /user/felder/terminals/websocket/1 connect ETIMEDOUT 10.28.35.109:8888
23:48:49.793 [ConfigProxy] error: 503 GET /user/felder/api/events/subscribe connect ETIMEDOUT 10.28.35.109:8888
...
00:01:16.903 [ConfigProxy] error: 503 GET /user/felder/api/kernels/d9472c13-5a55-47cf-a569-ed981f709bbf/channels connect ECONNREFUSED 10.28.8.3:8888
00:01:16.905 [ConfigProxy] error: 503 GET /user/felder/api/kernels/d9472c13-5a55-47cf-a569-ed981f709bbf/channels connect ECONNREFUSED 10.28.8.3:8888
00:01:16.907 [ConfigProxy] error: 503 GET /user/felder/api/kernels/d9472c13-5a55-47cf-a569-ed981f709bbf/channels connect ECONNREFUSED 10.28.8.3:8888
00:01:16.974 [ConfigProxy] error: 503 GET /user/felder/api/kernels/d9472c13-5a55-47cf-a569-ed981f709bbf connect ECONNREFUSED 10.28.8.3:8888

This does make sense when I'm killing my user pod since the server is no longer there at that ip.

However, when this happens I see a correlated increase in the number of established connections from chp->hub:8081. Those connections seem to persist.

felder commented 3 hours ago

Noting that if I delete the route to the hub pod in chp, the connections still persist.
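
To check more systematically which connections persist, two `netstat -natp` snapshots taken some time apart can be intersected. Below is a minimal sketch, assuming the usual Linux/BusyBox column layout (local address in the fourth column, foreign address in the fifth, state in the sixth); the function names and sample addresses are made up. A pair that appears in both snapshots is only probably the same connection, since a local port could in principle have been closed and reused in between.

```python
def established(netstat_output: str, remote_port: str) -> set[tuple[str, str]]:
    """Return (local, remote) address pairs ESTABLISHED to the given remote port."""
    conns = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state == "ESTABLISHED" and remote.rsplit(":", 1)[-1] == remote_port:
            conns.add((local, remote))
    return conns


def persisted(before: str, after: str, remote_port: str = "8081") -> set[tuple[str, str]]:
    """Connections to remote_port that are present in both snapshots."""
    return established(before, remote_port) & established(after, remote_port)


# Made-up snapshots in the assumed layout, not data from a real CHP pod.
before = """\
tcp        0      0 10.0.0.5:40001          10.0.2.9:8081           ESTABLISHED 1/node
tcp        0      0 10.0.0.5:40002          10.0.2.9:8081           ESTABLISHED 1/node
tcp        0      0 10.0.0.5:40003          10.0.1.7:8888           ESTABLISHED 1/node
"""
after = """\
tcp        0      0 10.0.0.5:40001          10.0.2.9:8081           ESTABLISHED 1/node
tcp        0      0 10.0.0.5:40010          10.0.2.9:8081           ESTABLISHED 1/node
"""
kept = persisted(before, after)
print(sorted(kept))
```

Running this before and after deleting the hub route would show whether the same local ports stay connected to hub:8081 or whether old connections are being replaced by new ones.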