Improving usage/detection of URL prefix in deployed environments

banesullivan commented 1 year ago

TL;DR

Is there a better way to use jupyter-server-proxy / know the proxy URL in deployed environments such as MyBinder, JupyterHub, and SageMaker?

Users on a handful of projects where I leverage jupyter-server-proxy struggle to get the correct URL prefix in managed, deployed environments. Often it's just '/{JUPYTERHUB_SERVICE_PREFIX}/proxy/<port>', but other times it can be more nuanced.

Are there any options or existing efforts to alleviate this pain point for users?

Use case

I've been leveraging jupyter-server-proxy on a handful of projects in a similar fashion, which may differ from the use cases jupyter-server-proxy is designed for. I'd appreciate clarification on whether this usage scenario is well supported, and on whether there are ways to improve my usage of jupyter-server-proxy for these use cases to relieve configuration pain for our users.

In each of these projects, we spin up one or more web servers within the Python kernel at runtime and then access them from the client-side Jupyter notebook. Two examples:

Serving tiles for large images to visualize interactively on a map. The user launches a tile server (with their configurations defined at runtime) for a given "image" instance in their kernel. We then combine Jupyter widgets like ipyleaflet with the REST tile endpoint we've launched on an arbitrary port on the host machine. We need the client browser to access that server to fetch the tiles. See https://github.com/girder/large_image/pull/1065 and https://github.com/banesullivan/localtileserver

Custom Jupyter "widgets" as an iFrame. In PyVista, we serve a single web application that links to individual 3D plots the user creates in their kernel. We need full access to the port this application runs on, and we display the application as an iFrame in the Jupyter Notebook so that, to the user, it looks just like any other widget. This web app must run in the main thread of the user's Python kernel. More details here and here


The primary constraint shared in these use cases is that the web server must be a part of the user's Python kernel process: we cannot launch the server ahead of time, and we cannot share servers across Python sessions (notebooks).
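A minimal sketch of this pattern, using only the standard library, may help make the constraint concrete. It starts an HTTP server on an OS-assigned free port in a background thread of the kernel process, so the browser could then reach it through jupyter-server-proxy at /<prefix>/proxy/<port>. The names `HelloHandler` and `start_background_server` are hypothetical and only for illustration. Real projects such as localtileserver use their own server frameworks, and this thread-based sketch does not address PyVista's requirement of running on the kernel's main thread.

```python
import http.server
import threading


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler standing in for a tile or web-app server."""

    def do_GET(self):
        body = b"hello from the kernel"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging so it doesn't clutter notebook output.
        pass


def start_background_server(handler=HelloHandler, host="127.0.0.1"):
    """Start a server on a free port in a daemon thread; return (server, port)."""
    # Port 0 asks the OS for any free port, so servers from different
    # notebooks (kernels) don't collide with each other.
    server = http.server.ThreadingHTTPServer((host, 0), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, server.server_address[1]
```

In a notebook, `server, port = start_background_server()` gives the port to plug into the proxy URL. Call `server.shutdown()` followed by `server.server_close()` when the server is no longer needed.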

Using jupyter-server-proxy has worked excellently for these use cases, with only minor issues around end-user configuration and request timeouts. Those are the areas I hope to optimize, and I'd like to ask about improved support for them in jupyter-server-proxy.

Help!

Are these use cases well supported by jupyter-server-proxy? I'm asking because the documentation focuses on launching a single standalone web application on the Jupyter Server rather than on launching arbitrary servers from within a user's Python session.
Are there any ways to improve knowing the URL prefix ahead of time in deployed environments (like MyBinder or SageMaker) to prevent users from having to debug different prefixes in /<prefix>/proxy/<port> for these use cases?

Common prefix scenarios

MyBinder: f"{os.environ['JUPYTERHUB_SERVICE_PREFIX']}/proxy/{{port}}"
SageMaker: f"studiolab/default/jupyter/proxy/{{port}}"
Colab: actually, jupyter-server-proxy isn't needed for these use cases as localhost is remapped by Colab (reference)
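One way to centralize the scenarios above is a small helper that builds the proxy path from JUPYTERHUB_SERVICE_PREFIX when it is set, and otherwise uses an explicit, user-supplied prefix. This is a sketch under assumptions: `proxy_path` and its `override` parameter are hypothetical. It only auto-detects JupyterHub-based deployments such as MyBinder, it cannot detect SageMaker on its own, and it treats any prefix as an absolute path. JupyterHub typically sets JUPYTERHUB_SERVICE_PREFIX with leading and trailing slashes (e.g. /user/<name>/), so the slashes are normalized to avoid "//" in the result.

```python
import os
from typing import Mapping, Optional


def proxy_path(port: int, override: Optional[str] = None,
               environ: Mapping[str, str] = os.environ) -> str:
    """Return the path at which jupyter-server-proxy should expose `port`.

    Hypothetical helper: `override` wins if given (for platforms that cannot
    be detected automatically); otherwise JUPYTERHUB_SERVICE_PREFIX is used
    (JupyterHub, MyBinder); otherwise a server at base URL "/" is assumed.
    """
    if override is not None:
        prefix = override
    else:
        prefix = environ.get("JUPYTERHUB_SERVICE_PREFIX", "/")
    # Normalize slashes: "/user/x/", "/user/x" and "user/x" all become "/user/x".
    prefix = "/" + prefix.strip("/")
    if prefix == "/":
        return f"/proxy/{port}"
    return f"{prefix}/proxy/{port}"
```

For example, with JUPYTERHUB_SERVICE_PREFIX set to /user/alice/, `proxy_path(8000)` returns /user/alice/proxy/8000.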


giswqs commented 1 year ago

It would be nice to have a solution that works across platforms, such as MyBinder, Google Colab, JupyterHub, Amazon SageMaker, Microsoft Planetary Computer.

manics commented 1 year ago

JUPYTERHUB_SERVICE_PREFIX is the standard way to get the base prefix with JupyterHub. As far as I know, SageMaker and Colab aren't open source, so I don't think there's much we can do. However, if there's a "well-known" standard way to get the prefix, we could consider supporting it. Are you aware of any documentation?

If not, it might be better to move this discussion to the Jupyter Community Forum https://discourse.jupyter.org/ where people from across the Jupyter ecosystem hang out, since they may have run into this issue with other extensions.

ryanlovett commented 1 year ago

Are these use cases well supported by jupyter-server-proxy? I'm asking because the documentation focuses on launching a single standalone web application on the Jupyter Server rather than on launching arbitrary servers from within a user's Python session.

Perhaps it isn't well supported right now because that hasn't been the focus, but that doesn't mean it couldn't be. I'll add some initial thoughts below, but as @manics said, it might be worthwhile to discuss this on Discourse.

Are there any ways to improve knowing the URL prefix ahead of time in deployed environments (like MyBinder or SageMaker) to prevent users from having to debug different prefixes in /<prefix>/proxy/<port> for these use cases?

The only way to know the path_info of the managed process (the part after the path to the user's server) in advance is with a named server proxy like those for RStudio, VSCode, noVNC, etc., although these are for single instances. These are made available via entry points where the configuration is known in advance. Right now all existing entry points are loaded when the extension is loaded.
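For reference, a named server is declared ahead of time by a Python package through the jupyter_serverproxy_servers entry point group. The registered function returns the server's configuration, and the server is then reachable at a fixed path such as /<prefix>/<name>/. The sketch below assumes a hypothetical package `mytiles` with a hypothetical `mytiles.serve` module that accepts `--port`, registered as `mytiles = mytiles:setup_mytiles`.

```python
def setup_mytiles():
    """Entry point for a hypothetical named server called "mytiles".

    jupyter-server-proxy substitutes "{port}" with a port it chooses and
    launches the command itself when the named path is first requested.
    """
    return {
        # Command line to launch; "mytiles.serve" is a hypothetical module.
        "command": ["python", "-m", "mytiles.serve", "--port", "{port}"],
        # Seconds to wait for the process to start accepting connections.
        "timeout": 30,
        # Optional entry shown in the Jupyter launcher.
        "launcher_entry": {"title": "My Tiles"},
    }
```

Because jupyter-server-proxy, not the user's kernel, launches this process, the approach does not directly fit the in-kernel servers described in the original post. That gap is what a runtime API for creating named services would address.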

I imagine that there could be a jupyter-server-proxy HTTP API to either create new named services, or to reload any new entry points that have been created since the extension was loaded. This would enable the user to run some code to start a proxied service with an endpoint of their choosing.

banesullivan commented 1 year ago

@manics and @ryanlovett, thank you both for this initial feedback! I'll try to re-pose some of this discussion on Discourse.

I imagine that there could be a jupyter-server-proxy HTTP API to either create new named services, or to reload any new entry points that have been created since the extension was loaded. This would enable the user to run some code to start a proxied service with an endpoint of their choosing.

This is a neat idea. I suppose I am confused about why something must be a new named service to know the path_info... is this not available for the main /proxy/<port> service?

manics commented 1 year ago

The path info (X-Forwarded-Context and X-Proxycontextpath) is already passed, see the tests: https://github.com/jupyterhub/jupyter-server-proxy/blob/98f8f0c0a7c442515d297a72a525d7a6d9b2f350/tests/test_proxies.py#L205-L224
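Because those headers reach the proxied process, a server started from the kernel could read the public prefix from each request instead of guessing it. The sketch below uses the standard library's http.server. It assumes the header value is the path under which the browser reached the server (e.g. /user/<name>/proxy/<port>). `public_prefix` and `ContextHandler` are hypothetical names, and the tile URL template only illustrates the map use case.

```python
import http.server
import json


def public_prefix(headers) -> str:
    """Return the proxy context path sent by jupyter-server-proxy, or "".

    Header lookups on http.server's message objects are case-insensitive,
    so "X-Proxycontextpath" and "X-ProxyContextPath" both match.
    """
    return (headers.get("X-Forwarded-Context")
            or headers.get("X-ProxyContextPath")
            or "")


class ContextHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        prefix = public_prefix(self.headers).rstrip("/")
        # Advertise a tile URL template that also works behind the proxy.
        payload = {"tiles": f"{prefix}/tiles/{{z}}/{{x}}/{{y}}.png"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep notebook output quiet.
        pass
```

When accessed directly (no proxy headers), the handler falls back to root-relative URLs, so the same server works both locally and behind jupyter-server-proxy.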
