jupyter / jupyter_client

Jupyter protocol client APIs
https://jupyter-client.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Exposing kernel contents (file system) via comms #1006

Open krassowski opened 6 months ago

krassowski commented 6 months ago

During a jupyter-server meeting we discussed the possibility of enabling frontends to request and receive content from the kernel. This is distinct from the current contents manager APIs in jupyter-server, which get and put content in the jupyter-server root.

For context, this sprang out of a discussion on a file ownership/path resolution endpoint proposed for jupyter-server. The motivating use case is the frontend deciding which API to call to get a source file when the user clicks on a file name to open it (e.g. in a traceback):

Two concerns were raised about the above proposal:

Previously, another proposal was raised on the sidelines of a jupyter-server team conversation, involving exposing kernel contents via comms. In that proposal, a ContentManager-like API would be optionally implemented by the kernel and made available to the client via comms. This was proposed for ipylab by @bollwyvl:

It is not clear to me how this should be implemented, either at the code level or at the architectural level, so I would greatly appreciate more thoughts on how it could/should function. CC @Zsailer.
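To make the comms idea more concrete, here is a minimal sketch of what the kernel side of a ContentManager-like API might look like. It assumes a hypothetical message schema (`{"action": "get", "path": ..., "content": bool}`) and returns a dict loosely modelled on the jupyter-server contents model; neither the schema nor the function names come from any existing package. The key point is that paths are resolved inside the kernel process, so `~` expands to the kernel user's home.

```python
import base64
import mimetypes
import os
from typing import Any, Dict


def _file_model(path: str, include_content: bool) -> Dict[str, Any]:
    """Build a dict loosely modelled on the jupyter-server contents model."""
    model: Dict[str, Any] = {
        "name": os.path.basename(path),
        "path": path,
        "type": "directory" if os.path.isdir(path) else "file",
        "content": None,
        "format": None,
        "mimetype": None,
    }
    if not include_content:
        return model
    if model["type"] == "directory":
        # Directory listing: child models without their content.
        model["content"] = [
            _file_model(os.path.join(path, name), False)
            for name in sorted(os.listdir(path))
        ]
        model["format"] = "json"
    else:
        with open(path, "rb") as f:
            raw = f.read()
        try:
            model["content"] = raw.decode("utf-8")
            model["format"] = "text"
        except UnicodeDecodeError:
            # Binary files are sent base64-encoded.
            model["content"] = base64.b64encode(raw).decode("ascii")
            model["format"] = "base64"
        model["mimetype"] = mimetypes.guess_type(path)[0]
    return model


def handle_contents_request(data: Dict[str, Any]) -> Dict[str, Any]:
    """Answer a hypothetical {"action": "get", "path": ..., "content": bool} comm message.

    The path is expanded and made absolute in the kernel process, so "~"
    refers to the kernel user's home, not the server's.
    """
    action = data.get("action")
    if action != "get":
        return {"status": "error", "error": f"unsupported action: {action!r}"}
    path = os.path.abspath(os.path.expanduser(data["path"]))
    if not os.path.exists(path):
        return {"status": "error", "error": f"no such path: {path}"}
    return {"status": "ok", "model": _file_model(path, bool(data.get("content", True)))}
```

In an IPython kernel, such a handler could be wired up through the kernel's comm manager (`register_target`), with the registered callback calling `comm.on_msg(...)` and replying via `comm.send(handle_contents_request(msg["content"]["data"]))`; the comm target name would be up to the implementer.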

krassowski commented 6 months ago

The opposite problem (accessing files of a drive from the kernel) was also explored in jupyterlab; I think it is relevant to mention:

linlol commented 4 months ago

Hi @krassowski, regarding the problem you described in https://github.com/jupyter-server/jupyter_server/issues/1280:

This is the case even if the frontend knows what the root_dir is. For example, if root_dir is /home/my-username/server_root, the frontend does not know what ~ expands to in the kernel space (it may well be /home/another-username/).

Would it work if we always flattened ~ to the real absolute path? The user can easily get that value from the frontend, and root_dir should be immutable after server startup.
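A small sketch of what "flattening ~" would involve, assuming POSIX paths and a hypothetical helper `flatten_path`. It expands a leading `~`, normalizes the result to an absolute path, and reports whether that path falls under root_dir. Note that it needs the home directory *as seen by the kernel* as an explicit input: if the server expanded `~` with its own home, the result would be wrong whenever the kernel runs as a different user, which is exactly the case quoted above.

```python
import posixpath
from typing import Optional, Tuple


def flatten_path(path: str, root_dir: str, home: str) -> Tuple[str, Optional[str]]:
    """Expand a leading ~ and normalize `path` to an absolute POSIX path.

    Returns (absolute_path, path_relative_to_root_dir), where the second item
    is None if the file lies outside root_dir. `home` must be the home
    directory of the *kernel* process. Only "~" and "~/..." are expanded;
    "~otheruser/..." is not. Normalization is lexical: symlinks are not resolved.
    """
    if path == "~" or path.startswith("~/"):
        path = home.rstrip("/") + path[1:]
    root = posixpath.normpath(root_dir)
    # Relative paths are interpreted against root_dir (an assumption of this sketch).
    abs_path = posixpath.normpath(posixpath.join(root, path))
    if posixpath.commonpath([abs_path, root]) == root:
        return abs_path, posixpath.relpath(abs_path, root)
    return abs_path, None
```

With root_dir `/home/my-username/server_root` and a kernel home of `/home/another-username`, `~/lib/mod.py` flattens to `/home/another-username/lib/mod.py`, which lies outside root_dir, so the contents API alone could not serve it.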

linlol commented 4 months ago

Thanks @krassowski and @Zsailer,

This issue definitely deserves further discussion; let me briefly share our use case.

Our case

In our case, we build Jupyter into a Docker image and deploy it on Kubernetes via https://github.com/jupyterhub/zero-to-jupyterhub-k8s.

Therefore, libraries and the internal code base are installed when the image is built, while root_dir is configured as an external volume mount (since we want each user to have an independent and persistent file system to store their own code, data, and notebooks).

This design is probably quite common across investment banks and similar institutions (e.g. hedge funds), since almost every Python-based quant-analytics team would use Jupyter with a dedicated file system mounted for each user (and root_dir set to that mounted volume). Public solutions such as Google Colab are probably similar (just a guess; please correct me if I am wrong).

In this design, root_dir can never coincide with where we store code, which makes the new feature https://github.com/jupyterlab/jupyterlab/pull/13390 introduced in 4.1 unavailable to us. Symlinking the code into root_dir is not acceptable either, since creating symbolic links across file systems would significantly degrade overall performance.

Proposal

Add a flag, as a traitlet configuration option, that tells jupyter-server whether it should search for and expose files outside root_dir through the existing file-handling API.

(Sorry if I am looking at this question too narrowly; from my perspective, we are extending an existing feature, so a few enhancements and a configuration option on the original feature should be enough.)
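For illustration, the proposed flag could be expressed as a traitlets-configurable option along the lines of the sketch below. The class name `ExternalFileAccess` and the traits `expose_outside_root` and `allowed_dirs` are hypothetical and do not exist in jupyter-server; the `allowed_dirs` allowlist is an extra safeguard assumed here (not part of the proposal) so that enabling the flag does not expose the whole file system. It assumes `traitlets` is installed, as it is a jupyter-server dependency.

```python
import posixpath

from traitlets import Bool, List, Unicode
from traitlets.config import Configurable


class ExternalFileAccess(Configurable):
    """Hypothetical configurable gating access to files outside root_dir."""

    expose_outside_root = Bool(
        False,
        help="Allow serving files located outside root_dir (hypothetical option).",
    ).tag(config=True)

    allowed_dirs = List(
        Unicode(),
        default_value=[],
        help="Absolute directories outside root_dir that may be served when enabled.",
    ).tag(config=True)

    def is_servable(self, abs_path: str, root_dir: str) -> bool:
        """Return True if abs_path may be served under the current configuration."""
        path = posixpath.normpath(abs_path)

        def under(base: str) -> bool:
            base = posixpath.normpath(base)
            return posixpath.commonpath([path, base]) == base

        if under(root_dir):
            return True  # existing behaviour: anything inside root_dir
        if not self.expose_outside_root:
            return False  # default: nothing outside root_dir
        return any(under(d) for d in self.allowed_dirs)
```

In a `jupyter_server_config.py` this would be enabled with something like `c.ExternalFileAccess.expose_outside_root = True` and `c.ExternalFileAccess.allowed_dirs = ["/opt/internal_lib"]` (again, hypothetical names). Defaulting the flag to `False` keeps the current root_dir-only behaviour for existing deployments.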

krassowski commented 4 months ago

Thank you @linlol for nudging this!

Of note, kernels with a debugger already have a way to pass the contents of a file, wherever the user space is mounted. In such a setup, the only thing we need for https://github.com/jupyterlab/jupyterlab/pull/13390 to work is information about who owns the file (and what its full path is). This is why the original proposal was along the lines of: