Closed zhiyuli closed 3 years ago
Hi @zhiyuli.
It's tough to determine what you've tried but, generally speaking, the idea is that the launch_docker.py located in each kernelspec's scripts directory contains the necessary information. Currently, we build a mounts list to be added as an argument (currently commented out in the script) for the service creation. That particular example adds the kernelspec directory tree as a mount point.
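For concreteness, here's a minimal sketch of that idea, assuming docker-py and the general shape of launch_docker.py; the variable names (mounts, kwargs) mirror the script's conventions, but treat this as illustrative rather than the exact code:

```python
from docker.client import DockerClient

client = DockerClient.from_env()

# Mount entries use the "source:target:ro|rw" form accepted by services.create().
# This mirrors the commented-out example: it mounts the kernelspec directory tree.
mounts = ['/usr/local/share/jupyter/kernels:/usr/local/share/jupyter/kernels:ro']

kwargs = dict()
kwargs['mounts'] = mounts   # pass the mounts list when creating the Swarm service
# ... name, env, networks, etc. are built elsewhere in the script ...
# kernel_service = client.services.create(image_name, **kwargs)
```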
Since you likely want to mount user HOME directories or directory names based on the user, there are a few ways to inject user-specific values. With EG, any environment variables set on the client notebook server (not the EG server) will be propagated through EG to the kernel launch script (and launch_docker.py). The most common example, and perhaps the most useful in this scenario, is KERNEL_USERNAME. As a result, if your user HOME directories all fit the same pattern (e.g., /home/{username}), then you could use syntax like kernel_user = param_env.get('KERNEL_USERNAME') and substitute that value into a string representing the user's home directory (f"/home/{kernel_user}").
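Roughly, inside launch_docker.py that could look like the following; param_env is the dict of propagated values the script already builds, and the /home/{username} layout is an assumption about your deployment:

```python
import os

# In launch_docker.py, param_env already exists; it is approximated here from
# the process environment purely for illustration.
param_env = dict(os.environ)

kernel_user = param_env.get('KERNEL_USERNAME', 'jovyan')
user_home = f"/home/{kernel_user}"

# Mount the user's home directory read-write into the kernel container.
mounts = [f"{user_home}:{user_home}:rw"]
```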
Or, if the home directory can be determined within the notebook configuration, set a different env (e.g., KERNEL_USER_HOME) that contains the entire value, then access that env within launch_docker.py.
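One possible way to set that on the notebook side, assuming a JupyterHub spawner is in play (as it is later in this thread) and that Spawner.environment accepts callables, would be something like:

```python
# jupyterhub_config.py (hypothetical): pass the resolved home directory through
# a KERNEL_-prefixed variable so EG forwards it to the kernel launch script.
c.Spawner.environment = {
    'KERNEL_USER_HOME': lambda spawner: f"/home/{spawner.user.name}",
}
```

launch_docker.py could then read it with param_env.get('KERNEL_USER_HOME') and use it to build the mount.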
Unfortunately, Swarm hasn't had the attention that Kubernetes has had so I'm not sure how many others might be able to help. On that note, I'm cc-ing @mattjtodd since they helped out on Docker compose stuff and might have some useful information.
@kevin-bates Thanks! That was very helpful. I was able to mount dynamic user folders into the kernel container now.
I am having another issue: we have notebooks that reference data by relative path. Say a notebook is coded to open a file at "./data.csv". It works when the notebook and data.csv are both in the "workdir" (the root of the notebook server workspace), but if they are in a subfolder, the notebook cannot find data.csv. So when the user navigates to a subfolder in the notebook UI, the "current dir" in the kernel container is not synced.
I tried setting "EG_MIRROR_WORKING_DIRS=True" on the EG container, but it didn't help.
Thanks
Cool. I'm glad you're moving forward. EG_MIRROR_WORKING_DIRS enables EG to flow KERNEL_WORKING_DIR such that the working directory will be changed to a directory, presumably in a mounted volume. You'll need to "line things up" relative to the env's value and the mount point. Here's a link to the env config section in the docs that includes working dir vars: https://jupyter-enterprise-gateway.readthedocs.io/en/latest/config-options.html#per-kernel-environment-overrides
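To illustrate the "line things up" point: the kernel container's mount and KERNEL_WORKING_DIR need to agree so that the working directory actually exists inside the mounted volume. The paths below are assumptions based on the setup described later in this thread:

```python
# Hypothetical alignment: the kernel container mounts the same per-user folder
# as the notebook container, and KERNEL_WORKING_DIR points inside that mount.
username = 'alice'   # illustrative user
mounts = [f"/NFS_DRIVE/{username}:/home/jovyan/work:rw"]
kernel_working_dir = "/home/jovyan/work"   # must resolve within the mount target
```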
@kevin-bates Thanks for the info. I looked at the doc and some related previous issues, but I still don't fully understand how KERNEL_WORKING_DIR works. As far as I can tell, the EG_MIRROR_WORKING_DIRS flag decides whether to keep KERNEL_WORKING_DIR in the env and pass it to launch_docker.py. If it exists, launch_docker.py sets kwargs['workdir'] = param_env.get('KERNEL_WORKING_DIR') before spawning the kernel container. KERNEL_WORKING_DIR should be set on the notebook server container. As a user navigates through different subfolders in the notebook UI and opens notebooks, KERNEL_WORKING_DIR should change and always reference the user's current path, right? Should I catch some event callbacks in the hub container or notebook container to retrieve the user's current path and update KERNEL_WORKING_DIR? Thanks
KERNEL_WORKING_DIR should essentially contain the same value as notebook-dir, i.e., it should specify the root of where the notebook server looks for notebook files. So, assuming you're mounting user home directories to your notebook server container, those same directories should be mounted in your kernel container and KERNEL_WORKING_DIR set to the appropriate absolute path into that mounted volume that corresponds to the notebook-dir value.
So all the launcher should be using KERNEL_WORKING_DIR for is the value of WORKDIR when launching the container, just as if you had run jupyter notebook from your home directory.
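The relevant piece of launch_docker.py boils down to something like this (an approximation; in the real script param_env is built from the propagated KERNEL_* variables):

```python
import os

# Stand-in for the param_env dict that launch_docker.py already builds.
param_env = dict(os.environ)

kwargs = {}
if param_env.get('KERNEL_WORKING_DIR'):
    # Becomes the WORKDIR of the kernel container; docker-py's services.create()
    # takes this as the 'workdir' keyword argument.
    kwargs['workdir'] = param_env.get('KERNEL_WORKING_DIR')
```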
@kevin-bates I think I have done what you said above but it didn't fix my problem.
We are using the official "jupyter/minimal-notebook" image for the notebook server. In the hub configuration (jupyterhub_config.py) we set c.Spawner.notebook_dir = '/home/jovyan/work' and mount the user's folder, say /NFS_DRIVE/{USERNAME}, to '/home/jovyan/work' inside the notebook container. Also, the same user folder is mounted to the kernel container (/NFS_DRIVE/{USERNAME} --> /home/jovyan/work). KERNEL_WORKING_DIR is set to "/home/jovyan/work" (same as notebook_dir) on the notebook container (through c.Spawner.environment={..} in jupyterhub_config.py).
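In jupyterhub_config.py terms, that setup is roughly the following (a reconstruction for clarity; the SwarmSpawner volume syntax and {username} expansion are assumptions about this deployment):

```python
# Root of the notebook server workspace inside the single-user container.
c.Spawner.notebook_dir = '/home/jovyan/work'

# Per-user NFS folder mounted into the single-user notebook container.
c.SwarmSpawner.volumes = {'/NFS_DRIVE/{username}': '/home/jovyan/work'}

# Static working dir passed to EG for the kernel container (this is the value
# that gets removed later in the thread).
c.Spawner.environment = {'KERNEL_WORKING_DIR': '/home/jovyan/work'}
```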
But currently, when the user navigates to a subfolder in the notebook UI and opens a notebook, say /home/jovyan/work/notebooks/test.ipynb, which references a local file "./data.csv" (alongside test.ipynb), the kernel can't find the data file because the current path in the kernel is still "/home/jovyan/work", not "/home/jovyan/work/notebooks".
That is why I was thinking the value of KERNEL_WORKING_DIR should be dynamic and always reference the user's current path as the user navigates through different folders in the notebook UI.
Thanks
I think I may have forgotten how this works; I apologize.
Looking at the applicable code in Notebook, you're right. KERNEL_WORKING_DIR is a system-defined value in that the notebook server will set it to the notebook's path if it's not already specified. As a result, I would recommend you remove its configuration from c.Spawner.environment and let the notebook server add it to the payload. The question then becomes: _is path set on the start_kernel() request?_ It seems as though it should be.
Sorry for the misinformation, but I think we're getting closer here. The KERNEL_WORKING_DIR will theoretically be unique for each notebook. As a result, the container's WORKDIR should always reside within the mounted volume.
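Pseudocode of that Notebook behavior, as a paraphrase rather than the actual gateway-client source:

```python
# Hedged paraphrase: the notebook server only defaults KERNEL_WORKING_DIR when
# it is not already present, so pre-setting it in c.Spawner.environment pins
# every kernel to the same directory instead of the per-notebook path.
def build_kernel_env(env: dict, notebook_path: str) -> dict:
    if 'KERNEL_WORKING_DIR' not in env:
        env['KERNEL_WORKING_DIR'] = notebook_path  # directory of the notebook being opened
    return env
```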
@kevin-bates Thanks very much for your help! I removed KERNEL_WORKING_DIR from c.Spawner.environment. It seems to work now. Thanks!
Description
We have a JupyterHub deployment on a Docker Swarm cluster. For each single-user notebook container, we mount an NFS folder into it for user storage (notebooks and data). Now we are looking into Enterprise Gateway, and we were able to configure our hub to spawn single-user notebook containers linked to a gateway container on Swarm. But since the notebook server and kernels are now running in different containers, and user storage is only mounted to the notebook container, I am wondering if this separation between data and kernels would cause any issues. Do we have to mount user storage into the kernel containers as well? If so, is there an example of how to do it on Swarm? Thanks very much