jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

Exchange user specific files between EG kernel and Notebook container, and restrict access of files between Notebook containers. #676

Open ArindamHalder7 opened 5 years ago

ArindamHalder7 commented 5 years ago

Hi @lresende

I am creating a new issue in continuation of the discussion in #670.

Here is my scenario.

  1. I am running EG 1.2 and Spark 2.3.1.
  2. I want to run multiple Notebook containers on different systems/VMs, each connecting to EG over websockets.
  3. The EG kernels run on the host, not in Docker.
  4. Each user's Notebook container holds input files that are used by the program the user writes. The user will also generate output files that should be accessible only in that user's own Notebook container.

How can a user (Notebook container) work with their own input and output files when the EG kernels (cluster or non-cluster) are configured on a different system/VM? Also, how can I restrict access to user-specific files between multiple users in this scenario?

One more query: if a Notebook container installs a new Python package with `conda install` or `pip install`, will that package get installed on all the nodes of the clustered EG? I have not checked this yet, but I plan to test it later.

Let me know if you need more information on this.

kevin-bates commented 5 years ago

There needs to be a way to confine the kernel's "reach" to its own user.

In on-prem environments, this is typically accomplished via permissions, where Kerberos is used to perform impersonation via the KERNEL_USERNAME value. These tend to be Hadoop/YARN environments, so HDFS is the preferred mechanism for making files available to the kernels.
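For illustration, here's a minimal sketch of what that looks like from inside a kernel, assuming KERNEL_USERNAME is flowed into the kernel's environment, the `hdfs` CLI is on the node, and each user has an HDFS home directory (all paths and filenames here are hypothetical):

```python
# Sketch only: per-user file access from a kernel in a YARN/Kerberos env.
import os
import subprocess

user = os.environ.get("KERNEL_USERNAME", "jovyan")
input_path = f"/user/{user}/inputs/data.csv"      # hypothetical per-user input
output_path = f"/user/{user}/outputs/result.csv"  # hypothetical per-user output

# Pull the user's input file into the kernel's working directory.
subprocess.run(["hdfs", "dfs", "-get", input_path, "data.csv"], check=True)

# ... user code processes data.csv and writes result.csv ...

# Push the result back to the user's HDFS home. HDFS permissions on
# /user/<name> (plus Kerberos impersonation) are what keep other users out.
subprocess.run(["hdfs", "dfs", "-put", "-f", "result.csv", output_path], check=True)
```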

In container-based environments, this isolation is accomplished via containerized kernels coupled with user-specific mounts. In this case, you'd want to run EG in Docker (or Docker Swarm), and each kernel container would mount user-specific volumes that are also in use by that user's notebook container. This would likely require modifying the docker launcher to include the user-specific mounts.
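To make that concrete, here's a minimal sketch (not EG's actual launcher code) of adding a per-user volume when starting a kernel container with the Docker SDK for Python; the host directory layout and image tag are assumptions:

```python
# Sketch only: launching a kernel container with a per-user mount.
import os
import docker

client = docker.from_env()
kernel_username = os.environ.get("KERNEL_USERNAME", "jovyan")

# Hypothetical host directory that is also mounted into the same user's
# notebook container, so both sides see the same files.
user_dir = f"/mnt/user-data/{kernel_username}"

container = client.containers.run(
    image="elyra/kernel-py:1.2.0",  # example kernel image
    detach=True,
    environment={"KERNEL_USERNAME": kernel_username},
    # Only this user's directory is mounted, so other users' files
    # are simply not visible inside the kernel container.
    volumes={user_dir: {"bind": "/home/jovyan/work", "mode": "rw"}},
)
```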

The configuration you describe is kind of a hybrid approach. Since you want to use Spark, I recommend Kubernetes for complete containerization, since it supports Spark 2.4. If Kubernetes is not an option, then you might try YARN with Kerberos, although I'm not knowledgeable enough to tell you whether HDFS can be accessed from docker containers for the file sharing you need.
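On Kubernetes, the same per-user isolation could come from mounting a per-user PersistentVolumeClaim into the kernel pod. A minimal sketch with the official kubernetes Python client, assuming a PVC named `nb-<username>` already exists and is also mounted by that user's notebook pod (the claim name, image, and namespace are hypothetical):

```python
# Sketch only: a kernel pod that mounts one user's PVC and nothing else.
import os
from kubernetes import client, config

config.load_incluster_config()
user = os.environ.get("KERNEL_USERNAME", "jovyan")

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name=f"kernel-{user}"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[client.V1Container(
            name="kernel",
            image="elyra/kernel-py:1.2.0",  # example kernel image
            volume_mounts=[client.V1VolumeMount(
                name="user-data", mount_path="/home/jovyan/work")],
        )],
        volumes=[client.V1Volume(
            name="user-data",
            persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                claim_name=f"nb-{user}"))],  # hypothetical per-user claim
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="enterprise-gateway", body=pod)
```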

cc: @lresende @akchinSTC