Open ArindamHalder7 opened 5 years ago
There needs to be a way to limit the kernel's "reach" to the requesting user.
In on-prem environments, this is typically accomplished via permissions - Kerberos is used to perform impersonation based on the KERNEL_USERNAME
value. These tend to be Hadoop/YARN environments, so HDFS is the preferred mechanism for making files available to the kernels.
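To make the impersonation flow concrete, here is a minimal sketch of how a client can pass KERNEL_USERNAME when starting a kernel through EG's REST API (env entries prefixed with `KERNEL_` are forwarded to the kernel launch). The endpoint URL, kernelspec name, and username below are placeholders for illustration, not values from this issue:

```python
import requests

# Hypothetical EG endpoint and kernelspec name, used only for illustration.
EG_URL = "http://eg-host:8888"
KERNELSPEC = "spark_python_yarn_cluster"

# Start a kernel, passing KERNEL_USERNAME so the YARN/Kerberos side
# can impersonate that user when the kernel process is launched.
response = requests.post(
    f"{EG_URL}/api/kernels",
    json={
        "name": KERNELSPEC,
        # KERNEL_-prefixed env entries are passed through to the kernel launch.
        "env": {"KERNEL_USERNAME": "alice"},
    },
)
response.raise_for_status()
print(response.json())  # kernel id, name, execution state, etc.
```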
In container-based environments, this isolation is accomplished via containerized kernels coupled with user-specific mounts. In this case, you'd want to run EG in docker (or docker swarm), and each kernel container would mount the same user-specific volumes that the user's notebook container uses. This would likely require modification of the docker launcher to include the user-specific mounts - see the sketch below.
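A hedged sketch of that idea, using the docker-py SDK: derive a per-user host directory from KERNEL_USERNAME and bind-mount it into the kernel container so the kernel and the user's notebook container see the same files. The image name, host path layout, and the way the username is obtained are illustrative assumptions, not EG's actual launcher code:

```python
import os
import docker

client = docker.from_env()

# Hypothetical: username taken from the environment; path layout is assumed.
username = os.environ.get("KERNEL_USERNAME", "anonymous")
user_workdir = f"/data/users/{username}"  # per-user host directory (assumed)

container = client.containers.run(
    "elyra/kernel-py:latest",              # example kernel image
    detach=True,
    environment={"KERNEL_USERNAME": username},
    volumes={
        # Bind the same host directory the user's notebook container mounts,
        # so input/output files are shared between notebook and kernel.
        user_workdir: {"bind": "/home/jovyan/work", "mode": "rw"},
    },
)
print(container.id)
```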
The configuration you describe is kind of a hybrid approach. Since you want to use Spark, I recommend Kubernetes for complete containerization, since it supports Spark 2.4. If Kubernetes is not an option, you might try YARN with Kerberos, although I'm not knowledgeable enough to say whether HDFS can be accessed from docker containers for the file sharing you need.
cc: @lresende @akchinSTC
Hi @lresende
I am creating a new issue in continuation of the discussion in #670.
Here is my scenario.
How can a user (Notebook container) work with their own input and output files when the EG kernel (clustered or non-clustered) is configured on a different system/VM? Also, how can I restrict access to user-specific files between multiple users in the same scenario?
One more query: if a Notebook container installs a new Python package with "conda install" or "pip install", will that package get installed on all the nodes of the clustered KG? I have not checked this yet, but plan to test it at a later point.
Let me know if you need more information on this.