jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

Importing custom python packages when launching kernels on Spark #488

Open amangarg96 opened 6 years ago

amangarg96 commented 6 years ago

I am using Jupyter Enterprise Gateway to run IPython kernels in YARN Cluster Mode on Apache Spark. The JupyterLab server runs on my local machine (a MacBook), the Jupyter Enterprise Gateway server runs on one of the cluster nodes, and the kernels are launched on the cluster.

Is there a way to import custom Python packages that were created on the notebook server machine? For instance, suppose a user is working on a full project that contains some Python packages they wrote. How would they import them?

lresende commented 6 years ago

At the moment, when running in YARN Cluster Mode, a package needs to be pip installed from a notebook cell or be available on all worker nodes. There are a few approaches to work around this issue: directory mounts such as NFS or an object store, Anaconda Enterprise, or regular update scripts that sync locally installed environments to remote workers.
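As a rough illustration of the "pip install from a notebook cell" option, here is a minimal sketch. The helper name `pip_install` and the package name `my-custom-package` are hypothetical, not part of Enterprise Gateway. It assumes pip is available to the kernel's interpreter, and it only affects the node where the kernel process runs.

```python
import subprocess
import sys


def pip_install(requirement, dry_run=False):
    """Build (and optionally run) a pip install for the kernel's interpreter.

    `requirement` is anything pip accepts: a package name, a local path,
    or a URL. This helper is a hypothetical convenience wrapper.
    """
    # sys.executable points at the Python running this kernel, so the
    # package lands in the same environment the notebook imports from.
    cmd = [sys.executable, "-m", "pip", "install", requirement]
    if not dry_run:
        subprocess.check_call(cmd)
    return cmd


# Dry run: show the command without installing anything.
# "my-custom-package" is a placeholder name.
print(" ".join(pip_install("my-custom-package", dry_run=True)))
```

In a notebook cell, IPython's `%pip install <requirement>` magic does much the same thing. Either way, the other worker nodes are untouched, which is why the approaches listed above may still be needed.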

amangarg96 commented 6 years ago

A pip install from a notebook cell seems to be the most suitable solution for my use case, and it should work if the Python package is stored in HDFS (so that it's available to all worker nodes). This would be a temporary solution, as notebook users should be kept away from interacting with HDFS directly.

I'm thinking of going with a ContentsManager implementation, such as HDFSContents or S3Contents, through which the full project (along with its Python packages) would be available on all worker nodes and the user could conveniently run pip install.

Are there any suggestions on this?

kevin-bates commented 5 years ago

@amangarg96 - Any updates on this?

amangarg96 commented 5 years ago

@kevin-bates I have not found the time to work out a proper solution for this, but one hacky way to do it from notebooks is to use IPython's built-in magic commands.

A user can send a custom package to the IPython kernel's container with %%writefile: paste the contents of a file (from the local file system) into a notebook cell and put the %%writefile magic command at the top of the cell to create that file on the container.

Use system calls (!) to build the packages.
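The workaround above can be sketched in plain Python, which runs in any kernel. This is only an illustration under assumptions: the package name `nb_demo_pkg`, its contents, and the helper `write_module` are hypothetical, and the package is made importable by extending `sys.path` rather than by a real build step.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_module(root, relative_path, source):
    """Write `source` to root/relative_path, creating parent directories.

    This plays the role of one %%writefile cell per file.
    """
    target = Path(root) / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)
    return target


# A scratch directory on the kernel's container (hypothetical location).
project_root = Path(tempfile.mkdtemp(prefix="nb_pkg_"))

# File contents pasted from the user's local project (hypothetical package).
write_module(project_root, "nb_demo_pkg/__init__.py",
             "from .greet import hello\n")
write_module(project_root, "nb_demo_pkg/greet.py",
             "def hello(name):\n    return f'hello, {name}'\n")

# Make the freshly written package importable in this kernel process.
sys.path.insert(0, str(project_root))
importlib.invalidate_caches()

import nb_demo_pkg

print(nb_demo_pkg.hello("spark"))
```

In a notebook, each `write_module` call corresponds to a cell that starts with `%%writefile <path>`, and a package with a `setup.py` or `pyproject.toml` could instead be built with a `!pip install <directory>` cell. This only covers the kernel process itself. Code that runs on Spark executors would likely need the package shipped separately, for example with PySpark's `SparkContext.addPyFile`.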