Open amangarg96 opened 6 years ago
At the moment, when running in YARN Cluster Mode, a package needs to be pip installed from a notebook cell or already be available on all worker nodes. There are a few approaches to work around this issue: shared directory mounts such as NFS or an object store, Anaconda Enterprise, or regular update scripts that sync locally installed environments to the remote workers.

pip install from a notebook cell seems to be the most suitable solution for my use case, and it should work if the Python package is stored in HDFS (so that it's available to all worker nodes).
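As a rough sketch of that approach (the HDFS path and wheel name below are hypothetical, and this assumes the `hdfs` CLI is on the PATH of the node running the kernel and a reasonably recent IPython with the `%pip` magic):

```python
# Pull a pre-built wheel out of HDFS onto the kernel's local disk.
# Path and package name are placeholders for illustration.
!hdfs dfs -get /user/alice/wheels/mypackage-0.1.0-py3-none-any.whl /tmp/

# %pip installs into the environment of the running kernel.
%pip install /tmp/mypackage-0.1.0-py3-none-any.whl
```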
This would be a temporary solution, though, as notebook users should ideally be kept away from interacting with HDFS directly.
I'm thinking of going with a ContentsManager implementation, like HDFSContents or S3Contents, so that the full project (along with its Python packages) would be available on all worker nodes and the user could conveniently pip install it.
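For reference, wiring in an alternative ContentsManager is mostly a config change. A minimal sketch using S3Contents (the bucket name and credentials are placeholders; an HDFS-backed manager would be configured analogously):

```python
# jupyter_notebook_config.py -- minimal sketch, placeholder values
from s3contents import S3ContentsManager

c = get_config()

# Serve notebook contents from S3 instead of the local filesystem.
c.NotebookApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.bucket = "my-notebooks-bucket"    # placeholder
c.S3ContentsManager.access_key_id = "..."             # placeholder credentials
c.S3ContentsManager.secret_access_key = "..."
```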
Are there any suggestions on this?
@amangarg96 - Any updates on this?
@kevin-bates I have not found the time to work out a proper solution for this, but one hacky way to do it from notebooks is to use IPython's built-in magic commands.
The user can ship a custom package to the IPython kernel's container with %%writefile: paste the contents of a file (from the local file system) into a notebook cell, with the %%writefile magic command at the top, to create that file on the container.
Then use system calls (!) to build and install the packages, as in the sketch below.
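A concrete sketch (package and file names are made up for illustration; each block below is one notebook cell, and %%writefile must be the first line of its cell):

```python
%%writefile mypkg.py
# contents pasted from the local file on the user's machine
def greet(name):
    return f"Hello, {name}!"
```

```python
%%writefile setup.py
# minimal packaging metadata so the module can be pip-installed
from setuptools import setup
setup(name="mypkg", version="0.1.0", py_modules=["mypkg"])
```

```python
# Build/install on the kernel's container via a shell escape (!),
# then import the freshly installed module.
!pip install .

import mypkg
print(mypkg.greet("world"))
```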
I am using Jupyter Enterprise Gateway to run IPython kernels in YARN Cluster Mode on Apache Spark. The JupyterLab server is running on my local machine (a MacBook), the Jupyter Enterprise Gateway server is running on one of the nodes of the cluster, and the kernels are launched on the cluster.
Is there a way to import custom Python packages that were made on the notebook server machine? For instance, if a user has a full project that he is working on, which contains some Python packages that he has written, how does he import them?