pangeo-data / pangeo

Pangeo website + discussion of general issues related to the project.
http://pangeo.io

Shared environments between Dask workers and Jupyter pod #272

Closed: mrocklin closed this issue 5 years ago

mrocklin commented 6 years ago

Many of the challenges we have today are about how best to update user software environments, support multiple software environments, and so on.

On various video calls we have also discussed removing this concern by giving users more control over their environments, which would offload the burden from core maintainers.

There are a few challenges to this:

  1. Workers need to have the same software environment as the Jupyter pod. People have proposed solving this by mounting the user's drive read-only on all of the workers (see the sketch after this list).
  2. Users need to install software in user space that won't be destroyed or overwritten when they get a new pod. Presumably on startup we need to install miniconda in the user's directory, rather than directly within the docker image (which would also have the nice effect of reducing the size of our images).
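
A minimal sketch of point 1, using the kubernetes Python client. The claim name, mount path, and worker args here are assumptions for illustration, not Pangeo's actual configuration:

```python
from kubernetes import client

def worker_pod(username: str, image: str) -> client.V1Pod:
    # Mount the user's home-directory claim read-only into a worker.
    # "claim-<username>" follows JupyterHub's usual naming, but is an assumption.
    home = client.V1Volume(
        name="user-home",
        persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
            claim_name=f"claim-{username}",
            read_only=True,  # workers never write to the home directory
        ),
    )
    container = client.V1Container(
        name="dask-worker",
        image=image,
        args=["dask-worker", "--nthreads", "2"],
        volume_mounts=[client.V1VolumeMount(
            name="user-home",
            mount_path="/home/jovyan",
            read_only=True,
        )],
    )
    return client.V1Pod(
        metadata=client.V1ObjectMeta(generate_name="dask-worker-"),
        spec=client.V1PodSpec(containers=[container], volumes=[home]),
    )
```

Note that a read-only mount like this still requires the underlying volume to support attachment from many nodes, which is exactly the difficulty discussed below.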

cc @yuvipanda and @jacobtomlinson about point 1

rabernat commented 6 years ago

What if we were able to use the same docker image for both notebook and worker? This would simplify several things. The images are already similar in size. Presumably there would just need to be some environment variable to tell the pod whether to run the notebook startup script or the worker startup script.
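
A minimal sketch of that dispatch, assuming a hypothetical POD_ROLE variable and placeholder startup commands:

```python
import os
import sys

def main() -> None:
    # POD_ROLE would be set in the pod spec; the two startup commands
    # are stand-ins for the real notebook and worker scripts.
    role = os.environ.get("POD_ROLE", "notebook")
    if role == "worker":
        os.execvp("dask-worker", ["dask-worker"] + sys.argv[1:])
    else:
        os.execvp("start-notebook.sh", ["start-notebook.sh"] + sys.argv[1:])

if __name__ == "__main__":
    main()
```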

tjcrone commented 6 years ago

I spent a long while trying to get this going on Azure, because "Azure File", which is SMB, allows ReadWriteMany access. I pretty much had it working, but because Azure File is SMB it doesn't do Unix permissions properly, and I gave up. And as far as I can tell, it is not possible to easily mount a drive with one pod as read-write, while allowing other pods read-only access. I believe it can likely be done using an NFS volume, but from what I can tell this might be fairly complicated. @yuvipanda is the expert's expert on this, and has many issues/threads discussing the use of NFS persistent volumes. If you want to have the workers share the same home directory as the notebook, I think getting an NFS solution figured out is probably the best route.
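
For illustration, a sketch of the asymmetric NFS mounts described above, via the kubernetes Python client. NFS supports concurrent mounts, so the same export can be read-write in one pod and read-only in others; the server address and export path are placeholders:

```python
from kubernetes import client

def nfs_home_volume(read_only: bool) -> client.V1Volume:
    # One export, mounted with different permissions per pod.
    return client.V1Volume(
        name="home",
        nfs=client.V1NFSVolumeSource(
            server="10.0.0.2",            # placeholder NFS server address
            path="/exports/home/jovyan",  # placeholder export path
            read_only=read_only,
        ),
    )

notebook_home = nfs_home_volume(read_only=False)  # notebook pod: read-write
worker_home = nfs_home_volume(read_only=True)     # worker pods: read-only
```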

I support moving to a system where the notebook and worker use the same image.

mrocklin commented 6 years ago

To be clear, using the same image only gets us a little bit of the way there. Users' environments will be the same when they first start their first session, but any changes they make will not be preserved. In this issue I'm actually suggesting that, if we can get them to share a file system, we remove the conda environment from the docker image completely. I want us to get out of the game of determining what versions of software users run.
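
A minimal sketch of what a startup hook for point 2 above might do: install miniconda into the persistent home directory only if it isn't already there. The installer URL is Anaconda's public one; the paths are illustrative:

```python
import pathlib
import subprocess

CONDA_DIR = pathlib.Path.home() / "miniconda3"
INSTALLER = "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"

def ensure_user_conda() -> None:
    """Install miniconda into the user's home directory if absent."""
    if CONDA_DIR.exists():
        return  # home persists across pods, so only the first session pays this cost
    subprocess.run(["wget", "-q", INSTALLER, "-O", "/tmp/miniconda.sh"], check=True)
    subprocess.run(["bash", "/tmp/miniconda.sh", "-b", "-p", str(CONDA_DIR)], check=True)
```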

guillaumeeb commented 6 years ago

Isn't the Docker volume mechanism a good solution for this? See https://docs.docker.com/storage/volumes/#share-data-among-machines.

I don't know if there is a Google Cloud volume driver though, even though the docs talk about AWS and Azure.

jacobtomlinson commented 6 years ago

We currently use the same image for the notebooks and workers.

Allowing access to users' home directories from the workers is on our roadmap, however our current volume type can only be mounted on one host, so we need to switch to a different type and migrate existing home directories first.
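
The limitation here is the claim's access mode: a ReadWriteOnce volume attaches to a single node. A hedged sketch of the kind of claim that would be needed instead (claim name and storage class are placeholders):

```python
from kubernetes import client

claim = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="claim-jovyan"),  # hypothetical name
    spec=client.V1PersistentVolumeClaimSpec(
        # Sharing across the notebook pod and workers on different nodes
        # needs ReadOnlyMany or ReadWriteMany rather than ReadWriteOnce.
        access_modes=["ReadWriteMany"],
        storage_class_name="nfs-client",  # placeholder for an NFS-backed class
        resources=client.V1ResourceRequirements(requests={"storage": "10Gi"}),
    ),
)
```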

rabernat commented 6 years ago

> We currently use the same image for the notebooks and workers.
>
> Allowing access to users' home directories from the workers is on our roadmap, however our current volume type can only be mounted on one host, so we need to switch to a different type and migrate existing home directories first.

I would like to move forward on this issue. @jacobtomlinson: can you share the Dockerfiles and related scripts which allow you to use a single image for both notebook and worker? Currently we have the following:

jacobtomlinson commented 6 years ago

We are currently using this image https://github.com/informatics-lab/singleuser-notebook for both.

It is a very bloated image (3GB compressed and 10GB unpacked) but that isn't because we are using it for both, it's because we have lots of extra stuff in it. A task on my todo list is to take the pangeo notebook image and add our extra stuff to it.

Looking at the two Dockerfiles and preparation scripts, I can't actually see that many differences. They are based on different images, but I imagine the notebook one is based on the upstream miniconda one, so it probably just has some extra conda packages to add notebook support.

Each Dockerfile has an apt-get section, a conda section, and a pip section to install packages; these could be merged.

The notebook one then has some extra steps for populating the home directory with the example notebooks. This could be made optional via an environment variable, e.g. for the workers you set PANGEO_UPDATE_EXAMPLE_NOTEBOOKS=False in the dask-kubernetes worker template and add a check to the prepare script to skip over those steps if it is false.
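
A sketch of that check; the real prepare script is shell, but the logic would be something like the following (destination path is an assumption, and the example-notebooks repo URL is illustrative):

```python
import os
import subprocess

def maybe_clone_examples(dest: str = "/home/jovyan/examples") -> None:
    # Worker templates would set the variable to "False" to skip this step.
    if os.environ.get("PANGEO_UPDATE_EXAMPLE_NOTEBOOKS", "True").lower() == "false":
        return
    subprocess.run(
        ["git", "clone", "--depth", "1",
         "https://github.com/pangeo-data/pangeo-example-notebooks", dest],
        check=True,
    )
```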

mrocklin commented 6 years ago

I think that our notebook image is based off of jupyter's base-notebook image


jacobtomlinson commented 6 years ago

Yeah, looks like it. Ours is based off the scipy one, which I think is based off the base-notebook and adds scipy and some other packages.

I assumed that all the jupyter images eventually tree up to miniconda, but it turns out they are starting from Ubuntu and installing miniconda (which is basically what the miniconda image does anyway).

https://github.com/jupyter/docker-stacks/blob/master/base-notebook/Dockerfile

Either way I would just add any packages to the notebook image which are in the worker image and then try using the notebook image in place of the worker image. I expect it should Just Work™.

jacobtomlinson commented 6 years ago

It will cause the example notebooks to be cloned onto all the workers at run time. But that can be removed later.
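
A sketch of trying that out, assuming the dask-kubernetes API of the time (KubeCluster.from_dict) and wiring in the environment variable proposed earlier; the image tag and worker args are placeholders:

```python
from dask_kubernetes import KubeCluster

cluster = KubeCluster.from_dict({
    "kind": "Pod",
    "spec": {
        "containers": [{
            "name": "dask-worker",
            "image": "pangeo/notebook:latest",  # the notebook image, reused
            "args": ["dask-worker", "--nthreads", "2"],
            "env": [
                # Skip cloning the example notebooks onto workers at run time.
                {"name": "PANGEO_UPDATE_EXAMPLE_NOTEBOOKS", "value": "False"},
            ],
        }],
    },
})
cluster.scale(4)  # request four workers running the notebook image
```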

yuvipanda commented 6 years ago

There are two ways to do this in the long run:

  1. Use NFS for home directories. This might be something that Pangeo wants to do anyway, since it offers massive cost savings. Then mount your home directories in your worker pods, HPC style.
  2. Run an NFS Ganesha sidecar in each user pod. This allows you to share the home directory and other system contents with your workers (see the sketch below).

I think (1) is the easier option. AWS has EFS, Google has Cloud Filestore.
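
A rough sketch of the shape of option 2: the user pod runs an NFS Ganesha sidecar that re-exports the home volume to workers. The image names and port are illustrative, not a working Ganesha configuration:

```python
user_pod = {
    "kind": "Pod",
    "spec": {
        "containers": [
            {
                "name": "notebook",
                "image": "pangeo/notebook:latest",  # placeholder image
                "volumeMounts": [{"name": "home", "mountPath": "/home/jovyan"}],
            },
            {
                # Sidecar exports the same volume over NFS for the workers.
                "name": "nfs-ganesha",
                "image": "example/nfs-ganesha:latest",  # hypothetical image
                "ports": [{"containerPort": 2049}],     # standard NFS port
                "volumeMounts": [{"name": "home", "mountPath": "/export"}],
            },
        ],
        "volumes": [{"name": "home", "emptyDir": {}}],
    },
}
```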

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 5 years ago

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.

jhamman commented 5 years ago

For those who come across this issue later: we are currently using the same docker image for workers and jupyter pods BUT they are NOT on a shared file system. Instead, the notebook image just tells dask-kubernetes to use the same image. This solves the problems related to maintaining two docker images but does not solve the problem of user updates to the runtime environment.
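
A sketch of that pattern; it assumes KubeSpawner injects JUPYTER_IMAGE into the notebook pod's environment (commonly true, but an assumption here) and uses placeholder worker args:

```python
import os
from dask_kubernetes import KubeCluster

# Reuse whatever image the Jupyter pod itself is running.
image = os.environ.get("JUPYTER_IMAGE", "pangeo/notebook:latest")

cluster = KubeCluster.from_dict({
    "kind": "Pod",
    "spec": {
        "containers": [{
            "name": "dask-worker",
            "image": image,  # identical environment to the Jupyter pod
            "args": ["dask-worker", "--nthreads", "2"],
        }],
    },
})
```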

drorspei commented 4 years ago

A year and a half into the future, how are you guys doing it now? Do you have any new insights into the runtime environment problem?

TomAugspurger commented 4 years ago

I think the user-environment issue is largely unchanged.

For Binder, we have a pretty decent setup. There's a draft blog post on this (background and our solution) at https://docs.google.com/document/d/14m-TNi2R4VaTI0g2vy15LBRGDkur2B21wiAdrTt6nBg/edit?usp=sharing that will be published on the Pangeo blog once we find time to finish it off.

drorspei commented 4 years ago

That is really helpful, thanks!