pangeo-data / helm-chart

Pangeo helm charts
https://pangeo-data.github.io/helm-chart/
21 stars 26 forks source link

Use the same image for notebook and worker #53

Closed jacobtomlinson closed 5 years ago

jacobtomlinson commented 6 years ago

I want to suggest that we remove the worker image and use the notebook image for both purposes.

This is what we are doing on our Pangeo. I actually spent yesterday rewriting our image to use the notebook image from this repo as a base so it will be easier for me to push changes upstream. Both the notebooks and the workers use this image. Although the workers end up using a larger image the startup time isn't affected and we reduce disk usage by only using one image.

mrocklin commented 6 years ago

Fine with me

On Fri, Jul 27, 2018 at 12:52 AM, Jacob Tomlinson notifications@github.com wrote:

I want to suggest that we remove the worker image and use the notebook image for both purposes.

This is what we are doing on our Pangeo. I actually spent yesterday rewriting our image to use the notebook image from this repo as a base so it will be easier for me to push changes upstream. Both the notebooks and the workers use this image. Although the workers end up using a larger image the startup time isn't affected and we reduce disk usage by only using one image.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/pangeo-data/helm-chart/issues/53, or mute the thread https://github.com/notifications/unsubscribe-auth/AASszDDel-__yK2htB8fpaeA7UyZ0-e_ks5uKsbEgaJpZM4VjEWH .

rabernat commented 6 years ago

I just tried this on the latest helm chart (#57: 0.1.1-92c4bfd). On a test cluster, I manually changed the name of the docker image in worker-template.yaml to the notebook docker image (pangeo/notebook:92c4bfd). Everything worked fine.

How can we automatically populate this field so that is the default? I think that @jhamman has figured this out for his binder stuff.

Then we can drop the worker image completely.

jhamman commented 6 years ago

We're headed to a place where we can set the name of the image in the worker template to: ${JUPYTER_IMAGE_SPEC} and then let dask expand that variable. So the workflow is this:

  1. kubespawner sets JUPYTER_IMAGE_SPEC and exports that variable into the user space (https://github.com/jupyterhub/kubespawner/pull/193)
  2. dask reads its config file and expands JUPYTER_IMAGE_SPEC on the fly (https://github.com/dask/dask-kubernetes/issues/92)
  3. everything works perfectly

In binder.pangeo.io, I am doing some fancy sed tricks to set the image name in the dask config until we workout a more elegant solution. In the short term, we could do this in the single user start script.

Also, I'd like to suggest we move the contents of worker-template.yaml to a dask config file. I'm finding this is convenient for a number of reasons, including simplifying the config process and cleaning up the user home directory.

rabernat commented 6 years ago

I would like to make an upgrade to pangeo.pydata.org soon. Is it possible for us to use the hackish solution that @jhamman has developed for binder as an interim fix?

jhamman commented 6 years ago

@rabernat - I think you would need to add my little sed thing to the prepare.sh entrypoint in the notebook image.

guillaumeeb commented 5 years ago

This would proves useful not having to specify the Docker image version by hand. @jhamman do you still use the sed thing?

jhamman commented 5 years ago

No. Now we're using the environment variable ${JUPYTER_IMAGE_SPEC} which is set by kubespawner and is expanded by dask-kubernetes.

guillaumeeb commented 5 years ago

So we could update the chart here to use this also? (Sorry I don't yet totally grasp it).

jhamman commented 5 years ago

Yes, we could remove the worker image entirely and then add the following line to docker-images/notebook/config.yaml:

https://github.com/pangeo-data/pangeo-tutorial-agu-2018/blob/ec20e3ddbc77d7a355ced8268d7e7ccf73cc694c/.dask/config.yaml#L31