pangeo-data / pangeo-stacks

Curated Docker images for use with Jupyter and Pangeo
https://pangeo-data.github.io/pangeo-stacks/
BSD 3-Clause "New" or "Revised" License

Unable to modify dask-kubernetes configuration #104

Closed bgroenks96 closed 4 years ago

bgroenks96 commented 4 years ago

Derivative images seem to be unable to modify the dask-kubernetes configuration without rebuilding the base image from scratch.

This seems rather impractical, since it's only natural that one might want to modify the configuration of the dask workers. For example, we might want to taint the worker nodes so that core pods cannot be scheduled there (this is a problem for me at the moment). This means the dask pods need the corresponding toleration, which cannot presently be added without rebuilding the base image.
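To make it concrete, the end state I'm after is something like the following in the dask-kubernetes config (a sketch only; the taint key/value are hypothetical and the placement under the worker template is just how I read the dask-kubernetes schema):

```yaml
# sketch: tolerate the taint applied to the dedicated dask worker nodes
kubernetes:
  worker-template:
    spec:
      tolerations:
        - key: "dedicated"          # hypothetical taint key on the worker node pool
          operator: "Equal"
          value: "dask-worker"
          effect: "NoSchedule"
```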

I wanted to go ahead and start a discussion on how to implement this.

TomAugspurger commented 4 years ago

cc @jacobtomlinson if you have any thoughts.

jhamman commented 4 years ago

@bgroenks96 - can you provide some more details on the workflow you have going right now? How are you using the docker images provided here, and at what point would you like to update the dask-kubernetes configuration?

bgroenks96 commented 4 years ago

I'm using the onbuild docker images. Ideally, I would like to be able to modify the dask configuration at the point where I build my derivative image from onbuild, similar to how we are able to modify the conda and pip environments.

jhamman commented 4 years ago

You should be able to use the postBuild file to update/overwrite the default dask-kubernetes config. Have you tried this with the onbuild image?
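Untested, but I'd expect something along these lines in your repo's postBuild to do it (the target directory is one of dask's standard config search paths under sys.prefix; the file name is just a placeholder):

```bash
#!/bin/bash
# postBuild -- repo2docker runs this script at the end of the image build.
set -euo pipefail

# {sys.prefix}/etc/dask is one of the directories dask searches for YAML config.
mkdir -p "${KERNEL_PYTHON_PREFIX}/etc/dask"

# Copy our overrides in. To be certain they take precedence, you can instead
# overwrite whichever YAML file the base image already installs here
# (ls the directory to see what it is called).
cp dask_config.yaml "${KERNEL_PYTHON_PREFIX}/etc/dask/dask_config.yaml"
```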

bgroenks96 commented 4 years ago

Not yet, no. But it's not clear to me where exactly the dask-kubernetes config lives (I don't know what KERNEL_PYTHON_PREFIX is). It also seems a bit convoluted to have to write a script to read/modify/replace this file, so I thought perhaps we could make it a more "official" configuration option.

One possibility would be for r2doverlay to check for a dask-config.yaml file in the child image, like what's done for conda and pip, and then do a YAML merge with the defaults, giving the user config preference in conflicts.

jhamman commented 4 years ago

@bgroenks96 - I think I see what you are going for. It might be worth reading up (if you haven't already) on 1) the dask configuration system (https://docs.dask.org/en/latest/configuration.html) and 2) the repo2docker configuration system (https://repo2docker.readthedocs.io/en/latest/config_files.html). r2doverlay implements a subset of the repo2docker functionality.

You'll notice that there isn't an option in repo2docker for dask configuration files, so we use the postBuild utility instead. I think this is still your best bet. You may not need to merge the files, though; you can just overwrite the existing one with your own "opinionated" configuration.

BTW, ${KERNEL_PYTHON_PREFIX} is set by repo2docker to sys.prefix, so dask configs placed there get picked up automatically (see the dask config docs).
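If I remember right, dask exposes its config search paths, so you can double-check the directory I mean from inside the image with something like:

```bash
# print the directories dask searches for YAML config files
python -c "import dask.config; print(dask.config.paths)"
```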

bgroenks96 commented 4 years ago

Is KERNEL_PYTHON_PREFIX available in the child image docker build?

jhamman commented 4 years ago

Yes, it should be.

bgroenks96 commented 4 years ago

So the idea, then, would be to add a postBuild script that copies a local dask config file to the same location that the base-notebook's postBuild uses?

I suppose that should work, provided that the child image's postBuild runs after the base notebook's.
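Once the image is built, I suppose I can verify the override was actually picked up with something like this (assuming kubernetes.worker-template is the right key in the dask-kubernetes schema):

```bash
# inside the built image: confirm dask sees the custom worker template
python -c "import dask; print(dask.config.get('kubernetes.worker-template', {}))"
```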

jhamman commented 4 years ago

That's right! This is a fairly well-established pattern, so I'd be surprised if it didn't work. Let us know how it goes.

bgroenks96 commented 4 years ago

It worked! I'll go ahead and close this issue.