pangeo-data / pangeo-cloud-federation

Deployment automation for Pangeo JupyterHubs on AWS, Google, and Azure
https://pangeo.io/cloud.html

Handling dask_config.yaml #805

Closed tjcrone closed 4 years ago

tjcrone commented 4 years ago

For a while now it looks like the postBuild script we are using has not been running as root, which means it cannot install the Dask configuration file in /etc/dask in our notebook image. I may be misremembering, but I thought Dask would also look in the Conda env's etc directory for the configuration file, though this is not indicated in the documentation. How are other deployments handling system-wide Dask configuration for users at this stage? It seems like something that would not be good to burn into the notebook image, but maybe it is. Any thoughts on this would be greatly appreciated as I try to solve my postBuild issue. Thanks!

TomAugspurger commented 4 years ago

Dask looks on dask.config.paths for all configuration files. On us-central1b.gcp.pangeo.io, that's:

>>> dask.config.paths
['/srv/conda/etc',
 '/srv/conda/envs/notebook/etc/dask',
 '/home/jovyan/.config/dask',
 '/home/jovyan/.dask']

So it is searching the conda environment's etc directory, which is '/srv/conda/envs/notebook/etc/dask'.
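To make the lookup order above concrete, here is a minimal sketch of how a search list of that shape could be assembled. It is an approximation written for illustration, not Dask's own code: the function name `candidate_config_dirs` is hypothetical, and the assumption that `DASK_ROOT_CONFIG` is set to `/srv/conda/etc` in this image is inferred from the output shown in the thread.

```python
import posixpath


def candidate_config_dirs(environ, prefix, home):
    """Approximate the directories searched for Dask YAML config.

    Assumption: the order mirrors the output shown in the thread
    (root config, environment prefix, then two per-user directories).
    """
    return [
        # System-wide location; DASK_ROOT_CONFIG overrides /etc/dask.
        environ.get("DASK_ROOT_CONFIG", "/etc/dask"),
        # Per-environment location, e.g. a conda env's etc/dask.
        posixpath.join(prefix, "etc", "dask"),
        # Per-user locations.
        posixpath.join(home, ".config", "dask"),
        posixpath.join(home, ".dask"),
    ]


paths = candidate_config_dirs(
    environ={"DASK_ROOT_CONFIG": "/srv/conda/etc"},  # assumed image setting
    prefix="/srv/conda/envs/notebook",
    home="/home/jovyan",
)
for p in paths:
    print(p)
```

On a real deployment, printing `dask.config.paths` is the authoritative check; the sketch only illustrates why an unprivileged user can still place a file somewhere Dask will find it.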

IIUC, you're saying that the cp fails because the directory is owned by root, but postBuild is run as jovyan? That seems like something we could fix in the base docker image (or perhaps by having a root version of postBuild).

tjcrone commented 4 years ago

Okay thank you for this information. I feel like the easiest way forward here is to copy the config file into /srv/conda/envs/notebook/etc/dask/, which I believe jovyan has rw access to. Sound reasonable?
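A minimal sketch of that copy step, written in Python rather than as the deployment's actual postBuild script. The helper name `install_dask_config` is hypothetical, and the demo uses a temporary directory as a stand-in for the environment prefix so it runs anywhere without root.

```python
import os
import shutil
import tempfile


def install_dask_config(src, prefix):
    """Copy a Dask config file into <prefix>/etc/dask, creating it if needed."""
    dest_dir = os.path.join(prefix, "etc", "dask")
    os.makedirs(dest_dir, exist_ok=True)
    # shutil.copy returns the path of the newly created file.
    return shutil.copy(src, dest_dir)


# Demo with throwaway paths (stand-ins for the real image layout).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "dask_config.yaml")
    with open(src, "w") as f:
        f.write("labextension: {}\n")
    fake_prefix = os.path.join(tmp, "env")
    installed = install_dask_config(src, fake_prefix)
    print(os.path.relpath(installed, fake_prefix))
```

In an image build, the prefix would presumably come from something like `os.environ.get("NB_PYTHON_PREFIX", sys.prefix)`, on the assumption that the variable is defined at build time.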

TomAugspurger commented 4 years ago

I didn't realize that that directory was owned by jovyan. That looks like what https://github.com/pangeo-data/pangeo-cloud-federation/blob/6b898735fdd8e1bc99951735aa61f6fede4b09e6/deployments/ooi/image/binder/postBuild does. I wonder if the KERNEL_PYTHON_PREFIX environment variable just isn't defined in the docker image?

scottyhq commented 4 years ago

Happy to remove dask_config.yaml entirely from the base docker image: https://github.com/pangeo-data/pangeo-docker-images/issues/94. I think it was previously necessary to ensure dask-kubernetes worked. Now, if I'm not mistaken, the relevant pieces are set by the dask-gateway config.

I don't think postBuild ever ran as root. Note that we currently set NB_PYTHON_PREFIX and DASK_ROOT_CONFIG in the image: https://github.com/pangeo-data/pangeo-docker-images/blob/master/base-image/Dockerfile

Many other environment variables are set by jupyterhub/kubespawner when you start your server.

scottyhq commented 4 years ago

@TomAugspurger the one piece we're still leaning on in the image's dask_config.yaml is the integration with labextension, correct?

labextension:
  factory:
    module: dask_gateway
    class: GatewayCluster
    args: []
    kwargs: {}

Otherwise, users by default launch a LocalCluster. Personally, I never use the '+' button and prefer explicitly creating the cluster in a code cell and then just copying the dashboard URL over...
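For readers unfamiliar with a factory block of this shape, the sketch below shows one way a module/class/args/kwargs spec can be resolved into an object. It is an illustration of the pattern, not the labextension's actual code; `build_from_factory` is a hypothetical helper, and the demo substitutes a standard-library class so it runs without dask_gateway installed.

```python
import importlib


def build_from_factory(spec):
    """Import spec['module'], look up spec['class'], and instantiate it."""
    module = importlib.import_module(spec["module"])
    cls = getattr(module, spec["class"])
    return cls(*spec.get("args", []), **spec.get("kwargs", {}))


# Stand-in spec using a stdlib class; the spec quoted above would name
# module "dask_gateway" and class "GatewayCluster" instead.
demo_spec = {
    "module": "collections",
    "class": "Counter",
    "args": ["aab"],
    "kwargs": {},
}
obj = build_from_factory(demo_spec)
print(type(obj).__name__)
```

With no factory configured, the same pattern would fall back to whatever default class the extension ships with, which matches the LocalCluster behavior described above.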

TomAugspurger commented 4 years ago

Perhaps the dashboard link too?

  dashboard:
    link: /user/{JUPYTERHUB_USER}/proxy/{port}/status

But that can be set as an environment variable too.
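As a rough illustration of how a link template like this expands, the sketch below fills the placeholders from an environment mapping plus a port. The helper name `format_dashboard_link`, the user name, and the port value are assumptions for the example. The environment-variable form is, to my understanding, `DASK_DISTRIBUTED__DASHBOARD__LINK`, which assumes the setting sits under the `distributed` key.

```python
def format_dashboard_link(template, port, environ):
    """Fill {port} and any environment-variable placeholders in template."""
    # Later keys win, so "port" always comes from the argument.
    values = {**environ, "port": port}
    return template.format_map(values)


template = "/user/{JUPYTERHUB_USER}/proxy/{port}/status"
link = format_dashboard_link(
    template,
    port=8787,                             # illustrative port
    environ={"JUPYTERHUB_USER": "alice"},  # hypothetical user
)
print(link)
```

In a running notebook server the mapping would be `os.environ`, where JupyterHub is expected to have set `JUPYTERHUB_USER`.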

tjcrone commented 4 years ago

Okay so where in the dask-gateway config do the resource limits and requests go? And volume mounts? I think that is the only thing that we had in our dask_config.yml that we want to keep. Are these no longer settable by users?

tjcrone commented 4 years ago

Okay I think I found some of this here: https://github.com/pangeo-data/pangeo-cloud-federation/blob/77a7a743bb0cbd9ae43846e17a44e881236b476b/deployments/icesat2/config/common.yaml#L130.

What about NFS home directories? We were able to provide access to the user's home directory using NFS with our previous configuration. Is that something that we have solved yet? Thanks.

scottyhq commented 4 years ago

> What about NFS home directories? We were able to provide access to the user's home directory using NFS with our previous configuration. Is that something that we have solved yet? Thanks.

Not sure how to set that up with dask-gateway. I previously mounted the home directory on all workers with dask-kubernetes, but at some point I decided to put anything that needed to be globally accessible to Dask workers on S3. In case it's helpful, here is a link to the volume mounting piece: https://github.com/pangeo-data/pangeo-cloud-federation/blob/8cd02088ed2689934e92689e448e40723f6fe381/deployments/icesat2/image/binder/dask_config.yaml

tjcrone commented 4 years ago

Okay, aside from the NFS question and the related issues and discussion here: https://github.com/pangeo-data/pangeo-cloud-federation/issues/824, I think this is solved, so I'm closing. If I cannot solve the NFS issue I will start a separate thread on that. Thanks for everyone's help!