pangeo-data / pangeo-cloud-federation

Deployment automation for Pangeo JupyterHubs on AWS, Google, and Azure
https://pangeo.io/cloud.html

user conda environments #254

Open scottyhq opened 5 years ago

scottyhq commented 5 years ago

We've now set up staging.nasa.pangeo.io to allow users to create their own conda environments (see https://github.com/pangeo-data/pangeo-cloud-federation/blob/staging/deployments/nasa/config/common.yaml#L34).

I'm running into "The environment is inconsistent" and hanging "solving environment" issues with conda currently though in our image. I noticed that /srv/conda/.condarc has the following config:

channels:
  - conda-forge
  - defaults
auto_update_conda: false
show_channel_urls: true
update_dependencies: false

I'm wondering about the update_dependencies: false causing trouble. It comes from repo2docker (https://github.com/jupyter/repo2docker/blob/9099def40a331df04ba3ed862ee27a8e4a77fe43/repo2docker/buildpacks/conda/install-miniconda.bash#L39).

I also noticed we end up with a mix of packages from conda-forge, defaults, and pypi currently, which I guess is originating from pangeo-stacks: https://github.com/pangeo-data/pangeo-stacks/blob/master/base-notebook/binder/environment.yml

So... @yuvipanda , @jhamman 1) Why is update_dependencies: false? 2) Should we change pangeo-stacks to just use conda-forge?

yuvipanda commented 5 years ago

@minrk might know about the conda stuff on repo2docker

rsignell-usgs commented 5 years ago

@ocefpaf, any insight here?

ocefpaf commented 5 years ago

I'm running into "The environment is inconsistent" and hanging "solving environment" issues with conda currently though in our image. I noticed that /srv/conda/.condarc has the following config:

Can you try updating conda in your env and activating the strict channel priority option?

channel_priority: strict

That should help with the "hanging" and inconsistent environments. If you find any errors with that it will be easier to debug too.

PS: please check this gist for more on the .condarc options.

ocefpaf commented 5 years ago

We've now set up staging.nasa.pangeo.io to allow users to create their own conda environments

BTW the options,

auto_update_conda: false
update_dependencies: false

are very useful for managing Dockerfiles but quite bad for local users. If you want users to manage their own envs I would remove those options. A Dockerfile admin would know when to update those, but average users should try to stay on the latest versions as much as possible. Conda is still evolving, and the options that make it faster and more consistent are being updated/added constantly. The strict channel option, for example, will be the default in conda 4.7.
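
As a rough sketch of what flipping these settings for interactive users could look like, the snippet below only builds `conda config --set` command lines and does not run them. The helper name `user_friendly_conda_commands` is made up for illustration, and setting the two update options to `true` (rather than deleting the keys) is an assumption. Which `.condarc` file the commands write to, and whether it takes precedence over /srv/conda/.condarc, is not checked here.

```python
# Hypothetical helper: builds `conda config --set KEY VALUE` command lines
# for the options discussed above. Nothing is executed here.
USER_SETTINGS = {
    "channel_priority": "strict",
    "auto_update_conda": "true",    # assumption: enable instead of removing the key
    "update_dependencies": "true",  # assumption: enable instead of removing the key
}


def user_friendly_conda_commands(settings=USER_SETTINGS):
    """Return one argument list per setting, suitable for subprocess.run."""
    return [["conda", "config", "--set", key, value]
            for key, value in settings.items()]


if __name__ == "__main__":
    for cmd in user_friendly_conda_commands():
        print(" ".join(cmd))
```

Passing each list to `subprocess.run(cmd, check=True)` would apply the settings; printing them first makes it easy to review what would change.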

scottyhq commented 5 years ago

Thanks @ocefpaf ! This is great information.

After adding your suggestions things proceed but definitely still feel slow. In case we want to modify our base environment coming from Docker, I'm copying the results of a conda update --all with the strict channel requirement below. Also see the bottom for packages currently installed via pypi (includes dask so @jhamman - should we try to modify pangeo-stacks?)

(base) jovyan@jupyter-scottyhq:~$ conda update --all
Collecting package metadata: done
Solving environment: |
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - conda-forge/linux-64::ipykernel==5.1.0=py36h24bf2e0_1002
  - conda-forge/label/broken/noarch::jupyter_client==5.2.4=py_1
  - conda-forge/noarch::jupyterlab_launcher==0.13.1=py_2
  - conda-forge/label/broken/linux-64::notebook==5.7.4=py36_1000
  - conda-forge/linux-64::awscli==1.16.149=py36_0
  - conda-forge/noarch::boto3==1.9.139=py_0
  - conda-forge/noarch::botocore==1.12.139=py_0
  - conda-forge/noarch::dask==1.1.1=py_0
  - conda-forge/noarch::datacube==1.6.2=py_1
  - conda-forge/noarch::datashader==0.6.9=py_0
  - conda-forge/noarch::geocube==0.0.2=py_0
  - conda-forge/noarch::intake==0.4.4=py_0
  - conda-forge/noarch::intake-xarray==0.3.0=py_0
  - conda-forge/linux-64::matplotlib==3.0.3=py36_1
  - conda-forge/linux-64::matplotlib-base==3.0.3=py36h5f35d83_1
  - conda-forge/linux-64::nb_conda_kernels==2.2.1=py36_0
  - conda-forge/noarch::regionmask==0.4.0=py_0
  - conda-forge/noarch::rioxarray==0.0.3=py_0
  - conda-forge/noarch::xarray==0.12.1=py_0
  - conda-forge/noarch::alembic==1.0.8=py_0
  - conda-forge/linux-64::bokeh==1.0.4=py36_1000
  - conda-forge/linux-64::cartopy==0.17.0=py36h0aa2c8f_1004
  - conda-forge/linux-64::climlab==0.7.3=py36h4c70da7_0
  - conda-forge/noarch::dask-glm==0.1.0=0
  - conda-forge/noarch::dask-jobqueue==0.4.1=py_0
  - conda-forge/noarch::dask-kubernetes==0.7.0=py_0
  - conda-forge/noarch::dask-ml==0.12.0=py_0
  - conda-forge/noarch::datashape==0.5.4=py_1
  - conda-forge/noarch::descartes==1.1.0=py_3
  - conda-forge/noarch::geopandas==0.4.1=py_1
  - conda-forge/noarch::geoviews==1.6.2=py_0
  - conda-forge/noarch::geoviews-core==1.6.2=py_0
  - conda-forge/noarch::holoviews==1.11.3=py_0
  - conda-forge/noarch::hvplot==0.4.0=py_1
  - conda-forge/noarch::intake-esm==2019.2.28=py_1
  - conda-forge/linux-64::ipyleaflet==0.10.1=py36_0
  - conda-forge/noarch::ipywidgets==7.4.2=py_0
  - conda-forge/linux-64::iris==2.2.0=py36_1003
  - conda-forge/linux-64::jupyterhub==0.9.6=py36_0
  - conda-forge/noarch::mapclassify==2.0.1=py_0
  - conda-forge/linux-64::metpy==0.10.0=py36_1001
  - conda-forge/noarch::nbserverproxy==0.8.8=py_1000
  - conda-forge/noarch::owslib==0.17.1=py_0
  - conda-forge/linux-64::pandas==0.24.2=py36hf484d3e_0
  - conda-forge/noarch::panel==0.4.0=1
  - conda-forge/noarch::pyspectral==0.8.7=py_0
  - conda-forge/linux-64::python-geotiepoints==1.1.7=py36h3010b51_0
  - conda-forge/linux-64::python-kubernetes==4.0.0=py36_1
  - conda-forge/noarch::s3fs==0.2.1=py_0
  - conda-forge/linux-64::s3transfer==0.2.0=py36_0
  - conda-forge/noarch::satpy==0.14.1=pyh326bf55_0
  - conda-forge/linux-64::scikit-image==0.14.2=py36hf484d3e_1
  - conda-forge/noarch::trollimage==1.7.0=py_0
  - conda-forge/linux-64::widgetsnbextension==3.4.2=py36_1000
  - conda-forge/linux-64::xesmf==0.1.1=py36_1
  - conda-forge/noarch::xgcm==0.2.0=py_0
  - conda-forge/noarch::xrft==0.2.0=py_0
  - conda-forge/noarch::jupyter==1.0.0=py_2
  - conda-forge/noarch::jupyter_console==6.0.0=py_0
  - conda-forge/linux-64::jupyterlab==0.35.4=py36_0
  - conda-forge/noarch::jupyterlab_server==0.2.0=py_0
  - conda-forge/noarch::qtconsole==4.4.3=py_0
done
Full Output

```
## Package Plan ##

  environment location: /srv/conda

The following packages will be downloaded:

  package | build
  ---------------------------|-----------------
  atk-2.25.90 | hb9dd440_1002 430 KB conda-forge
  attrs-19.1.0 | py_0 32 KB conda-forge
  binutils_impl_linux-64-2.31.1| h6176602_1 16.5 MB defaults
  binutils_linux-64-2.31.1 | h6176602_3 9 KB defaults
  colorama-0.4.1 | py_0 15 KB conda-forge
  conda-env-2.6.0 | 1 2 KB conda-forge
  cryptography-2.5 | py36hb7f436b_1 645 KB conda-forge
  curl-7.64.0 | h646f8bb_0 143 KB conda-forge
  dask-core-1.2.1 | py_0 534 KB conda-forge
  dbus-1.13.6 | he372182_0 602 KB conda-forge
  fiona-1.8.6 | py36hf242f0b_0 1.1 MB conda-forge
  gcc_impl_linux-64-7.3.0 | habb00fd_1 73.2 MB conda-forge
  gcc_linux-64-7.3.0 | h553295d_3 10 KB conda-forge
  gdal-2.4.0 |py36h1c6dbfb_1002 1.3 MB conda-forge
  gdk-pixbuf-2.36.12 | h49783d7_1002 592 KB conda-forge
  giflib-5.1.9 | h516909a_0 108 KB conda-forge
  gobject-introspection-1.58.2|py36h2da5eee_1000 1.2 MB conda-forge
  graphviz-2.40.1 | h0dab3d1_0 6.8 MB conda-forge
  grpcio-1.16.0 |py36h4f00d22_1000 1.0 MB conda-forge
  gsw-3.3.1 | py36h516909a_0 1.8 MB conda-forge
  gtk2-2.24.31 | hb68c50a_1001 7.3 MB conda-forge
  gxx_impl_linux-64-7.3.0 | hdf63c60_1 18.7 MB conda-forge
  gxx_linux-64-7.3.0 | h553295d_3 9 KB conda-forge
  jsonschema-3.0.1 | py36_0 84 KB conda-forge
  keras-2.1.6 | py36_0 500 KB conda-forge
  keras-applications-1.0.7 | py_0 30 KB conda-forge
  keras-preprocessing-1.0.9 | py_0 32 KB conda-forge
  kiwisolver-1.1.0 | py36hc9558a2_0 86 KB conda-forge
  krb5-1.16.3 | hc83ff2d_1000 1.4 MB conda-forge
  libcurl-7.64.0 | h01ee5af_0 586 KB conda-forge
  libedit-3.1.20170329 | hf8c457e_1001 172 KB conda-forge
  libgdal-2.4.0 | h982c1cc_1002 18.5 MB conda-forge
  libpq-10.6 | h13b8bad_1000 2.5 MB conda-forge
  libssh2-1.8.0 | h1ad7b7a_1003 246 KB conda-forge
  libxml2-2.9.9 | h13577e0_0 2.0 MB conda-forge
  lz4-2.1.6 |py36hd79334b_1001 37 KB conda-forge
  lz4-c-1.8.3 | he1b5a44_1001 187 KB conda-forge
  mock-2.0.0 | py36_1001 106 KB conda-forge
  nbconvert-5.5.0 | py_0 375 KB conda-forge
  netcdf4-1.5.1 | py36had58050_0 535 KB conda-forge
  nodejs-11.14.0 | he1b5a44_0 16.6 MB conda-forge
  openssl-1.0.2r | h14c3975_0 3.1 MB conda-forge
  pandoc-2.7.2 | 0 21.7 MB conda-forge
  pbr-5.1.3 | py_0 70 KB conda-forge
  pcre-8.41 | hf484d3e_1003 249 KB conda-forge
  pooch-0.3.1 | py36_0 26 KB conda-forge
  postgresql-10.6 | h66cca7a_1000 4.7 MB conda-forge
  prometheus_client-0.6.0 | py_0 34 KB conda-forge
  psycopg2-2.7.7 | py36hb7f436b_0 305 KB conda-forge
  pycurl-7.43.0.2 | py36hb7f436b_0 60 KB defaults
  pyqt-5.6.0 |py36h13b7fb3_1008 5.4 MB conda-forge
  pyrsistent-0.15.1 | py36h516909a_0 88 KB conda-forge
  python-3.6.7 | hd21baee_1002 34.6 MB conda-forge
  qt-5.6.2 | hce4f676_1013 44.6 MB conda-forge
  rasterio-1.0.22 | py36h5b3f9e8_0 8.2 MB conda-forge
  shapely-1.6.4 |py36h2afed24_1004 330 KB conda-forge
  sip-4.18.1 |py36hf484d3e_1000 277 KB conda-forge
  tensorboard-1.13.1 | py36_0 3.3 MB conda-forge
  tensorflow-1.13.1 | py36_0 77.2 MB conda-forge
  tensorflow-estimator-1.13.0| py_0 205 KB defaults
  terminado-0.8.2 | py36_0 23 KB conda-forge
  testpath-0.4.2 | py_1001 85 KB conda-forge
  theano-1.0.4 |py36hf484d3e_1000 3.6 MB conda-forge
  websocket-client-0.56.0 | py36_0 58 KB conda-forge
  zarr-2.3.1 | py36_0 223 KB conda-forge
  ------------------------------------------------------------
  Total: 384.2 MB

The following NEW packages will be INSTALLED:

  atk conda-forge/linux-64::atk-2.25.90-hb9dd440_1002
  binutils_impl_lin~ pkgs/main/linux-64::binutils_impl_linux-64-2.31.1-h6176602_1
  binutils_linux-64 pkgs/main/linux-64::binutils_linux-64-2.31.1-h6176602_3
  gcc_impl_linux-64 conda-forge/linux-64::gcc_impl_linux-64-7.3.0-habb00fd_1
  gcc_linux-64 conda-forge/linux-64::gcc_linux-64-7.3.0-h553295d_3
  gdk-pixbuf conda-forge/linux-64::gdk-pixbuf-2.36.12-h49783d7_1002
  gobject-introspec~ conda-forge/linux-64::gobject-introspection-1.58.2-py36h2da5eee_1000
  gtk2 conda-forge/linux-64::gtk2-2.24.31-hb68c50a_1001
  gxx_impl_linux-64 conda-forge/linux-64::gxx_impl_linux-64-7.3.0-hdf63c60_1
  gxx_linux-64 conda-forge/linux-64::gxx_linux-64-7.3.0-h553295d_3
  mock conda-forge/linux-64::mock-2.0.0-py36_1001
  pbr conda-forge/noarch::pbr-5.1.3-py_0
  tensorflow-estima~ pkgs/main/noarch::tensorflow-estimator-1.13.0-py_0
  zstd conda-forge/linux-64::zstd-1.3.3-1

The following packages will be UPDATED:

  attrs 18.2.0-py_0 --> 19.1.0-py_0
  blas 2.5-openblas --> 2.8-openblas
  cffi 1.12.2-py36hf0e25f4_1 --> 1.12.3-py36h8022711_0
  colorama 0.3.9-py_1 --> 0.4.1-py_0
  dask-core 1.1.1-py_0 --> 1.2.1-py_0
  dbus 1.13.0-h4e0c4b3_1000 --> 1.13.6-he372182_0
  decorator 4.3.2-py_0 --> 4.4.0-py_0
  distributed 1.25.3-py36_0 --> 1.27.1-py36_0
  giflib 5.1.7-h516909a_1 --> 5.1.9-h516909a_0
  graphviz 2.38.0-hf68f40c_1011 --> 2.40.1-h0dab3d1_0
  gsw 3.3.0-py36h14c3975_0 --> 3.3.1-py36h516909a_0
  ipython 7.1.1-py36h24bf2e0_1000 --> 7.5.0-py36h24bf2e0_0
  jedi 0.13.2-py36_1000 --> 0.13.3-py36_0
  jinja2 2.10-py_1 --> 2.10.1-py_0
  jsonschema 3.0.0a3-py36_1000 --> 3.0.1-py36_0
  keras-applications 1.0.4-py_1 --> 1.0.7-py_0
  keras-preprocessi~ 1.0.2-py_1 --> 1.0.9-py_0
  kiwisolver 1.0.1-py36h6bb024c_1002 --> 1.1.0-py36hc9558a2_0
  libblas 3.8.0-5_openblas --> 3.8.0-8_openblas
  libcblas 3.8.0-5_openblas --> 3.8.0-8_openblas
  libedit pkgs/main::libedit-3.1.20170329-h6b74~ --> conda-forge::libedit-3.1.20170329-hf8c457e_1001
  libffi 3.2.1-hf484d3e_1005 --> 3.2.1-he1b5a44_1006
  libgcc-ng 7.3.0-hdf63c60_0 --> 8.2.0-hdf63c60_1
  liblapack 3.8.0-5_openblas --> 3.8.0-8_openblas
  liblapacke 3.8.0-5_openblas --> 3.8.0-8_openblas
  libstdcxx-ng 7.3.0-hdf63c60_0 --> 8.2.0-hdf63c60_1
  libtiff 4.0.10-h648cc4a_1001 --> 4.0.10-h9022e91_1002
  libxml2 2.9.8-h143f9aa_1005 --> 2.9.9-h13577e0_0
  lz4 2.1.6-py36ha8eefa0_1000 --> 2.1.6-py36hd79334b_1001
  lz4-c 1.8.1.2-0 --> 1.8.3-he1b5a44_1001
  markupsafe 1.1.0-py36h14c3975_1000 --> 1.1.1-py36h14c3975_0
  nbconvert conda-forge/label/broken::nbconvert-5~ --> conda-forge::nbconvert-5.5.0-py_0
  netcdf4 1.5.0.1-py36had58050_0 --> 1.5.1-py36had58050_0
  nodejs 11.11.0-hf484d3e_0 --> 11.14.0-he1b5a44_0
  numpy 1.16.2-py36h8b7e671_1 --> 1.16.3-py36he5ce36f_0
  openblas 0.3.5-h9ac9557_1001 --> 0.3.6-h6e990d7_1
  pandoc 1.19.2-0 --> 2.7.2-0
  parso 0.3.3-py_0 --> 0.4.0-py_0
  pexpect 4.6.0-py36_1000 --> 4.7.0-py36_0
  pooch 0.2.1-py36_1000 --> 0.3.1-py36_0
  prometheus_client 0.5.0-py_0 --> 0.6.0-py_0
  prompt_toolkit 2.0.8-py_0 --> 2.0.9-py_0
  psutil 5.6.1-py36h14c3975_0 --> 5.6.2-py36h516909a_0
  ptyprocess conda-forge/linux-64::ptyprocess-0.6.~ --> conda-forge/noarch::ptyprocess-0.6.0-py_1001
  pyqt 4.11.4-py36_3 --> 5.6.0-py36h13b7fb3_1008
  pyrsistent 0.14.10-py36h14c3975_0 --> 0.15.1-py36h516909a_0
  pyyaml 3.13-py36h14c3975_1001 --> 5.1-py36h14c3975_0
  pyzmq 17.1.2-py36h6afc9c9_1001 --> 18.0.1-py36hc4ba49a_1
  qt pkgs/free::qt-4.8.7-2 --> conda-forge::qt-5.6.2-hce4f676_1013
  setuptools 40.8.0-py36_0 --> 41.0.1-py36_0
  shapely 1.6.4-py36h2afed24_1003 --> 1.6.4-py36h2afed24_1004
  sip 4.18-py36_1 --> 4.18.1-py36hf484d3e_1000
  sqlite 3.26.0-h67949de_1000 --> 3.26.0-h67949de_1001
  tensorboard 1.10.0-py36_0 --> 1.13.1-py36_0
  tensorflow 1.10.0-py36_0 --> 1.13.1-py36_0
  terminado 0.8.1-py36_1001 --> 0.8.2-py36_0
  testpath conda-forge/linux-64::testpath-0.3.1-~ --> conda-forge/noarch::testpath-0.4.2-py_1001
  theano 1.0.3-py36_0 --> 1.0.4-py36hf484d3e_1000
  tk 8.6.9-h84994c4_1000 --> 8.6.9-h84994c4_1001
  tornado 5.1.1-py36h14c3975_1000 --> 6.0.2-py36h516909a_0
  urllib3 1.24.1-py36_1000 --> 1.24.2-py36_0
  websocket-client 0.40.0-py36_0 --> 0.56.0-py36_0
  wheel 0.32.3-py36_0 --> 0.33.1-py36_0
  yaml pkgs/main::yaml-0.1.7-had09818_2 --> conda-forge::yaml-0.1.7-h14c3975_1001
  zarr conda-forge/noarch::zarr-2.2.0-py_1 --> conda-forge/linux-64::zarr-2.3.1-py36_0
  zeromq 4.2.5-hf484d3e_1006 --> 4.3.1-hf484d3e_1000

The following packages will be SUPERSEDED by a higher-priority channel:

  conda-env pkgs/main/linux-64 --> conda-forge/noarch
  grpcio pkgs/main::grpcio-1.16.1-py36hf8bcb03~ --> conda-forge::grpcio-1.16.0-py36h4f00d22_1000
  pcre pkgs/main::pcre-8.43-he6710b0_0 --> conda-forge::pcre-8.41-hf484d3e_1003

The following packages will be DOWNGRADED:

  cryptography 2.6.1-py36h72c5cf5_0 --> 2.5-py36hb7f436b_1
  curl 7.64.1-hf8cf82a_0 --> 7.64.0-h646f8bb_0
  fiona 1.8.6-py36hf242f0b_3 --> 1.8.6-py36hf242f0b_0
  gdal 2.4.1-py36hf242f0b_0 --> 2.4.0-py36h1c6dbfb_1002
  keras 2.2.4-py36_0 --> 2.1.6-py36_0
  krb5 1.16.3-h05b26f9_1001 --> 1.16.3-hc83ff2d_1000
  libcurl 7.64.1-hda55be3_0 --> 7.64.0-h01ee5af_0
  libgdal 2.4.1-hdb8f723_0 --> 2.4.0-h982c1cc_1002
  libpq 11.2-h4770945_0 --> 10.6-h13b8bad_1000
  libssh2 1.8.2-h22169c7_2 --> 1.8.0-h1ad7b7a_1003
  openssl 1.1.1b-h14c3975_1 --> 1.0.2r-h14c3975_0
  postgresql 11.2-h61314c7_0 --> 10.6-h66cca7a_1000
  psycopg2 2.8.2-py36h72c5cf5_0 --> 2.7.7-py36hb7f436b_0
  pycurl 7.43.0.2-py36h1ba5d50_0 --> 7.43.0.2-py36hb7f436b_0
  python 3.6.7-h381d211_1004 --> 3.6.7-hd21baee_1002
  rasterio 1.0.22-py36h5b3f9e8_1 --> 1.0.22-py36h5b3f9e8_0

(base) jovyan@jupyter-scottyhq:~$ conda list | grep pypi
alembic 1.0.7 pypi_0 pypi
bokeh 1.1.0 pypi_0 pypi
cachetools 3.1.0 pypi_0 pypi
click 7.0 pypi_0 pypi
dask 1.2.0 pypi_0 pypi
dask-labextension 0.3.1 pypi_0 pypi
distributed 1.27.0 pypi_0 pypi
heapdict 1.0.0 pypi_0 pypi
intake-stac 0+untagged.28.g661390e pypi_0 pypi
jupyterhub 0.9.4 pypi_0 pypi
kubernetes 9.0.0 pypi_0 pypi
mako 1.0.7 pypi_0 pypi
mercantile 1.0.4 pypi_0 pypi
msgpack 0.6.1 pypi_0 pypi
msgpack-python 0.5.6 pypi_0 pypi
nteract-on-jupyter 2.0.0 pypi_0 pypi
pillow 6.0.0 pypi_0 pypi
pyasn1 0.4.5 pypi_0 pypi
pyasn1-modules 0.2.4 pypi_0 pypi
python-dateutil 2.7.5 pypi_0 pypi
python-editor 1.0.4 pypi_0 pypi
python-oauth2 1.1.0 pypi_0 pypi
pyyaml 5.1 pypi_0 pypi
rio-cogeo 1.0.0 pypi_0 pypi
rsa 4.0 pypi_0 pypi
sat-search 0.2.0 pypi_0 pypi
sat-stac 0.1.2 pypi_0 pypi
sqlalchemy 1.2.17 pypi_0 pypi
supermercado 0.0.5 pypi_0 pypi
tblib 1.3.2 pypi_0 pypi
toolz 0.9.0 pypi_0 pypi
websocket-client 0.56.0 pypi_0 pypi
zict 0.1.4 pypi_0 pypi
```
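
For anyone wanting to repeat the pypi check programmatically rather than with `grep`, here is a small sketch that picks the pip-installed rows out of `conda list` text. It assumes the four-column layout seen above (name, version, build, channel); the function name `pypi_packages` and the sample rows are illustrative, with versions taken from the output in this comment.

```python
def pypi_packages(conda_list_output):
    """Map package name -> version for rows whose channel column is 'pypi'."""
    found = {}
    for line in conda_list_output.splitlines():
        fields = line.split()
        # Data rows look like: name version build channel.
        # Comment/header rows start with '#' and are skipped.
        if len(fields) == 4 and not line.startswith("#") and fields[3] == "pypi":
            found[fields[0]] = fields[1]
    return found


sample = """\
# Name                    Version                   Build  Channel
dask                      1.2.0                    pypi_0    pypi
dask-core                 1.1.1                      py_0    conda-forge
distributed               1.27.0                   pypi_0    pypi
"""

print(pypi_packages(sample))  # {'dask': '1.2.0', 'distributed': '1.27.0'}
```

In a live session the input could come from `subprocess.run(["conda", "list"], capture_output=True, text=True).stdout`.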

ocefpaf commented 5 years ago

After adding your suggestions things proceed but definitely still feel slow. In case we want to modify our base environment coming from Docker, I'm copying the results of a conda update --all with the strict channel requirement below

conda is getting faster but some big envs are still slow to solve, especially when trying to update on top of an existing env. I usually recommend never updating: just remove the old env and re-create it. (At least until conda's solver gets better.)
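
A minimal sketch of that remove-and-recreate workflow is below. It only assembles the two `conda env` command lines; the helper name `recreate_env_commands` and the default `environment.yml` file name are assumptions for illustration.

```python
def recreate_env_commands(name, spec_file="environment.yml"):
    """Return the commands to drop an env and rebuild it from a spec file.

    The commands are returned as argument lists and are not executed here.
    """
    return [
        ["conda", "env", "remove", "--name", name],
        ["conda", "env", "create", "--name", name, "--file", spec_file],
    ]


if __name__ == "__main__":
    for cmd in recreate_env_commands("dask-minimal"):
        print(" ".join(cmd))
```

Running each list through `subprocess.run(cmd, check=True)` would stop at the first failure, so a failed removal is noticed before the create step runs.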

Also see the bottom for packages currently installed via pypi (includes dask so @jhamman - should we try to modify pangeo-stacks?)

I'm not familiar with the current pangeo dependencies but I know that in the past they relied on development versions of some packages. Maybe those are a "pip install from master"? If that is not the case I would love to work with the pangeo group to get all the packages you need into conda-forge.

scottyhq commented 5 years ago

Thanks again. Pinging @rabernat and adding a couple of links to related issues:

conda-forge packages: https://github.com/pangeo-data/pangeo-stacks/issues/23
image size: https://github.com/pangeo-data/pangeo-stacks/issues/22

scottyhq commented 5 years ago

Also, one more comment confirming that, because we've set the dask workers to have the same home directory, you can launch a KubeCluster that matches a user-created conda environment with the code below. (Not sure how to get this incorporated with the dask jupyterlab extension, @ian-r-rose, @mrocklin.)

from dask_kubernetes import KubeCluster
from dask.distributed import Client
cluster = KubeCluster(env={'PATH': '/home/jovyan/my-conda-envs/dask-minimal/bin:$PATH'})
cluster.scale(2)
client=Client(cluster)

Check worker environments with:

import os
client.get_versions(check=True)
client.run(lambda: os.environ)
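
Since `client.run` returns a dict keyed by worker address, one way to confirm that every worker picked up the intended environment is to compare each worker's `sys.prefix` against the expected path. The sketch below does only the comparison step on such a dict; the function name and the worker addresses are hypothetical.

```python
def workers_with_other_prefix(expected_prefix, prefixes_by_worker):
    """Return the sorted addresses of workers whose prefix differs from the expected one."""
    return sorted(
        addr for addr, prefix in prefixes_by_worker.items()
        if prefix != expected_prefix
    )


# Example input shaped like the result of client.run(lambda: sys.prefix);
# the addresses are made up.
example = {
    "tcp://10.0.0.1:40000": "/home/jovyan/my-conda-envs/dask-minimal",
    "tcp://10.0.0.2:40000": "/srv/conda",
}

print(workers_with_other_prefix("/home/jovyan/my-conda-envs/dask-minimal", example))
# ['tcp://10.0.0.2:40000']
```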

scottyhq commented 5 years ago

just documenting that for persistent user-defined conda environments we currently must place this config file in /home/jovyan/.condarc:

# Override Dockerfile conda settings
channel_priority: strict
channels:
  - conda-forge
  - defaults
auto_update_conda: true
show_channel_urls: true
update_dependencies: true
auto_activate_base: false
envs_dirs:
  - /home/jovyan/my-conda-envs/
create_default_packages:
  - ipykernel
  - blas=*=openblas
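
To make the link between `envs_dirs` and the worker `PATH` used earlier explicit, here is a small sketch. It assumes a named environment is created under the first `envs_dirs` entry; the helper names are illustrative and nothing is executed.

```python
from pathlib import PurePosixPath

# Taken from the envs_dirs entry in the .condarc above.
ENVS_DIR = PurePosixPath("/home/jovyan/my-conda-envs")


def env_prefix(name, envs_dir=ENVS_DIR):
    """Expected location of a named env (assumes conda uses this envs_dirs entry)."""
    return envs_dir / name


def create_command(name, packages):
    """Build a `conda create` command line for a named env."""
    return ["conda", "create", "--name", name, *packages]


print(env_prefix("dask-minimal") / "bin")
print(" ".join(create_command("dask-minimal", ["dask", "distributed"])))
```

The first printed path is the directory that was prepended to `PATH` in the `KubeCluster(env=...)` example, which is why the environment needs to live in the shared home directory.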

And dask_config.yaml is currently: https://github.com/pangeo-data/pangeo-cloud-federation/blob/staging/deployments/nasa/image/binder/dask_config.yaml

distributed:
  logging:
    bokeh: critical

  dashboard:
    link: /user/{JUPYTERHUB_USER}/proxy/{port}/status

  admin:
    tick:
      limit: 5s

kubernetes:
  name: dask-{JUPYTERHUB_USER}-{uuid}
  worker-template:
    spec:
      nodeSelector:
        alpha.eksctl.io/nodegroup-name: dask-worker
      restartPolicy: Never
      containers:
        - name: dask-${JUPYTERHUB_USER}
          image: ${JUPYTER_IMAGE_SPEC}
          args:
            - dask-worker
            - --local-directory
            - /home/jovyan/dask-worker-space
            - --nthreads
            - '2'
            - --no-bokeh
            - --memory-limit
            - 7GB
            - --death-timeout
            - '60'
          resources:
            limits:
              cpu: "1.75"
              memory: 7G
            requests:
              cpu: 1
              memory: 7G
          volumeMounts:
            - name: nfs
              mountPath: /home/jovyan
              subPath: "nasa.pangeo.io/home/${JUPYTERHUB_USER}"
      volumes:
        - name: nfs
          persistentVolumeClaim:
            claimName: home-nfs

labextension:
  factory:
    module: dask_kubernetes
    class: KubeCluster
    args: []
    kwargs: {}

scottyhq commented 5 years ago

And another update... Since this change to repo2docker https://github.com/jupyter/repo2docker/pull/651 , containers seem to have a new entrypoint script : https://github.com/jupyter/repo2docker/blob/80b979f8580ddef184d2ba7d354e7a833cfa38a4/repo2docker/buildpacks/conda/activate-conda.sh

So to share the currently active conda environment in a notebook with dask workers, launch a cluster with

import sys
cluster = KubeCluster(env={'NB_PYTHON_PREFIX': sys.prefix})

Not sure this is the 'best' way to use different conda environments among workers with dask-kubernetes. In particular, this setup means each worker is accessing the same python files under /home/jovyan/myenv via NFS instead of making a local copy of the conda environment. Thoughts @mrocklin and @TomAugspurger?

TomAugspurger commented 5 years ago

Looking into this now.

TomAugspurger commented 5 years ago

@scottyhq can you test if including

kubernetes:
  env:
    NB_PYTHON_PREFIX: $NB_PYTHON_PREFIX

works / breaks anything? You can remove the env={'NB_PYTHON_PREFIX':sys.prefix}. It looks like dask_kubernetes will expand environment variables (which I think will evaluate to the correct thing in the notebook) before passing them through.
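
As an illustration of the expansion being described, the sketch below walks a nested config and expands `$VAR` / `${VAR}` in every string with `os.path.expandvars`. That dask_kubernetes behaves the same way is an assumption here, not something verified against its source; the function name `expand_env` and the example value are made up.

```python
import os


def expand_env(config):
    """Recursively expand $VAR and ${VAR} in every string of a nested config."""
    if isinstance(config, dict):
        return {key: expand_env(value) for key, value in config.items()}
    if isinstance(config, list):
        return [expand_env(value) for value in config]
    if isinstance(config, str):
        # Variables that are not set are left untouched by expandvars.
        return os.path.expandvars(config)
    return config


# Example value only; in the notebook this is set by the container entrypoint.
os.environ["NB_PYTHON_PREFIX"] = "/home/jovyan/my-conda-envs/dask-minimal"

fragment = {"kubernetes": {"env": {"NB_PYTHON_PREFIX": "$NB_PYTHON_PREFIX"}}}
print(expand_env(fragment))
```

If the expansion happens in the notebook process, the workers receive the notebook's value of `NB_PYTHON_PREFIX`, which is the behavior the config change relies on.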