Open TomAugspurger opened 4 years ago
Here's my proposed list (deliberately too small, to provoke discussion):
# dask, jupyterlab
- pangeo-notebook
# core scipy packages
- numpy
- scipy
- matplotlib-base
- pandas
- xarray
- sparse
- sympy
# intake-related
- intake
- intake-xarray
- intake-esm
- fsspec
- intake-stac
# zarr-related
- zarr
- gcsfs
- s3fs
Notably absent are
I think some of those should be included in this kitchen sink package, but I'm not sure which.
Hi @TomAugspurger A few questions: When you say Xarray-adjacent, do you mean the optional dependencies listed here?
What are the main advantages of keeping the list small? Is that it makes opening the binder faster? Or less likely to run into conflicts?
Personally, nc-time-axis would be useful, but I'm not sure how widely it's needed.
From my perspective as someone who is very new to pangeo (and Python, actually) and who is spreading the word to colleagues with no prior knowledge, there is a big advantage in being able to send someone a link and have the binder start up without needing to install any extra packages. This is an obvious point I guess, but I thought I would emphasize it from my novice point of view.
Personally, nc-time-axis would be useful, but I'm not sure how widely it's needed.
+1 for this one. In the Ocean and Climate modelling world, non-standard calendars are the real standard.
When you say Xarray-adjacent, do you mean the optional dependencies listed here?
I didn't have any specific ones in mind.
there is a big advantage in being able to send someone a link and have the binder start up without needing to install any extra packages.
Agreed. I think we can be fairly broad with what ends up in the "kitchen sink" pangeo-notebook docker image.
As part of the ocean.pangeo.io fixup, we're looking to remove the environment build step for deployments in pangeo-cloud-federation and just use the pangeo-notebook docker image.
I went through ocean's environment.yaml. Of the packages there and not in the pangeo-notebook image, there are three main types of packages:
- compliance-checker
- ciso
- cc-plugin-ncei
- ctd
- geolinks
- gridgeo
- ioos-tools
- pocean-core
- podaccpy
- retrying
- unyt
- utide
- xlrd
- fiona
- ipython
- netcdf4
- setuptools
- ciso
- fastjmd95
- nc-time-axis
- netcdf4
- pyarrow
- xcape
- xlayers
- xmitgcm
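The comparison described above (packages in a deployment's environment file that are not in the image) can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used in this thread: the function names `parse_package_names` and `missing_from_image` and the sample excerpts are hypothetical, and the parser only handles flat `- name` entries, not nested `pip:` sections.

```python
def parse_package_names(lines):
    """Extract bare package names from flat '- name' dependency lines."""
    names = set()
    for line in lines:
        entry = line.strip()
        if not entry.startswith("- "):
            continue  # skip comments, section headings, and blank lines
        spec = entry[2:].strip()
        # Drop version pins such as 'numpy=1.18' or 'numpy>=1.18'
        for sep in ("=", "<", ">", "!"):
            spec = spec.split(sep, 1)[0]
        names.add(spec.strip().lower())
    return names


def missing_from_image(deployment_lines, image_lines):
    """Return sorted names present in the deployment but absent from the image."""
    deployment = parse_package_names(deployment_lines)
    image = parse_package_names(image_lines)
    return sorted(deployment - image)


# Hypothetical excerpts, not the real environment files
ocean_env = ["# hub-specific", "- xarray", "- nc-time-axis", "- xmitgcm", "- pyarrow=0.17"]
image_env = ["# core", "- numpy", "- xarray", "- pandas"]

print(missing_from_image(ocean_env, image_env))
```

The resulting list still has to be sorted by hand into the categories discussed here; the script only finds the candidates.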
My proposal is to add the actually useful packages to pangeo-notebook.
There are also a few borderline ones:
Does anyone have objections to adding that list of "useful" packages to pangeo-notebook?
My proposal is to add the actually useful packages to pangeo-notebook.
:+1: in general... But this list seems to be incomplete. I would say that these are the important ones:
- ciso
- xgcm
- xrft
- xhistogram
- xlayers
- xcape
- git+https://github.com/xgcm/fastjmd95.git
- git+https://github.com/NCAR/intake-esm.git
These should be upstreamed to pangeo-notebook
- pyarrow
- netcdf4
- nc-time-axis
Sorry, I missed these when copy-pasting
I don't see xrft or xhistogram in ocean.pangeo.io's environment. But I can add them.
intake-esm is in the pangeo-notebook environment already, but it's installed from conda-forge rather than GitHub. If it's OK I'd prefer to push on projects to issue releases, and only install from source as necessary. Likewise for fastjmd95 (though it's not currently in pangeo-notebook).
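To see which entries in a requested list would still be installed from source rather than from a release, a small helper can split the specs by prefix. This is a hedged sketch: `split_by_source` is a hypothetical name, and treating a `git+` or URL prefix as "from source" is an assumption about how such entries are written, based on the list quoted earlier in this thread.

```python
def split_by_source(specs):
    """Separate released package names from source/URL installs."""
    released, from_source = [], []
    for spec in specs:
        # Assumption: source installs are written as VCS or plain URLs
        if spec.startswith(("git+", "http://", "https://")):
            from_source.append(spec)
        else:
            released.append(spec)
    return released, from_source


# Entries taken from the list proposed earlier in this thread
specs = [
    "ciso",
    "xgcm",
    "git+https://github.com/xgcm/fastjmd95.git",
    "git+https://github.com/NCAR/intake-esm.git",
]

released, from_source = split_by_source(specs)
print("released:", released)
print("from source:", from_source)
```

Anything that lands in `from_source` is a candidate for asking the upstream project to cut a release.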
I don't see xrft or xhistogram in ocean.pangeo.io's environment.
They have definitely been there in the past!
If it's OK I'd prefer to push on projects to issue releases, and only install from source as necessary.
:+1:
Likewise for fastjmd95 (though it's not currently in pangeo-notebook).
I'm going to release it now.
fastjmd95 now released on pip. Is conda vs. pip important?
It'll be the only one not from conda-forge, but it doesn't matter too much for a pure-python package I think. We can move it if / when it becomes available on conda-forge.
On Mon, Jun 22, 2020 at 2:41 PM Ryan Abernathey notifications@github.com wrote:
fastjmd95 now released on pip. Is conda vs. pip important?
Thanks @TomAugspurger and @rabernat for pushing this forward. I think there is a lot of value in using exactly the same image across cloud deployments, and that is the current intention with pangeo-notebook. I'm also wary of large image size and troubleshooting inevitable package conflicts as the list of desirable packages grows. For example, here is the list of packages used by request during the recent icesat2 hackweek https://github.com/ICESAT-2HackWeek/jupyter-image-2020/blob/master/environment.yml.
These should be upstreamed to pangeo-notebook
In my opinion the current meta-package should include just the minimum set of packages to launch a dask gateway cluster and connect to the labextension dashboard. We should consider whether it's worthwhile creating additional meta-packages and/or renaming things to make this more obvious (such as pangeo-notebook --> pangeo-ui or pangeo? And then separately you could have a pangeo-analysis or pangeo-ocean metapackage).
If image building is dropped from pangeo-cloud-federation, it's also possible to include additional domain or hub-specific images (e.g. hub-aws-uswest2, hub-gcp-uscentral1b) in this repository and refactor the CI to build images independently.
What packages belong in a "default" pangeo metapackage? Currently pangeo-notebook has essentially dask + jupyterhub + jupyterlab. https://github.com/conda-forge/pangeo-notebook-feedstock/blob/master/recipe/meta.yaml. IMO, there's value in having a minimal metapackage. There's also value in a "useful" metapackage that includes things like
In pangeo-stacks we called this pangeo-notebook: https://github.com/pangeo-data/pangeo-stacks/blob/a8cf6aefa36800301977390a785d06edac9b915e/pangeo-notebook/binder/environment.yml. Also, what should we call this? Perhaps just pangeo?
cc @rabernat @scottyhq @jhamman