holoviz-topics / EarthSim

Tools for working with and visualizing environmental simulations.
https://earthsim.holoviz.org
BSD 3-Clause "New" or "Revised" License

Installing the EarthSim stack #168

Closed ocefpaf closed 5 years ago

ocefpaf commented 6 years ago

First of all, thanks for the awesome tools you are developing here. As more and more Earth scientists try to run the notebooks here, they end up asking for the packages to be easier to install.

I noticed that the dependencies.txt is a "safe bet" with many packages pinned to versions that are known to work.

However, things like the comment below:

# Note 2: Since we started using this version, it's been marked
# "broken" (although downloaded >50k times).  However, newest version
# requires boost 1.67.0, while c-f was previously pinned to 1.66.0
# (and 1.66.0 was used for erdc packages). Should be able to revert to
# conda-forge when we update to boost-cpp 1.67.0.

worries me. As the conda-forge maintainer of most of those packages, I can say that packages are moved to the broken label b/c they are no longer compatible with the rest of the dependency stack. We don't remove them, for the sake of reproducibility, but we do not recommend that people keep using them, as it is easy to create a broken environment.

At the moment I just gave a demo of the tools here with the following env:

name: mapping
channels:
  - conda-forge
  - pyviz/label/earthsim
  - defaults
dependencies:
  - basemap
  - bokeh
  - cartopy
  - fiona
  - folium
  - geopandas
  - gridgeo
  - ipyleaflet
  - jupyter
  - jupyter_contrib_nbextensions
  - matplotlib
  - palettable
  - python=3.6
  - rasterio
  - shapely
  - xarray
  - xlrd
  # earthsim
  - colorcet
  - datashader
  - descartes
  - earthsim
  - filigree
  - geoviews
  - gssha
  - gsshapy
  - holoviews
  - lancet
  - nodejs
  - opencv
  - param
  - parambokeh
  - paramnb
  - pyct
  - quest
  - scikit-image
  - x264

No pinned dependency versions or channels. Everything worked just fine. My plan is to slowly add the packages from the pyviz/label/earthsim channel to conda-forge to reduce the chance of binary incompatibility due to channel mixing.

Is that something the devs here want to participate in? Do you want to be added as maintainers of the conda-forge version of these packages? (We already have some of those BTW.)

Also, feel free to open issues on the conda-forge feedstocks when something is broken. We would like to get those fixed!

rsignell-usgs commented 6 years ago

Thanks for bringing this up, @ocefpaf!

In the context of the Pangeo project, I've been giving a lot of demos of these tools working with large triangular model grids (I've got a 9 million node Hurricane Ike water level simulation), and participation from the Pyviz devs in conda-forge would be fantastic to avoid the crazy environment requirements such as: https://github.com/pangeo-data/helm-chart/blob/master/docker-images/notebook/Dockerfile#L11-L17

ceball commented 6 years ago

I personally would love earthsim to be easier to install, and I am a fan of conda-forge :) I hope the earthsim installation procedure does not stay the way it is for too long. Anyway, someone else will surely answer about the goals of pyviz and earthsim, so I'll just comment on some specific technical points from posts above.

(I'll also add another preface, which is that I'm writing about these packaging-related issues from a total non-packaging-expert stance - only as someone on a team trying to make use of a large number of packages. So what I say is not authoritative, it's just my point of view. I am open to education :) )

At the moment I just gave a demo of the tools here with the following env: [...] No pinned dependency versions or channels. Everything worked just fine.

(a) An environment file like the above does not guarantee - or at least has not guaranteed, historically - that all packages come from conda-forge where possible rather than defaults. E.g. using an environment like the above, we frequently got numpy and scipy from defaults rather than conda-forge, but other packages that depend on them were still coming from conda-forge, leading to binary incompatibilities.

(b) Even if the environment as specified above worked now, I think it wouldn't necessarily work in the future. E.g. when some low-level dependency is updated on conda-forge, but packages that depend on it aren't re-built, the environment will break. Or e.g. if some python package is changed in a non-backwards-compatible way, and conda-forge updates to that package, but packages that depend on it aren't also updated to take account of those changes, things will again break.

(c) If someone installs the above environment and then later on runs e.g. conda install -c conda-forge geopandas because they want a new version of geopandas (just an example), they might find a whole load of packages (including numpy, scipy, opencv, etc) switch to defaults. The way we have it at the moment, the packages in dependencies.txt get pinned to conda-forge, which also affects future operations.
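Points (a) and (c) both come down to packages silently arriving from a different channel than intended. One way to spot that is to group the installed packages by the channel they came from. The following is a minimal sketch, not part of earthsim: the function names are hypothetical, the sample records and version numbers are invented for illustration, and it assumes each record carries "name" and "channel" keys, as in the JSON printed by `conda list --json`.

```python
from collections import defaultdict


def group_by_channel(records):
    """Group package names by the channel each record reports."""
    groups = defaultdict(list)
    for rec in records:
        # Assumption: records look like entries from `conda list --json`,
        # i.e. dicts with at least "name" and "channel" keys.
        groups[rec.get("channel", "<unknown>")].append(rec["name"])
    return {channel: sorted(names) for channel, names in groups.items()}


def off_channel(records, wanted="conda-forge"):
    """Return the sorted names of packages that did not come from `wanted`."""
    return sorted(r["name"] for r in records if r.get("channel") != wanted)


# Hypothetical sample resembling the mixed environment described above:
# numpy and scipy from defaults, their dependents from conda-forge.
sample = [
    {"name": "numpy", "version": "1.15.0", "channel": "pkgs/main"},
    {"name": "scipy", "version": "1.1.0", "channel": "pkgs/main"},
    {"name": "geopandas", "version": "0.4.0", "channel": "conda-forge"},
    {"name": "opencv", "version": "3.4.1", "channel": "conda-forge"},
]

print(group_by_channel(sample))
print(off_channel(sample))
```

On a real environment the records could be loaded from the output of `conda list --json` instead of the hard-coded sample. The exact channel strings reported there may differ between conda versions, so the `wanted` value may need adjusting.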

I feel that having a highly pinned, "almost guaranteed to work" standalone environment for a project like earthsim is a good thing to have alongside a far more relaxed option for installing into an existing environment. I also think having conda-forge be "almost guaranteed to work" for earthsim is a great goal (but I believe it's not currently the case). To help with that, we could have a second set of CI tests using an environment like the above, and submit any problems it uncovers as issues and/or PRs to conda-forge feedstocks.

My plan is to slowly add the packages from the pyviz/label/earthsim channel to conda-forge to reduce the chance of binary incompatibility due to channel mixing.

As I mentioned, we've also suffered from binary incompatibility within one channel :(

But anyway, I think the general problem is mixing conda-forge and defaults, rather than mixing pyviz and conda-forge (or pyviz and defaults). The main pyviz channels (label/main and label/dev) contain only noarch:python packages, and packages there are intended to work on top of either defaults or conda-forge (basically the channels are for us to try out our own packages, before we submit the packages to conda-forge and defaults). I.e. the packages we as pyviz develop (and put on the pyviz channel) are already maintained by various pyviz members on conda-forge too.

pyviz/label/earthsim is a bit different, unfortunately. This channel was created recently, specifically to host packages we were having trouble with. E.g. a specific build of x264 was put on pyviz/label/earthsim because the latest version on conda-forge at the time broke conda-forge's opencv. I hope the pyviz/label/earthsim channel is temporary.

ocefpaf commented 6 years ago

(a) An environment file like the above does not guarantee - or at least has not guaranteed, historically

Indeed. This is a major pain point, and issues like https://github.com/conda-forge/gdal-feedstock/issues/219#issuecomment-409234067 are very common.

(b) Even if the environment as specified above worked now, I think it wouldn't necessarily work in the future.

That is true but IMO the only way to make it work is to use such envs and fix problems as they appear. In my experience they get more stable, issues surface and get fixed, as we start using them.

(c) If someone installs the above environment and then later on runs e.g. conda install -c conda-forge geopandas because they want a new version of geopandas (just an example), they might find a whole load of packages (including numpy, scipy, opencv, etc) switch to defaults

Another major pain point. There is also some discussion about that here. Not much we can do b/c that is a limitation on conda's side.

I feel that having a highly pinned, "almost guaranteed to work" standalone environment for a project like earthsim is a good thing to have alongside a far more relaxed option for installing into an existing environment.

Highly pinned has its problems too, as old packages are no longer guaranteed to work once their dependencies change. Only a completely frozen env would always be guaranteed to work. In my experience a looser env usually brings fewer headaches when all the packages are in a single channel.
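To make the distinction between loose, pinned, and frozen concrete, here is a rough sketch that classifies the dependency strings of an environment file by how tightly each is pinned. It is an illustration only: `pin_level` and `summarize` are hypothetical helpers, only the plain `name`, `name=version`, and `name=version=build` forms are handled (the last being the form `conda env export` writes), and the build string in the sample is made up.

```python
from collections import Counter


def pin_level(spec):
    """Classify one dependency string as 'unpinned', 'version', or 'build'.

    Assumption: only `name`, `name=version` and `name=version=build`
    forms are handled; range constraints such as `>=` are not parsed.
    """
    parts = [p for p in spec.strip().split("=") if p]
    if len(parts) == 1:
        return "unpinned"
    if len(parts) == 2:
        return "version"
    return "build"


def summarize(deps):
    """Count how many dependencies fall into each pin level."""
    return dict(Counter(pin_level(d) for d in deps))


# A few entries from the loose env in the first post, plus one
# hypothetical fully frozen line (the build string is invented).
loose = ["basemap", "bokeh", "python=3.6", "xarray"]
frozen_line = "numpy=1.15.0=py36_0"

print(summarize(loose))
print(pin_level(frozen_line))
```

A loose env like the one at the top of the thread is almost entirely "unpinned", while a completely frozen env would report "build" for every line.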

But anyway, I think the general problem is mixing conda-forge and defaults, rather than mixing pyviz and conda-forge (or pyviz and defaults).

Yes. Let's not keep that discussion here b/c it is not a pyviz/EarthSim issue.

packages we were having trouble with. E.g. a specific build of x264 was put on pyviz/label/earthsim because the latest version on conda-forge at the time broke conda-forge's opencv. I hope the pyviz/label/earthsim channel is temporary.

I'll see what I can do to add the missing packages to conda-forge and fix those issues.

ceball commented 6 years ago

That is true but IMO the only way to make it work is to use such envs and fix problems as they appear. In my experience they get more stable, issues surface and get fixed, as we start using them.

I think you're right about that in general - but we needed a concrete thing we could point to right now that's working. It doesn't mean everyone has to install earthsim that way all the time, though. To help with the goal of having conda-forge "just work" for earthsim, I'll set up the suggestion below as soon as I can:

we could have a second set of CI tests using an environment like the above [i.e. everything from latest conda-forge], and submit any problems it uncovers as issues and/or PRs to conda-forge feedstocks.

kcpevey commented 5 years ago

Since #237 has been closed and packaging has changed significantly, I'm going to go ahead and close this also.