natcap / invest

InVEST®: models that map and value the goods and services from nature that sustain and fulfill human life.
Apache License 2.0
166 stars 68 forks source link

ImportErrors with gdal builds from conda-forge #96

Closed davemfish closed 4 years ago

davemfish commented 4 years ago

Tests that were passing days ago are now crashing simply when re-running the same Actions jobs.

Failure: ImportError (dlopen(/usr/local/miniconda/envs/env/lib/python3.7/site-packages/osgeo/_gdal.cpython-37m-darwin.so, 2): Symbol not found: _crc32
2020-04-27T19:19:09.6576590Z   Referenced from: /usr/local/miniconda/envs/env/lib/libgdal.20.dylib
2020-04-27T19:19:09.6576790Z   Expected in: /usr/local/miniconda/envs/env/lib/libcfitsio.8.dylib
2020-04-27T19:19:09.6695650Z  in /usr/local/miniconda/envs/env/lib/libgdal.20.dylib) ... ERROR

This is blocking #95

davemfish commented 4 years ago

Here's an example: https://github.com/davemfish/invest/runs/623806938?check_suite_focus=true

phargogh commented 4 years ago

Well that is strange! Seems possible that there's an upstream build issue for these packages, but in looking through the build logs, I also wonder if this is exposing some other issue in our installation. @davemfish in the build linked above, the tests are failing on the test that's supposed to be running things on python 3.6, and yet the error has to do with python 3.7:

Failure: ImportError (dlopen(/usr/local/miniconda/envs/env/lib/python3.7/site-packages/osgeo/_gdal.cpython-37m-darwin.so, 2): Symbol not found: _crc32

Is it possible that we're just not activating the environment or something like that, which maybe is leading to the import error?

davemfish commented 4 years ago

@phargogh , thanks for noticing that! I think you're on to something there. We had a rogue insertion of a python version requirement (3.7) in scripts/convert-requirements-to-conda-yml.py.

I think the previous use case for that script -- on the office mac mini -- was creating & updating the conda env all in one go, from a yml file. Here in Actions, we already have the env created with the python version we want, and just need the yml file for installing all the other deps.

Let's see if it works: https://github.com/davemfish/invest/actions/runs/89568734

davemfish commented 4 years ago

We had a rogue insertion of a python version requirement (3.7) in scripts/convert-requirements-to-conda-yml.py.

Well that wasn't helping anything, but does not seem to relate to the issue here. But it did reveal a different problem with the 3.6 tests related to c++ header files and/or flags.

The python 3.7 tests crashed in the same way as illustrated above.

https://github.com/davemfish/invest/runs/624157861?check_suite_focus=true

dcdenu4 commented 4 years ago

@davemfish so the 3.6 test is failing during installing dependencies:

src/natcap/invest/recreation/out_of_core_quadtree.cpp:834:14: fatal error: 'complex' file not found
    #include <complex>
             ^~~~~~~~~
2 warnings and 1 error generated.
error: command 'gcc' failed with exit status 1

And the 3.7 test is failing on run model tests:

ImportError: dlopen(/usr/local/miniconda/envs/env/lib/python3.7/site-packages/osgeo/_gdal.cpython-37m-darwin.so, 2): Symbol not found: _crc32
  Referenced from: /usr/local/miniconda/envs/env/lib/libgdal.20.dylib
  Expected in: /usr/local/miniconda/envs/env/lib/libcfitsio.8.dylib
 in /usr/local/miniconda/envs/env/lib/libgdal.20.dylib

Is there a way to poke pygeoprocessing and see if that is still building and passing tests?

Also just seeing what that script is doing that is compiling all the requirement files. It makes me a little uneasy that we'd be re-defining channels to something different then how setup-miniconda is handling it's channels...

dcdenu4 commented 4 years ago

I'm also wondering if that conda env update call is doing what we think it's doing:

https://stackoverflow.com/questions/42352841/how-to-update-an-existing-conda-environment-with-a-yml-file

We might be creating or updating the base env, since we are not specifying the env: env that we activated with setup-miniconda. Maybe we should name that setup-miniconda env something different like: macosvenv, so we don't get confused, haha

phargogh commented 4 years ago

@davemfish I'm waiting for tests/builds to check out before submitting a PR for this, but here's a patch that includes the compiler flags that we added to https://github.com/natcap/pygeoprocessing/issues/9:

catalina.patch.txt

davemfish commented 4 years ago

@dcdenu4 it looks like the same issues going on for pygeoprocessing. This is a build from the latest on the release/2.0 branch, which is the branch that uses the same setup-miniconda as here in invest.

https://github.com/davemfish/pygeoprocessing/runs/626835097?check_suite_focus=true

In pygeoprocessing, MacOS tests crash with same "symbol not found" as we see here. And the Windows setup also uses conda, and also crashes. Ubuntu passes.

davemfish commented 4 years ago

Prior to the above-mentioned pygeoprocessing build, I triggered a build from the latest master, which still uses s-weigand/setup-conda@v1.0.0 and things pass there.

https://github.com/davemfish/pygeoprocessing/actions/runs/90341515/workflow

davemfish commented 4 years ago

Besides the different pre-built actions used in those two pygeoprocessing examples, release installs gdal version 3 and master installs 2.3.3. Here in invest we install 2.4.4

davemfish commented 4 years ago

Others are now suggesting this a conda-forge issue. https://github.com/conda-forge/gdal-feedstock/issues/382

dcdenu4 commented 4 years ago

@davemfish neat! We're part of a larger community based bug! Is it best to sit on this for now?

davemfish commented 4 years ago

@dcdenu4 I'm thinking it's probably best to sit on it, but that's just my opinion. We still have passing Windows tests & builds. @phargogh suggested pinning gdal to a version with a working conda-forge build, but that also then needs to be a version for which we have a Windows wheel available on natcap's package index.

phargogh commented 4 years ago

I'd suggest we sit on it for now.

In the longer-term, I do think we should consider what changes we can make to allow us to be resilient to these sorts of upstream build failures. In truth, it seems like these issues happen occasionally on any package management infrastructure (except maybe on debian:stable). @davemfish remember when there was a malfunctioning rtree build on PyPI? Or there was an issue with a gohlke shapely build that one time? How might we address issues like those?

davemfish commented 4 years ago

Definitely agree it's worth spending some time designing a more stable system for ourselves, and/or riding out these problems when they pop up instead of immediately dumping hours into resloving them.

I do remember those past issues. And the one we currently face is due to a bad build of cfitsio, which I'd never heard of before and I'm fairly certain we don't actually need a gdal build `--with fitsio, but we're depending on one anyway.

phargogh commented 4 years ago

@davemfish @dcdenu4 looks like the underlying build issue with cfitsio has been resolved in https://github.com/conda-forge/cfitsio-feedstock/pull/26. I've retriggered some of my own builds on my fork and tests are executing as expected!

davemfish commented 4 years ago

Yes, can confirm that!