locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0

docs build dependency stuff #528

Open vpipkt opened 3 years ago

vpipkt commented 3 years ago

The docs build is currently broken by ModuleNotFoundError failures because dependencies are no longer installed by default in the build. The build installs some dependencies via conda (formerly pip) from a requirements file, runs sbt package to build the wheel, and then runs setup.py with the pweave command.

A bit of history

From 8db45241..07bd8ce8 we had the following change

--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -42,7 +42,8 @@ orbs:
         steps:
           - run:
               name: Install requirements
-              command: python -m pip install --progress-bar=off --user -r pyrasterframes/src/main/python/requirements.txt
+              command: /opt/conda/bin/conda install -c conda-forge --yes --file pyrasterframes/src/main/python/requirements-condaforge.txt
+

The requirements.txt file was deleted but contained the following lovelies which were also repeated, not quite verbatim, in our setup.py:

-ipython==6.2.1
-pyspark==2.4.7
-gdal==2.4.4
-numpy>=1.17.3,<2.0
-pandas>=0.25.3,<1.0
-shapely>=1.6.4,<1.7
-rasterio>=1.1.1,<1.2
-folium>=0.10.1,<0.11
-geopandas>=0.6.2,<0.7
-descartes>=1.1.0,<1.2
-pytz
-matplotlib
-rtree
-Pillow
-deprecation

The requirements-condaforge.txt that is now used in the CircleCI build is:

--- /dev/null
+++ b/pyrasterframes/src/main/python/requirements-condaforge.txt
@@ -0,0 +1,4 @@
+# These packages should be installed from conda-forge, given their complex binary components.
+gdal==2.4.4
+rasterio[s3]
+rtree

Root cause

What we are seeing now is missing deprecation and shapely modules in the CircleCI docs job. Basically, anything that is not listed in setup_requires in setup.py can result in a ModuleNotFoundError. Because sbt runs python setup.py pweave directly, that task never installs the install dependencies. Stop me if you have heard this before.
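One way to confirm which modules are absent in a given environment, before pweave trips over them, is a small import check. This is only a diagnostic sketch: the helper name missing_modules is made up for illustration, and the module list is just the two names reported above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The two modules reported missing in the docs job; extend as needed.
    print(missing_modules(["deprecation", "shapely"]))
```

Running this in the docs job environment right before the pweave step should print an empty list once the dependencies are actually installed.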

Basically, there is tension between managing the dependencies for installation and the dependencies we need in order to run the pweave command in setup.py.

Proposed solution:

Refactor the PweaveDocs class out of setup.py into a standalone script, perhaps pyrasterframes/src/main/python/docs/build.py. In the proposed build.py, no longer extend distutils.cmd.Command. For command line option parsing, use something like argparse instead of the Command pattern. As far as I can tell, there is not much else in Command that we are currently taking advantage of.
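A rough sketch of what the option parsing in the proposed build.py could look like with argparse. Only the quick option comes from this issue; the --format option, its default, and the function names are assumptions for illustration, and the actual weaving logic from PweaveDocs is deliberately left out.

```python
import argparse


def parse_args(argv=None):
    """Parse docs-build options; replaces the user_options of a distutils Command."""
    parser = argparse.ArgumentParser(description="Build the docs with pweave (sketch).")
    # "quick" is the option mentioned in this issue.
    parser.add_argument("--quick", action="store_true",
                        help="skip the slower parts of the docs build")
    # Hypothetical option, shown only to illustrate a valued argument.
    parser.add_argument("--format", default="markdown",
                        help="output format to request from pweave (assumed)")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # The body of the current PweaveDocs.run() would move here.
    print(f"docs build: quick={args.quick}, format={args.format}")
    return 0


if __name__ == "__main__":
    main()
```

Because this is a plain script, it runs in whatever environment already has the package and its install dependencies, rather than in the restricted context of a setup.py command.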

The CircleCI docs job should:

  1. conda install dependencies from requirements file (stays the same)
  2. call sbt (mild refactoring of current build definition)
    1. sbt package to build assembly jar and whl
    2. sbt test/compile
  3. run python -m pip install pyrasterframes/target/python/ (this is important and new)
  4. sbt makeSite -- mostly the same
    1. refactor the current pySetup task to instead run python pyrasterframes/target/python/docs/build.py with desired options such as quick
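The ordering of the steps above can be written down as a small dry-run script. The commands are copied from this issue as written; the function names and the dry-run default are assumptions, and the real job would of course live in .circleci/config.yml rather than in Python.

```python
import subprocess

REQUIREMENTS = "pyrasterframes/src/main/python/requirements-condaforge.txt"


def docs_job_commands(python="python", sbt="sbt", conda="/opt/conda/bin/conda"):
    """Return the docs job steps, in order, as argument lists."""
    return [
        [conda, "install", "-c", "conda-forge", "--yes", "--file", REQUIREMENTS],
        [sbt, "package"],        # builds the assembly jar and the whl
        [sbt, "test/compile"],   # as written in the step list above
        # New step: install the built package so its install_requires get resolved.
        [python, "-m", "pip", "install", "pyrasterframes/target/python/"],
        [sbt, "makeSite"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(docs_job_commands())
```

The key point is that the pip install step sits between sbt package and sbt makeSite, so the docs build runs against an environment where the package's dependencies are present.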

This will enable us to:

  1. maintain the conda requirements file as basically our set of recommended pre-requisites due to the complicated binaries
  2. remove several packages from setup_requires because setup no longer requires them
  3. continue to use install_requires for managing those dependencies

Alternative?

If there is a way for us to explicitly declare that a distutils Command should extend or depend upon the install command itself, this could simplify the specification of the circle job.

vpipkt commented 3 years ago

Changing direction a bit from the above: use the python setup.py develop task inside the python/docs definition.

See #529