opendatacube / datacube-core

Open Data Cube analyses continental scale Earth Observation data through time
http://www.opendatacube.org
Apache License 2.0

Improve GitHub Actions Testing Workflows - Stop Using Docker #1589

Open omad opened 1 month ago

omad commented 1 month ago

We currently use Docker builds inside GitHub Actions to run automated tests. It "works", but it is very complicated and very fragile, as recently evidenced by multiple days of work on #1587 and #1588.

The world has moved on since this workflow was created. It's now possible to get up-to-date binaries of all the geospatial and scientific Python libraries that we use from several different sources, without having to depend on system libraries or recompile anything.

As far as I can tell, the Docker images built from datacube-core aren't being used anywhere. It's only downstream software like Alchemist, Statistician, Tools, OWS and Explorer where Docker images are used. They've all ended up going in somewhat divergent directions because of their differing dependencies and requirements. While consolidation would be good, I don't think the setup in GHA here would be a sensible place to start.

In brief, I think we should:

I think it will be much faster, simpler, and more comprehensive.

pjonsson commented 4 weeks ago

The linked PRs are making 3 upgrades at once:

  1. Ubuntu LTS version upgrade (happens once every two years)
  2. Python version from 3.10 to 3.12 (one 3.x release per year, so two years' worth of upgrades)
  3. GDAL version from 3.8 to 3.9 (happens frequently, GDAL moves fast)

This time it was GDAL that forced all three upgrades at once. Switching to running directly on the GitHub CI runners will shift the thing mandating Ubuntu LTS upgrades from GDAL to GitHub, but it will still be an external decision.

The third bullet goes against psycopg2's recommended production use (https://www.psycopg.org/docs/install.html#psycopg-vs-psycopg-binary).

Having the database in a GitHub service container will make the CI environment differ from the local test environment, which might make CI issues harder to reproduce than with a containerized setup.
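
One way to narrow that gap, as a minimal sketch only: have the tests take their database location from the environment, so the same code can point at a CI service or at a local Postgres container. The variable name `TEST_DB_URL`, the default URL, and the helper `db_settings_from_env` are all hypothetical and are not taken from datacube-core.

```python
import os
from urllib.parse import urlparse

# Hypothetical default for a Postgres instance on localhost (assumption,
# not a value used by datacube-core).
DEFAULT_TEST_DB_URL = "postgresql://postgres:postgres@localhost:5432/test_db"


def db_settings_from_env(environ=None):
    """Return test database connection settings as a dict.

    Reads the hypothetical TEST_DB_URL variable, falling back to the
    default above, so CI and local runs can target different Postgres
    instances without any code changes.
    """
    environ = os.environ if environ is None else environ
    url = urlparse(environ.get("TEST_DB_URL", DEFAULT_TEST_DB_URL))
    return {
        "host": url.hostname,
        "port": url.port or 5432,  # fall back to the standard Postgres port
        "user": url.username,
        "password": url.password,
        "database": url.path.lstrip("/"),
    }
```

With something like this, a developer could start Postgres locally in a container on the same port that CI exposes, and reproduce a CI failure without changing the tests.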

I'm not advocating any particular technical direction, only that the causes, consequences, and costs of each direction are considered.

Kirill888 commented 1 week ago

The current Docker-based workflow is really slow. It looks like it builds the Docker image from scratch, without any cache, and then pushes it to ghcr.io just to download it right back in the next step, spending a couple of minutes on each side. At the very least we should not need to push and pull, even if we keep building from scratch without a cache. Or better, just stop using Docker where it hurts rather than helps.