opendatacube / datacube-core

Open Data Cube analyses continental scale Earth Observation data through time
http://www.opendatacube.org
Apache License 2.0

Discussion: psycopg2 dependency #1614

Open omad opened 1 month ago

omad commented 1 month ago

I've been regularly bitten by pip install datacube failing due to the hard dependency on psycopg2.

This dependency forces a source install of psycopg2, which fails any time gcc, the Python headers, or libpq-dev aren't available in the environment (often the case in Nix or Docker environments).

The solution I've been considering is making psycopg2 an optional dependency. This would let it be provided either by psycopg2 installed from source, or by the psycopg2-binary wheel.

This is also important since we now have multiple ODC index drivers, and a postgres driver is not even required for a working ODC install.

We could still optionally require psycopg2 with something like:

  pip install datacube[postgres]

Although I'm not sure whether that should install psycopg2 or psycopg2-binary.
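To make the open question concrete, here is a minimal sketch of what the extras could look like. It only models the mapping that setup.py would pass to setuptools as extras_require. The extra name postgres-binary and the helper requirements_for are hypothetical, made up for illustration, and nothing here reflects what datacube actually ships.

```python
# Hypothetical extras mapping, as it might be passed to
# setuptools.setup(extras_require=EXTRAS_REQUIRE).
# The "postgres-binary" name is an assumption, not an existing datacube extra.
EXTRAS_REQUIRE = {
    # Source build: needs gcc, Python headers and libpq-dev at install time.
    "postgres": ["psycopg2"],
    # Pre-built wheel: installs without a compiler.
    "postgres-binary": ["psycopg2-binary"],
}


def requirements_for(extra):
    """Return the requirements that `pip install datacube[<extra>]` would add."""
    return list(EXTRAS_REQUIRE.get(extra, []))
```

With this layout, pip install datacube on its own would pull in neither package, and users would pick the flavour explicitly.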

I am wary of support issues. As well as changing setup.py, we'd need to update the documentation for the different cases, and could even consider catching the missing dependency in code, for example if the postgres or postgis driver was loaded without psycopg2 being available.
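Catching the missing dependency could be as small as the sketch below. The helper name require_module and the wording of the message are assumptions for illustration. The idea is that a driver would call it when it is loaded, rather than importing psycopg2 at module import time.

```python
import importlib


def require_module(module_name, extra):
    """Import `module_name`, or fail with a hint about which extra to install.

    Hypothetical helper: a driver could call require_module("psycopg2", "postgres")
    when it is loaded, so a missing dependency gives an actionable message.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"The '{module_name}' package is required by this index driver "
            f"but is not installed. Try: pip install datacube[{extra}]"
        ) from exc
```

The original ImportError is kept as the cause, so the full traceback is still available when debugging.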

Related earlier issue: https://github.com/opendatacube/datacube-core/issues/1030

Also: psycopg3 is something we should be considering too. It's supported by SQLAlchemy 2, so hopefully it isn't too difficult to support. There are only a couple of direct usages of psycopg2 in the code:

datacube/drivers/postgis/_fields.py
15:from psycopg2.extras import NumericRange, DateTimeTZRange

SpacemanPaul commented 1 month ago

I think we should be able to do this in 1.9.

pjonsson commented 1 month ago

I agree that it's a pain to make the build container and copy the built wheel over, but the recommended production use for psycopg2 is to build the source package, not to use the binary wheel (https://www.psycopg.org/docs/install.html#psycopg-vs-psycopg-binary). If the psycopg variant of datacube uses psycopg2-binary, I think it would be nice to have a psycopg-production variant that refuses psycopg2-binary, so we get a clear error when things are broken. Another option is to use the non-binary package for the psycopg2 variant, and to name the variant that uses psycopg2-binary after psycopg2-binary.

For what it's worth: I recently updated a much smaller SQLAlchemy-using code base, without direct uses of psycopg2, to psycopg3. There was a single issue to fix, where psycopg3 required a type annotation when using a GeoAlchemy2 function; beyond that it was just an update of the engine connection string. I first tried it ~15 months ago, and I'm fairly sure there were more issues then, but it's possible that replacing some text() uses with the ORM eliminated some of them (or that GeoAlchemy2/SQLAlchemy improved during those 15 months).
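For reference, the connection string update amounts to changing the driver part of the URL: SQLAlchemy names the psycopg2 dialect postgresql+psycopg2 (also the default for a bare postgresql:// URL) and the psycopg3 dialect postgresql+psycopg. A small sketch using plain string handling follows. The helper name to_psycopg3_url is made up for illustration and is not part of SQLAlchemy or datacube.

```python
def to_psycopg3_url(url):
    """Rewrite a SQLAlchemy PostgreSQL URL to select the psycopg3 driver.

    Hypothetical helper. URLs that already name another driver are
    returned unchanged.
    """
    scheme, sep, rest = url.partition("://")
    if scheme in ("postgresql", "postgresql+psycopg2"):
        return "postgresql+psycopg" + sep + rest
    return url


old = "postgresql+psycopg2://user:secret@localhost:5432/datacube"
new = to_psycopg3_url(old)
```

The rewritten URL would then be passed to sqlalchemy.create_engine as before; any type-related issues like the GeoAlchemy2 one above still have to be fixed separately.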

robbibt commented 1 month ago

Just tagging @caitlinadams here, as I believe the psycopg2 dependency in odc-algo / odc-tools has been a major blocker to some of her recent work. There's also some more discussion here: https://github.com/opendatacube/odc-algo/issues/7

And here: https://github.com/opendatacube/datacube-core/discussions/1566#discussioncomment-8861553