opendatacube / datacube-core

Open Data Cube analyses continental scale Earth Observation data through time
http://www.opendatacube.org
Apache License 2.0

Regression with new database connection handling/transaction isolation #1555

Open omad opened 4 months ago

omad commented 4 months ago

Background

Digital Earth Australia provides a Python environment on the NCI which includes ODC. It's currently running the very old version 1.8.8 under Python 3.8.13, and we want to upgrade to the latest version. We've created a new environment using ODC core version 1.8.17 running under Python 3.10.8, but in testing we've been getting errors which weren't occurring before.

This issue was reported and discussed internally in Jira Issue DSNS-304, and included code to reproduce the error.

Error

The problem was reported as an unhandled PostgreSQL exception being thrown by a production script that used to work cleanly.

**Original code and stack trace from error**

When executing the following script with the new `module load dea/20231204`:

```python
#!/usr/bin/env python3

import datacube

message = """Will this work?
Will this cause the new dea module to error out?"""


def main(product="ga_ls8c_ard_3"):
    dc = datacube.Datacube(app="gen-list")
    for l1_dataset in dc.index.datasets.search(product=product):
        with open('io.txt', 'a') as file:
            file.write(message + '\n')


if __name__ == "__main__":
    main(product="ga_s2am_ard_3")
```

**Stack Trace**

```
Traceback (most recent call last):
  File "/g/data/u46/users/dsg547/sandbox/ard_pipeline_support/tickets/DSNS-304/example/./odc_call.py", line 18, in <module>
    main(product="ga_s2am_ard_3")
  File "/g/data/u46/users/dsg547/sandbox/ard_pipeline_support/tickets/DSNS-304/example/./odc_call.py", line 14, in main
    for l1_dataset in dc.index.datasets.search(product=product):
  File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/datacube/index/postgres/_datasets.py", line 548, in search
    for product, datasets in self._do_search_by_product(query,
  File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/datacube/index/postgres/_datasets.py", line 691, in _do_search_by_product
    with self._db_connection() as connection:
  File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/datacube/index/postgres/_transaction.py", line 60, in _db_connection
    with self._index._active_connection(transaction=transaction) as conn:
  File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/datacube/index/postgres/index.py", line 157, in _active_connection
    with self._db._connect() as conn:
  File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/datacube/drivers/postgres/_connections.py", line 231, in _connect
    connection.close()
  File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1251, in close
    conn.close()
  File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 1166, in close
    self._checkin()
  File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 1008, in _checkin
    _finalize_fairy(
  File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 800, in _finalize_fairy
    connection_record.checkin()
  File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 542, in checkin
    finalizer(connection)
  File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 677, in _reset_characteristics
    characteristic.reset_characteristic(self, dbapi_connection)
  File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/engine/characteristics.py", line 50, in reset_characteristic
    dialect.reset_isolation_level(dbapi_conn)
  File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 748, in reset_isolation_level
    self.set_isolation_level(dbapi_conn, self.default_isolation_level)
  File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 851, in set_isolation_level
    connection.set_isolation_level(level)
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
```

I believe @Ariana-B did some investigation resulting in https://github.com/opendatacube/datacube-core/pull/1525, especially in relation to the `connection.set_isolation_level()` changes, but the issue hasn't yet been resolved.

New Findings

I've run some more tests today and have a better idea of what the issue is.

I'm not sure whether the change in behaviour comes from ODC, SQLAlchemy or psycopg2, but I suspect it's ODC.
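A quick sanity check, as a sketch only, is to print the versions of the three candidate layers actually loaded in each module and compare the old and new environments:

```python
# Sketch: print the versions of the three candidate layers in a given
# environment, to compare the old (dea 1.8.8 / Python 3.8.13) and
# new (dea 1.8.17 / Python 3.10.8) modules.
import datacube
import sqlalchemy
import psycopg2

print("datacube:  ", datacube.__version__)
print("sqlalchemy:", sqlalchemy.__version__)
print("psycopg2:  ", psycopg2.__version__)
```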

I'm about to write up a proposed improvement in a new issue: allow passing arbitrary connection parameters through to libpq, which would make it possible to work around the TCP idle timeout issue seen here.
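As a sketch of the kind of work-around I have in mind (the URL, host and values below are placeholders, and `connect_args` here is plain SQLAlchemy rather than any existing ODC option), libpq's TCP keepalive parameters can be passed through the psycopg2 dialect like this:

```python
# Sketch only: libpq TCP keepalive settings passed through SQLAlchemy's
# psycopg2 dialect via connect_args. The URL and values are placeholders;
# ODC does not currently expose a way to supply these.
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://user@db.example.internal:5432/datacube",
    connect_args={
        "keepalives": 1,            # enable TCP keepalives on the socket
        "keepalives_idle": 30,      # seconds idle before the first probe
        "keepalives_interval": 10,  # seconds between probes
        "keepalives_count": 5,      # failed probes before dropping the connection
    },
)
```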

SpacemanPaul commented 4 months ago

As discussed, any connection parameter that can be included in the database URL can already be user-specified in 1.9, thanks to the configuration layer changes from EP10.
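For example, something along these lines should work (a sketch only: the host, database name and values are placeholders), since SQLAlchemy's psycopg2 dialect passes URL query parameters straight through to libpq:

```
postgresql://user@db.example.internal:5432/datacube?keepalives=1&keepalives_idle=30&keepalives_interval=10&keepalives_count=5
```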