rustprooflabs / pgosm-flex

PgOSM Flex provides high quality OpenStreetMap datasets in PostGIS (Postgres) using the osm2pgsql Flex output.
MIT License
101 stars 20 forks source link

Failure when using `--replication` still reports import in `osm.pgosm_flex` table #293

Closed rustprooflabs closed 1 year ago

rustprooflabs commented 1 year ago

What version of PgOSM Flex are you using?

0.7.0.beta.1

Docker image

0.7.0.beta.1

What did you do exactly?

I ran docker exec ... --replication to update a database that was configured with max_connections=50. Per https://github.com/openstreetmap/osm2pgsql/discussions/1650, this failed. The result was a failed update, yet the osm.pgosm_flex table reported the import.

SELECT id, imported, osm_date, pgosm_flex_version,
        osm2pgsql_mode
    FROM osm.pgosm_flex
    WHERE id >= 3
;
id|imported                     |osm_date  |pgosm_flex_version|osm2pgsql_mode|
--+-----------------------------+----------+------------------+--------------+
 3|2022-12-14 19:02:41.425 -0700|2022-12-14|0.6.3-97a1fe3     |create        |
 4|2023-01-16 17:24:37.244 -0700|2023-01-16|0.7.0.rc.1-e4eec24|append        |
 5|2023-01-16 17:26:25.813 -0700|2023-01-16|0.7.0.rc.1-e4eec24|append        |

What did you expect to happen?

The row with id = 4 should either: a) Not exist; or b) Report the failure

id|imported                     |osm_date  |pgosm_flex_version|osm2pgsql_mode|
--+-----------------------------+----------+------------------+--------------+
4|2023-01-16 17:24:37.244 -0700|2023-01-16|0.7.0.rc.1-e4eec24|append        |

What did happen instead?

There is no indication other than the terminal output that anything went wrong.

rustprooflabs commented 1 year ago

I can reproduce with the following, assuming I have the --osm-date available locally.

export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=mysecretpassword

docker run --name pgosm -d --rm \
    -v ~/pgosm-data:/app/output \
    -v /etc/localtime:/etc/localtime:ro \
    -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
    -p 5433:5432 \
    -d rustprooflabs/pgosm-flex:0.7.0.rc.1 \
    -c max_connections=100 # 100 is default.  Works for initial import, fails for subsequent import.  See https://github.com/openstreetmap/osm2pgsql/discussions/1650

# Initial import works
docker exec -it \
    pgosm python3 docker/pgosm_flex.py \
    --ram=8 \
    --region=north-america/us \
    --subregion=district-of-columbia \
    --pgosm-date 2022-12-30 \
    --replication

# Subsequent refresh fails
docker exec -it \
    pgosm python3 docker/pgosm_flex.py \
    --ram=8 \
    --region=north-america/us \
    --subregion=district-of-columbia \
    --replication

The second docker exec command fails with

2023-01-18 19:44:27,991:INFO:pgosm-flex:helpers:2023-01-18 19:44:27    Processed 3472 nodes in 0s - 3k/s
2023-01-18 19:44:27,992:INFO:pgosm-flex:helpers:2023-01-18 19:44:27    Processed 1096 ways in 1s - 1k/s
2023-01-18 19:44:27,992:INFO:pgosm-flex:helpers:2023-01-18 19:44:27    Processed 176 relations in 0s - 176/s
2023-01-18 19:44:28,576:INFO:pgosm-flex:helpers:2023-01-18 19:44:28  ERROR: Connecting to database failed: connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL:  sorry, too many clients already
2023-01-18 19:44:28,576:INFO:pgosm-flex:helpers:.
2023-01-18 19:44:28,581:INFO:pgosm-flex:helpers:Traceback (most recent call last):
2023-01-18 19:44:28,581:INFO:pgosm-flex:helpers:File "/usr/local/bin/osm2pgsql-replication", line 556, in <module>
2023-01-18 19:44:28,582:INFO:pgosm-flex:helpers:sys.exit(main())
2023-01-18 19:44:28,582:INFO:pgosm-flex:helpers:File "/usr/local/bin/osm2pgsql-replication", line 550, in main
2023-01-18 19:44:28,582:INFO:pgosm-flex:helpers:return args.handler(conn, args)
2023-01-18 19:44:28,582:INFO:pgosm-flex:helpers:File "/usr/local/bin/osm2pgsql-replication", line 410, in update
2023-01-18 19:44:28,582:INFO:pgosm-flex:helpers:subprocess.run(osm2pgsql, check=True)
2023-01-18 19:44:28,582:INFO:pgosm-flex:helpers:File "/usr/lib/python3.9/subprocess.py", line 528, in run
2023-01-18 19:44:28,582:INFO:pgosm-flex:helpers:raise CalledProcessError(retcode, process.args,
2023-01-18 19:44:28,582:INFO:pgosm-flex:helpers:subprocess.CalledProcessError: Command '['/usr/local/bin/osm2pgsql', '--append', '--slim', '--prefix', 'planet_osm', '--output=flex', '--style=./run.lua', '--slim', '-d', 'postgresql://postgres:mysecretpassword@localhost:5432/pgosm?application_name=pgosm-flex', '-d', 'postgresql://postgres:mysecretpassword@localhost:5432/pgosm?application_name=pgosm-flex', '/tmp/tmp5ggs5l4u/osm2pgsql_diff.osc.gz']' returned non-zero exit status 1.
2023-01-18 19:44:28,611:WARNING:pgosm-flex:pgosm_flex:Failure. Return code: 1
2023-01-18 19:44:28,611:INFO:pgosm-flex:pgosm_flex:Skipping pg_dump
2023-01-18 19:44:28,611:WARNING:pgosm-flex:pgosm_flex:PgOSM Flex completed with errors. Details in output

The info in osm.pgosm_flex would indicate success :(

SELECT id, imported, osm_date, pgosm_flex_version,
        osm2pgsql_mode
    FROM osm.pgosm_flex
;
id|imported                     |osm_date  |pgosm_flex_version|osm2pgsql_mode|
--+-----------------------------+----------+------------------+--------------+
 1|2023-01-18 19:43:17.405 -0700|2022-12-30|0.7.0.rc.1-e4eec24|create        |
 2|2023-01-18 19:44:26.132 -0700|2023-01-18|0.7.0.rc.1-e4eec24|append        |
rustprooflabs commented 1 year ago

It doesn't look like the run.lua file can easily catch / report this error. It starts with...

-- Loads the `conf` var from layerset INI file
require "layerset"

require "style.pgosm-meta"

if conf['layerset']['amenity'] then
    print('Including amenity')
    require "style.amenity"
end

...

When the whole processing fails, the new row (from the first require) does get added. I don't think there is an easy way for style.pgosm-meta to be aware of the Postgres connection failure, and I don't think we can guarantee that failure will always generate the row incorrectly reporting success.

However, this error should be easy enough to catch.

 ERROR: Connecting to database failed: connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL:  sorry, too many clients already

Then do a check on the timestamp of the latest row in osm.pgosm_flex to ensure it's within the last 30 seconds. (That's to be paranoid in case there are cases where the new row does not get created. I don't want to update an old meta row to be a failure when a new import failed...)

We'll need a new column in osm.pgosm_flex. Maybe success BOOLEAN. Or, completed TIMESTAMPTZ. The former requires an update on either success or failure (depending on approach). The latter would only need an update on success. If it's NULL the import either a) hasn't completed or b) failed.

Not a trivial fix, not going to try to fix before 0.7.0.