rustprooflabs / pgosm-flex

PgOSM Flex provides high quality OpenStreetMap datasets in PostGIS (Postgres) using the osm2pgsql Flex output.

Replication failure mis-reports as success in DB #391

Closed rustprooflabs closed 2 months ago

rustprooflabs commented 2 months ago

What version of PgOSM Flex are you using?

1.0.1-e53f4cf via Docker. Current latest dev branch.

What did you do exactly?

I attempted to update, via replication, a database that had not been updated since 2/10/2024. Env vars are stored in ~/.pgosm-db-pgosm-dev.

source ~/.pgosm-db-pgosm-dev

docker run --name pgosm -d --rm \
    -v ~/pgosm-data:/app/output \
    -v /etc/localtime:/etc/localtime:ro \
    -e POSTGRES_USER=$POSTGRES_USER \
    -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
    -e POSTGRES_HOST=$POSTGRES_HOST \
    -e POSTGRES_DB=$POSTGRES_DB \
    -e POSTGRES_PORT=$POSTGRES_PORT \
    -p 5433:5432 -d rustprooflabs/pgosm-flex

docker exec -it \
    pgosm python3 docker/pgosm_flex.py \
    --ram=8 \
    --region=north-america/us \
    --subregion=district-of-columbia \
    --replication

What did you expect to happen?

I expected it to fail outright because too much time had passed since the last replication update. In my prior experience, Geofabrik retains replication data for only 90 days, so an update after a longer gap fails.

I expect the record in osm.pgosm_flex to indicate failure in the import_status column.
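As a rough illustration of the expectation above, the gap between the last import and the replication attempt can be checked before the update runs. This is only a sketch: the function names are hypothetical and not part of PgOSM Flex, and the 90-day window is the figure from my experience stated above, not a documented Geofabrik guarantee.

```python
from datetime import date

# Assumption: retention window based on the experience described in this issue,
# not on any documented Geofabrik policy.
RETENTION_DAYS = 90


def days_since_import(last_osm_date: date, today: date) -> int:
    """Return the number of whole days between the last import and today."""
    return (today - last_osm_date).days


def replication_gap_too_long(last_osm_date: date, today: date,
                             retention_days: int = RETENTION_DAYS) -> bool:
    """Return True when the gap exceeds the assumed retention window."""
    return days_since_import(last_osm_date, today) > retention_days


if __name__ == "__main__":
    # Dates from this issue: last update 2024-02-10, attempted update 2024-07-06.
    last_update = date(2024, 2, 10)
    attempt = date(2024, 7, 6)
    print(days_since_import(last_update, attempt))         # 147
    print(replication_gap_too_long(last_update, attempt))  # True
```

With the dates in this report the gap is 147 days, well past the assumed window, which is why a hard failure was expected.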

What did happen instead?

The log output claims success at the end with the message "PgOSM Flex complete!". You have to look closely a few lines up to see that it actually failed: "[ERROR]: Error during diff download. Bailing out."

...

2024-07-06 07:13:27,557:INFO:pgosm-flex:helpers:2024-07-06 07:13:27 [INFO]: Using replication service 'http://download.geofabrik.de/north-america/us/district-of-columbia-updates'.
2024-07-06 07:13:28,759:INFO:pgosm-flex:helpers:2024-07-06 07:13:28 [ERROR]: Error during diff download. Bailing out.
2024-07-06 07:13:29,803:INFO:pgosm-flex:db:Finishing Replication, including nested polygons
2024-07-06 07:13:29,803:INFO:pgosm-flex:db:Finishing Replication
2024-07-06 07:13:29,951:INFO:pgosm-flex:db:Finishing replication output: 
 NOTICE:  Populating nested place table
NOTICE:  Calculating nesting of place polygons
NOTICE:  Rows to update: 29
NOTICE:  Updating in batches of 100 rows
NOTICE:  table "places_for_nesting" does not exist, skipping
NOTICE:  table "place_batch" does not exist, skipping
CALL

2024-07-06 07:13:29,951:INFO:pgosm-flex:pgosm_flex:osm2pgsql-replication update complete
2024-07-06 07:13:29,980:INFO:pgosm-flex:pgosm_flex:Skipping pg_dump
2024-07-06 07:13:29,981:INFO:pgosm-flex:pgosm_flex:PgOSM Flex complete!

The import_status incorrectly claims this was "Completed."

SELECT imported, osm_date, pgosm_flex_version, import_status
    FROM osm.pgosm_flex
    ORDER BY imported DESC
    LIMIT 2
;
imported                     |osm_date  |pgosm_flex_version|import_status|
-----------------------------+----------+------------------+-------------+
2024-07-06 07:13:27.330 -0600|2024-07-06|1.0.1-e53f4cf     |Completed    |
2024-02-10 09:16:55.163 -0700|2024-02-10|0.10.3-3224da2    |Completed    |

What did you do to try analyzing the problem?

I have prior experience with this problem of long replication time frames. I confirmed things aren't working properly based on the error message in log output.

rustprooflabs commented 2 months ago

A couple of print statements added to the pgosm_flex.py module show that the code here is working as anticipated. The return code from osm2pgsql-replication is 0 even though the output clearly states an error. Below is an extract of the output, including the additional prints of the exact command being run and the return code.

2024-07-06 07:34:14,146:INFO:pgosm-flex:helpers:2024-07-06 07:34:14 [INFO]: Using replication service 'http://download.geofabrik.de/north-america/us/district-of-columbia-updates'.
2024-07-06 07:34:15,419:INFO:pgosm-flex:helpers:2024-07-06 07:34:15 [ERROR]: Error during diff download. Bailing out.
Update Command: 
osm2pgsql-replication update -d postgresql://pgosm_flex:mysecretpassword@172.16.0.170:5432/pgosm_dev?application_name=pgosm-flex     --     --output=flex --style=./run.lua     --slim

RETURNCODE: 0

I searched https://github.com/osm2pgsql-dev/osm2pgsql for the text in the error ("Error during diff download. Bailing out.") without luck. I'm unsure exactly where the error message is generated. I will push a workaround for this for the time being.
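One possible shape for such a workaround, since the return code alone cannot be trusted here, is to also scan the captured output for error text. This is a minimal sketch and not necessarily the fix that gets pushed: the function names are hypothetical, and matching on the marker strings (copied from the log output above) is an assumption that would break if the upstream message changes.

```python
import subprocess
import sys
from typing import Sequence

# Assumption: these markers are copied from the log output in this issue.
ERROR_MARKERS = ("[ERROR]", "Error during diff download")


def output_reports_error(output: str) -> bool:
    """Return True if any line of the output contains a known error marker."""
    return any(marker in line
               for line in output.splitlines()
               for marker in ERROR_MARKERS)


def run_update(cmd: Sequence[str]) -> bool:
    """Run a command and return True only if it truly appears to succeed.

    Success requires both a zero return code and no error marker in the
    combined stdout/stderr output.
    """
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True, check=False)
    if result.returncode != 0:
        return False
    return not output_reports_error(result.stdout)


if __name__ == "__main__":
    # Stand-in for osm2pgsql-replication: prints an error but exits with 0.
    fake_cmd = [sys.executable, "-c",
                "print('[ERROR]: Error during diff download. Bailing out.')"]
    print(run_update(fake_cmd))  # False
```

The result of a check like this could then drive the value written to import_status, so a failed replication is no longer recorded as "Completed."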