rustprooflabs / pgosm-flex

PgOSM Flex provides high quality OpenStreetMap datasets in PostGIS (Postgres) using the osm2pgsql Flex output.

No safety against overwriting existing data #313

Closed rustprooflabs closed 1 year ago

rustprooflabs commented 1 year ago

What version of PgOSM Flex are you using?

PgOSM Flex 0.8.0.dev.1, Docker image

What did you do exactly?

I followed the instructions for using an external connection with replication. After a few updates had been run, running the following command (note that it omits --replication) starts overwriting the existing data.

docker exec -it \
    pgosm python3 docker/pgosm_flex.py \
    --ram=8 \
    --region=north-america/us \
    --subregion=colorado

What did you expect to happen?

When running without --replication, it would be a good idea to check the osm.pgosm_flex table for existing records and, if any are found, require --force.

What did happen instead?

It started a complete, fresh import. This could be a serious problem on large regions or busy instances, where the difference between applying a diff and running a full import is significant.

rustprooflabs commented 1 year ago

I looked at defining the logic for when to require --force. It turned out to be simpler to define when --force would not be required.

I worked out three scenarios where --force should not be required. This list seems to cover the typical, safe usages of PgOSM Flex. The following query is expected to be used to look up the most recent import.

SELECT id, osm_date, region, layerset, import_status,
        import_mode ->> 'replication' AS replication,
        import_mode ->> 'update' AS use_update,
        import_mode
    FROM osm.pgosm_flex
    ORDER BY imported DESC
    LIMIT 1
;

Scenarios

No import

When the query above returns 0 rows, i.e. there is no prior import.

Using Replication normally

The --replication flag is used, a prior import exists (1 row from the query), and the prior import also used --replication.

This scenario assumes the replication process "figures it out."

Using --update=append

When --update=append is used and a prior import exists where the use_update column is not null.

(I'm the least confident about the specifics of this one...)
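
The three scenarios above can be sketched as a single check. This is only an illustration, not the code merged into PgOSM Flex: the function name force_required and its parameters are hypothetical, and it assumes the latest row from the query is passed in as a dict (or None when the query returns 0 rows), with the replication value arriving as the text 'true' or 'false' because it is extracted with ->>.

```python
from typing import Optional


def force_required(prior_import: Optional[dict],
                   replication: bool,
                   update: Optional[str]) -> bool:
    """Return True when --force should be required before importing.

    Hypothetical helper for illustration only.

    prior_import: latest row from osm.pgosm_flex as a dict with the
        'replication' and 'use_update' columns from the query above,
        or None when the query returned 0 rows.
    replication: whether --replication was passed for this run.
    update: value passed to --update for this run, or None.
    """
    # Scenario 1: no prior import, nothing can be overwritten.
    if prior_import is None:
        return False

    # Scenario 2: replication is used now and was used by the prior import.
    # Assumption: ->> yields the text 'true' for a JSON boolean true.
    prior_used_replication = prior_import.get("replication") == "true"
    if replication and prior_used_replication:
        return False

    # Scenario 3: --update=append on top of a prior import whose
    # use_update column is not null.
    if update == "append" and prior_import.get("use_update") is not None:
        return False

    # Anything else risks overwriting existing data.
    return True
```

With this sketch, the situation reported in this issue (a prior replication import followed by a run without --replication) is the case that returns True, so the run would stop unless --force is given.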

rustprooflabs commented 1 year ago

Merged into main via #328