osm2pgsql-dev / osm2pgsql

OpenStreetMap data to PostgreSQL converter
https://osm2pgsql.org
GNU General Public License v2.0

osm2pgsql-replication returns code 0 when output says "Error during diff download" #2206

Closed rustprooflabs closed 2 months ago

rustprooflabs commented 2 months ago

What version of osm2pgsql are you using?

2024-07-06 09:58:11  osm2pgsql version 1.11.0
Build: RelWithDebInfo
Compiled using the following library versions:
Libosmium 2.20.0
Proj 7.2.1
Lua 5.4.2

What operating system and PostgreSQL/PostGIS version are you using?

Debian, Postgres 16, PostGIS 3.4 through the postgis/postgis:16-3.4 Docker image.

Tell us something about your system

Running Docker on my laptop, 12 cores, 64 GB RAM.

What did you do exactly?

I attempted to update an old dev server's data using osm2pgsql-replication when the target database was too old to be updated from Geofabrik's download server. It had last been updated about 4 months ago, whereas Geofabrik's replication files are only available for roughly 90 days.

The command involved:

osm2pgsql-replication update -d postgresql://pgosm_flex:notyourpassword@172.16.0.170:5432/pgosm_dev?application_name=pgosm-flex \
    -- \
    --output=flex --style=./run.lua \
    --slim

What did you expect to happen?

I expected osm2pgsql-replication to exit with a return code other than 0. This would allow my code to detect the issue and report it properly. Based on the current osm2pgsql-replication docs, it could potentially be considered a 3 (network error). I don't have a strong opinion on which return code is correct, only that it should not be 0.

What did happen instead?

The return code was 0, even though the update failed. The text output does correctly represent that there was an error.

[ERROR]: Error during diff download. Bailing out.

My project uses this return code to determine whether this step worked properly. I created an issue in that project (https://github.com/rustprooflabs/pgosm-flex/issues/391) that documents a bit more of the detail. Downstream code relies on this return code to detect and report success or failure of the process. The approach I took to catching this scenario in my project was to add a step that parses the output lines to catch the error text (see https://github.com/rustprooflabs/pgosm-flex/pull/392). That does not feel like the right way to detect this problem.
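
For illustration, the workaround described above might look roughly like the following sketch. It is not the code from the linked pull request: the names `update_failed`, `run_update`, and `ERROR_MARKER` are hypothetical, and it assumes the failure always surfaces as a line containing `[ERROR]`, as in the output quoted above. A run is treated as failed if the return code is non-zero or if any output line contains that marker.

```python
import subprocess

# Assumption: failures show up as a line containing "[ERROR]",
# as in the "[ERROR]: Error during diff download. Bailing out." output above.
ERROR_MARKER = "[ERROR]"


def update_failed(returncode, output):
    """Return True if the run should be treated as failed.

    A non-zero return code always counts as a failure. A zero return
    code still counts as a failure if any output line has the marker.
    """
    if returncode != 0:
        return True
    return any(ERROR_MARKER in line for line in output.splitlines())


def run_update(cmd):
    """Run the command (a list of arguments) and return True on success.

    stderr is merged into stdout so the marker is found on either stream.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return not update_failed(result.returncode, result.stdout)
```

Here `run_update` would be called with the `osm2pgsql-replication update ...` command from above, split into a list of arguments. Scanning output text is brittle, because it breaks if the message wording changes, which is why a non-zero return code would be the better signal.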

What did you do to try analyzing the problem?

I searched this project on GitHub looking for the text "Error during diff download. Bailing out." and was unable to find where it comes from. I checked osmium-tool as well, thinking it might be there, and did not find it there either.

lonvia commented 2 months ago

pyosmium treats all errors regarding the diff download as "transient", meaning it considers the download worth retrying later. That's why it returns a 0. I can see how this is not helpful in this case.

Needs improvements in pyosmium.

lonvia commented 2 months ago

With #2212 you will get an error with the Geofabrik diffs, when you initialize replication on an old extract. This should be sufficient for most cases.

You'd still run into problems when you have already initialised replication and then not updated for two months. As said above, this needs fixing in pyosmium. I've opened https://github.com/osmcode/pyosmium/issues/257 for that.

Closing here, as there is nothing more in osm2pgsql we can do.

rustprooflabs commented 2 months ago

Thank you for the quick action! I'll test with #2212 and see how that handles my scenario.