komoot / photon

an open source geocoder for openstreetmap data
Apache License 2.0

Fix updates for objects with housenumbers #773

Closed lonvia closed 4 months ago

lonvia commented 4 months ago

There has been a long-standing issue that updates of places with housenumbers, as well as of housenumber interpolation objects, do not work properly. These places are added to the Photon database with a special database ID <place_id>.<housenumber> in order to allow multiple Photon objects for the same Nominatim place_id. This works fine on import but goes subtly wrong when doing updates, because an update only has information about the new state of a place, not the old one. Thus, it is not really possible to delete the old data for such a place, because we don't know which database IDs to look it up under.
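To illustrate the failure mode, here is a minimal sketch using a plain dict as a stand-in for the Photon index. The names `old_doc_id` and `naive_update` are hypothetical and not taken from the Photon code base; the sketch only models the ID scheme described above.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the Photon document index.
Store = Dict[str, dict]


def old_doc_id(place_id: int, housenumber: str) -> str:
    """Old-style database ID: <place_id>.<housenumber>."""
    return f"{place_id}.{housenumber}"


def naive_update(store: Store, place_id: int, new_housenumbers: List[str]) -> None:
    """Update a place knowing only its NEW state.

    The old housenumbers are unknown, so only IDs derived from the
    new housenumbers can be addressed. Documents for housenumbers
    that no longer exist are never found and stay behind.
    """
    for hnr in new_housenumbers:
        store[old_doc_id(place_id, hnr)] = {"place_id": place_id, "housenumber": hnr}


store: Store = {}
naive_update(store, 42, ["1", "3"])   # initial import
naive_update(store, 42, ["3", "5"])   # place changed: "1" removed, "5" added
stale = sorted(store)                 # "42.1" is still present
```

After the second call the store still contains the document for housenumber 1, which is the kind of leftover the description refers to.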

This PR changes the database ID for such objects to <place_id>.<seq_nr>. When a place is inserted that needs to be exploded into multiple Photon documents with different housenumbers, the documents are simply assigned sequential IDs. As a place is always updated as a whole, we can now delete all documents matching the pattern <place_id>.<seq_nr> by checking sequentially whether such a document exists in the database. If it does, it is deleted; if not, the process stops.
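The delete-then-reinsert idea can be sketched as follows. This is an illustration under stated assumptions, not the PR's actual implementation: the index is modelled as a dict, the function names are hypothetical, and starting the sequence at 0 is an assumption.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the Photon document index.
Store = Dict[str, dict]


def doc_id(place_id: int, seq_nr: int) -> str:
    """New-style database ID: <place_id>.<seq_nr>."""
    return f"{place_id}.{seq_nr}"


def delete_place(store: Store, place_id: int) -> int:
    """Delete all documents of a place; return how many were removed.

    Probes <place_id>.0, <place_id>.1, ... and stops at the first
    missing ID (start value 0 is an assumption of this sketch).
    """
    seq_nr = 0
    while doc_id(place_id, seq_nr) in store:
        del store[doc_id(place_id, seq_nr)]
        seq_nr += 1
    return seq_nr


def update_place(store: Store, place_id: int, housenumbers: List[str]) -> None:
    """Replace a place as a whole: drop old documents, insert new ones."""
    delete_place(store, place_id)
    for seq_nr, hnr in enumerate(housenumbers):
        store[doc_id(place_id, seq_nr)] = {"place_id": place_id, "housenumber": hnr}


store: Store = {}
update_place(store, 42, ["1", "3", "5"])  # initial import: three documents
update_place(store, 42, ["7"])            # update: old documents are all found
```

Because the IDs no longer depend on the housenumbers, the delete step needs no knowledge of the old state. It does rely on the sequence having no gaps, which holds here because a place is always written as a whole.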

The change does not modify the database schema, so the code happily works with older database dumps. However, if you want to make use of the fixed update function, you need to start off with a new dump created by this new code, or you will see duplicate housenumbers creep into your database.

I tried updating a planet database and was able to catch up on OSM data at a rate of about one day of changes per hour (updating both the Nominatim DB and the Photon DB). This should be sufficient performance-wise.

The PR also finally adds tests for the update process and fixes an off-by-one error in the handling of new-style interpolations.