osm-search / Nominatim

Open Source search based on OpenStreetMap data
https://nominatim.org
GNU General Public License v3.0

Possible wrong initial import for postcodes #3180

Closed. emandtf closed this issue 1 year ago

emandtf commented 1 year ago

Describe the bug

I'm using the latest Nominatim 4.2.3 and chose to import the whole Italy database from the official OSM repository, including the Wiki importance data. I discovered that the "postcode" field of the "placex" table contains a lot of wrong data.

To Reproduce

The osm_id values, in order (from street up to region), are 76501249, 2282532051, 41833, 167044, 53937:

  1. https://www.openstreetmap.org/way/76501249
  2. https://www.openstreetmap.org/node/2282532051
  3. https://www.openstreetmap.org/relation/41833
  4. https://www.openstreetmap.org/relation/167044
  5. https://www.openstreetmap.org/relation/53937

As you can see by clicking on those links, none of them has a "postcode" tag or anything related to it, yet the first two have the postcode "67029" associated with them, which is wrong! The right one should be 67020. The attached image (20230829_nominatim_postcode_bug) shows the ordered lookup of those 5 osm_ids, with the WRONG "postcode" field in the rightmost column.

The Wikidata entry "Q47070" (which is present in the "extratags" field: https://www.wikidata.org/wiki/Q47070) has the correct postal code, so the wrong value must be coming from somewhere else, but I can't figure out from where.

So I suppose the import procedure has a bug in its postcode handling.

mtmail commented 1 year ago

On the nominatim.openstreetmap.org servers, which run a version newer than 4.2.3, I see postcode 67029. Note the 'how?' help link that tries to explain how postcodes are calculated. https://nominatim.openstreetmap.org/ui/details.html?osmtype=W&osmid=76501249

Looking inside the tables can help, but there are easier ways to see the address hierarchy of a place:

mtmail commented 1 year ago

OpenStreetMap data contains only a few postcodes in that area, and none for Acciano, for example. Nominatim has to guess what the postcode of that road is.

https://overpass-turbo.eu/s/1zx3

emandtf commented 1 year ago

/details.php?osmtype=W&osmid=76501249&addressdetails=1&format=json

Yes, your URL shows "calculated_postcode: 67029". I suppose that is because the "placex.postcode" field is filled in by Nominatim during the import even when no record has its own "postcode" in the extratags or address JSON, and the /details URL reads from that field, so the value is effectively a calculated one.
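For readers who want to repeat this check for other objects, here is a minimal Python sketch that builds the same /details request and reads the "calculated_postcode" field discussed above. The helper names and the User-Agent string are made up for illustration, and the sketch assumes the public server still accepts these parameters; it is not an official client.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public endpoint used in the thread; point this at your own server if needed.
BASE_URL = "https://nominatim.openstreetmap.org/details.php"


def details_url(osm_type: str, osm_id: int) -> str:
    """Build the details URL with the parameters quoted in the thread."""
    params = {
        "osmtype": osm_type,
        "osmid": osm_id,
        "addressdetails": 1,
        "format": "json",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def calculated_postcode(osm_type: str, osm_id: int):
    """Fetch the details JSON and return its calculated postcode, if any."""
    # Hypothetical User-Agent; use one that identifies your application.
    request = Request(
        details_url(osm_type, osm_id),
        headers={"User-Agent": "postcode-check-example"},
    )
    with urlopen(request) as response:
        data = json.load(response)
    # Assumption: the field is named "calculated_postcode", as in the thread.
    return data.get("calculated_postcode")
```

Calling `calculated_postcode("W", 76501249)` performs a network request against the public server, so it is best used sparingly.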

emandtf commented 1 year ago

Nominatim has to guess what the postcode of that road is.

Thank you for the explanation. I was just starting to suspect that after running into this issue.

So it's not possible to rely on any postcode from Nominatim. Is it possible to disable the "guessing" procedure during import? I would prefer to have no postcode at all rather than a guessed one, because otherwise any search using the Nominatim engine, or any attempt to extract data with queries, could very often give me wrong results.

Running a specific query on the Nominatim DB shows that 90% of Italian hamlets have more than one postcode associated with them, which is not possible (of all 8000 hamlets available, only a few in very big cities really do), so extracting "Hamlet - Postcode" association data is not possible due to wrong data.
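The query itself is not shown in the thread. As a rough illustration of what such a check computes, the Python sketch below takes (hamlet, postcode) pairs, however they were extracted, and returns the share of hamlets that carry more than one distinct postcode. The function name and the input shape are assumptions made for this example.

```python
from collections import defaultdict


def multi_postcode_share(rows):
    """Return the fraction of hamlets with more than one distinct postcode.

    rows: iterable of (hamlet, postcode) pairs; postcode may be None.
    """
    postcodes = defaultdict(set)
    for hamlet, postcode in rows:
        codes = postcodes[hamlet]  # registers the hamlet even without a postcode
        if postcode:
            codes.add(postcode)
    if not postcodes:
        return 0.0
    multi = sum(1 for codes in postcodes.values() if len(codes) > 1)
    return multi / len(postcodes)
```

With placeholder input such as `[("A", "67020"), ("A", "67029"), ("B", "67020")]`, one of the two hamlets has two postcodes, so the share is 0.5.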

PS: what about using the Wikidata postcode when available? It could cover most of these issues ;)

mtmail commented 1 year ago

That logic is deep inside the import code and can't be disabled. The "raw" postcode data is in the placex.address column. So for example for https://www.openstreetmap.org/way/845367216 you'll see "city"=>"Civitaretenga", "street"=>"Via Risorgimento", "postcode"=>"67020"

In your database you could delete all data in placex.postcode and then fill it again from the placex.address column. About 2.3 million places in Italy have a postcode attached.
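The clear-and-refill idea above can be sketched in Python as follows. This is an untested illustration, not a procedure from the Nominatim project: it assumes placex.address is an hstore column (as the `"key"=>"value"` output suggests), the two SQL strings are only a guess at how the step might look in PostgreSQL, and the pure-Python functions model the same logic on in-memory rows so it can be tried without a database. All names are hypothetical.

```python
import re

# Assumed SQL equivalents (not executed here; try on a backup copy first).
CLEAR_SQL = "UPDATE placex SET postcode = NULL"
REFILL_SQL = (
    "UPDATE placex SET postcode = address -> 'postcode' "
    "WHERE address ? 'postcode'"
)

# Matches one "key"=>"value" pair in hstore text output.
# Simplification: escape sequences are not decoded and NULL values are skipped.
_PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"\s*=>\s*"((?:[^"\\]|\\.)*)"')


def parse_hstore(text):
    """Turn hstore text such as '"a"=>"1", "b"=>"2"' into a dict."""
    return dict(_PAIR.findall(text))


def refill_postcodes(rows):
    """Overwrite each row's postcode with the raw tagged one, or None.

    rows: list of dicts with an 'address' key (hstore text or None).
    """
    for row in rows:
        address = parse_hstore(row.get("address") or "")
        row["postcode"] = address.get("postcode")  # None when nothing was tagged
    return rows
```

After this step, a row whose address carries `"postcode"=>"67020"` keeps 67020, and a row without a tagged postcode ends up with no postcode at all instead of a guessed one.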

so extracting "Hamlet - Postcode" association data is not possible

Postcodes are for addresses (houses); trying to find one postcode for a village or city is often not precise. That's not how postal companies usually work: they define their own boundaries based on how their couriers can best deliver the mail. Those boundaries don't map directly to political or administrative boundaries.

Geocoders try to find a balance between precise data (one postcode attached to an address or building) and calculated areas (e.g. https://en.wikipedia.org/wiki/Voronoi_diagram, but Nominatim doesn't do that). Only a few countries have open postal code boundaries (Germany is an example: https://www.openstreetmap.org/relation/3359835). Nominatim is slowly improving (https://nominatim.org/2022/06/26/state-of-postcodes.html), but we still deal with incomplete data.

You might have to use government data or licensed data from the Italian postal code company. https://www.digiatlas.com/mapas/ang/italy-zip-codes-map-with-demographic-data.html

emandtf commented 1 year ago

You might have to use government data or licensed data from the Italian postal code company

It's open data, so it can be used freely, but it would need to be formatted in some specific way (which I don't know) and/or processed by some software before it could be written to the right place in the Nominatim DB, and that might not be so simple.

However, thank you again for all your info. I much appreciate it.