pelias / docker

Run the Pelias geocoder in docker containers, including example projects.

North America project - not all addresses have a zip code #202

Closed artemChernitsov closed 4 years ago

artemChernitsov commented 4 years ago

Hello. I have a North America build running on my own server. Some addresses have no postal code in the search results, e.g. 1865 John Towers Ave https://pelias.github.io/compare/#/v1/autocomplete?text=1865+john+towers&debug=0, while another address in the same locality and with the same postal code, 2068 Ventana Way https://pelias.github.io/compare/#/v1/autocomplete?text=2068+Ventana+Way&debug=0, does have a postal code in the search results.

Can you help me understand why this happens, and how I can add postal codes to all my properties in the North America project?

Thanks!

missinglink commented 4 years ago

Hi @artemChernitsov,

Postcodes (ZIP codes) are only available when they are in the data we get from the provider. The example address you provided was sourced from http://www.sangis.org/ and didn't come with ZIP code info.

There is no way to simply add all the missing postcodes because they generally do not follow strict geographic boundaries.

artemChernitsov commented 4 years ago

@missinglink thank you for your fast reply. I have a question: if I run a reverse geocoding query in Nominatim (https://nominatim.openstreetmap.org/reverse.php?format=html) for the lat/lon of an address with a missing postal code, I get a postcode in 100% of cases. If I understand correctly, Nominatim uses OpenStreetMap data, and Pelias uses OpenStreetMap data too. Could Pelias run a reverse geocoding query against OpenStreetMap data when it can't find a postcode?

artemChernitsov commented 4 years ago

@missinglink btw, in my case the postal code for this address, 1865 John Towers Ave, does exist in http://www.sangis.org/: https://prnt.sc/t0svs6

artemChernitsov commented 4 years ago

@missinglink also, the WOF database has a postal code for this place: https://spelunker.whosonfirst.org/id/169719625/

orangejulius commented 4 years ago

Hi @artemChernitsov,

Something strange may be happening with that data from San Diego county. Usually a source from OpenAddresses will either have postal codes for all the addresses, or none of them.

I'm looking into that data now and will report back with any findings.

Unfortunately if the postal code is not in the data there is not really anything we can do. Even if the postal code is in WOF, it is not accurate to assign a postal code to an address based on a ZCTA polygon (the ultimate source of data for US postalcodes in WOF).

See https://github.com/iandees/wtf-zipcodes for a good explanation of that point, and https://github.com/pelias/pelias/issues/111 for extensive discussion around it for Pelias.

orangejulius commented 4 years ago

Okay, I took a look at the latest copy of the San Diego county data I could find, and unfortunately many of the records (75k out of the 1.1 million records in the file I looked at) do not have zip codes.
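For anyone who wants to repeat that kind of check on their own copy of the data, here is a minimal sketch. It assumes the file is a CSV with a `POSTCODE` column, as OpenAddresses output files usually have; the function name and the sample rows are made up for illustration and are not the actual San Diego data.

```python
import csv
import io


def count_missing_postcodes(csv_file, column="POSTCODE"):
    """Return (missing, total) for rows whose postcode column is empty or absent."""
    missing = 0
    total = 0
    for row in csv.DictReader(csv_file):
        total += 1
        # Treat a missing column, an empty string, and whitespace the same way.
        if not (row.get(column) or "").strip():
            missing += 1
    return missing, total


# Placeholder rows, not real addresses or real postcodes.
sample = io.StringIO(
    "LON,LAT,NUMBER,STREET,POSTCODE\n"
    "0.0,0.0,1,EXAMPLE ST,\n"
    "0.0,0.0,2,EXAMPLE ST,00000\n"
)
missing, total = count_missing_postcodes(sample)
print(f"{missing} of {total} rows have no postcode")
```

With a real file you would pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.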

It looks like the San Diego county runs for OpenAddresses are currently failing, but they will probably succeed at some point in the near future since the upstream data source from the county appears to be good. My guess is eventually these zip codes will make their way in.

There's probably nothing more we can do on our side. It might make sense to ping the OA team in a few weeks/months if the data doesn't update.

I'm going to mark this issue closed, let me know if you have any other questions.

missinglink commented 4 years ago

For reference here's a blog post about how it's done in Nominatim: https://www.openstreetmap.org/user/lonvia/diary/43143

They are attempting to generate a 'best guess' based on a single point which is assigned to the postcode and the proximity of the address from that point.
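To make the idea concrete, here is a rough sketch of a nearest-point guess. It is a simplification written for this thread, not Nominatim's actual code: the function names, the one-point-per-postcode dictionary, and the placeholder postcodes and coordinates are all assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def guess_postcode(lat, lon, postcode_points):
    """Return the postcode whose representative point is closest, or None.

    postcode_points maps postcode -> (lat, lon). The result is only a guess:
    the closest point is not guaranteed to be the correct postcode.
    """
    if not postcode_points:
        return None
    return min(
        postcode_points,
        key=lambda pc: haversine_km(lat, lon, *postcode_points[pc]),
    )


# Placeholder postcodes and coordinates, for illustration only.
points = {"AAAAA": (10.00, 20.00), "BBBBB": (10.10, 20.00)}
guess = guess_postcode(10.04, 20.00, points)
print(guess)
```

An address that sits just across a postcode boundary from the nearest point would get the wrong answer from a function like this.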

This method can appear to be great but the problem is that there will always be postcodes assigned to addresses which are incorrect.

Pelias goes the other direction and only assigns postcodes when they have been explicitly provided.

We've discussed introducing a new field postcode_computed or postcode_quality to indicate the general quality of the postcode per-document but that feature is not currently being worked on by anyone AFAIK.
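As a purely hypothetical sketch of what such a marker could look like: neither field exists in Pelias today, and the document shape, the function name, and the values "provided", "computed" and "none" below are invented for illustration.

```python
def annotate_postcode(document, computed_postcode=None):
    """Return a copy of the document with a hypothetical postcode_quality field.

    A postcode that came with the source data is kept and marked "provided".
    A guessed postcode is only used when nothing was provided, and is marked
    "computed" so consumers can decide whether to trust it.
    """
    doc = dict(document)
    if doc.get("postalcode"):
        doc["postcode_quality"] = "provided"
    elif computed_postcode:
        doc["postalcode"] = computed_postcode
        doc["postcode_quality"] = "computed"
    else:
        doc["postcode_quality"] = "none"
    return doc


# Placeholder documents, not the real Pelias document schema.
from_source = annotate_postcode({"name": "2 Example St", "postalcode": "00000"})
guessed = annotate_postcode({"name": "1 Example St"}, computed_postcode="00000")
unknown = annotate_postcode({"name": "3 Example St"})
print(from_source["postcode_quality"], guessed["postcode_quality"], unknown["postcode_quality"])
```

Keeping provided and computed values distinguishable is what would let a guessed postcode be shown without it being mistaken for source data.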

Hope that helps give some background.

artemChernitsov commented 4 years ago

Thank you very much for your answers.