osm-search / Nominatim

Open Source search based on OpenStreetMap data
https://nominatim.org
GNU General Public License v3.0

Proper support for postal towns in the US #1120

Open JonathanMontane opened 6 years ago

JonathanMontane commented 6 years ago

Bug

According to https://wiki.openstreetmap.org/wiki/Nominatim/Development_overview, place:village and place:hamlet should be rank 18. However, at least some place:village and place:hamlet objects are rank 16 instead.

Examples:

Why this matters

This is probably causing issues with some unincorporated places in the US.

In the US, when a street is outside of a city, the standard way of distinguishing it from another street with the same name is by using the name of the closest city, which seems to be what Nominatim is doing, based on my personal observations (I may be wrong, though).

However, since some hamlets and villages are ranked like cities, the closest "city" can turn out to be a hard-to-identify hamlet instead of an actual city.

This happens for at least two data points:

lonvia commented 6 years ago

The table you refer to describes the importance ranking of places (aka rank_search). For computing the address, a slightly different schema is used, and there city, town, village and hamlet are on the same level. It's a tricky business figuring out what the hierarchy of this kind of place is, as OSM data is very inconsistent here.

JonathanMontane commented 6 years ago

I definitely understand that figuring out the hierarchy of this kind of place is a very difficult job.

However, this ranking decision is over 8 years old. It is quite likely that the data quality of OpenStreetMap has greatly improved since 2010, and maybe it is now possible to do a better ranking.

As demonstrated in this issue, there are at least a few points for which this ranking is sub-optimal or incorrect. Do we know of any points where the opposite would be true? For example, do we know of any place where giving a hamlet/village an address rank of 18 would negatively impact its address?

The current ranking was probably motivated by possible issues that were noticed at the time, and we could just go over them to see if they would be incorrect with the proposed ranking.

artur-w commented 6 years ago

Same problem here: https://nominatim.openstreetmap.org/details.php?place_id=31560053. Nominatim uses the hamlet of Wirki instead of the city of Komorniki, where the place is located. Is there any way to force the use of the correct place when both have the same address rank?

lonvia commented 6 years ago

Downgrading hamlet/village would break addresses in lots of places, because both the nearest city/town and the nearest village/hamlet would then appear in the address. Also, it wouldn't fix your examples.

The town of Naples is defined like this in OSM: https://www.openstreetmap.org/relation/1216778 . Timarron Way is clearly outside these boundaries. So ranks are not your problem. Boundaries are. I am aware of the unfortunate situation in the US that administrative boundaries of cities have little to do with the postal cities (i.e. the city/town you would put in the address). Unfortunately, I have no good solution for it. Drawing place=city/town/village/hamlet polygons around the approximate area might be one thing that works out. Or maybe the US needs a completely new solution. I don't know.

N8DNX commented 5 years ago

The quote below is from a comment I placed on another related thread (now closed). This illustrates the problem with matching addresses in the U.S.

_I believe the problem is that the US ZIP Code database has "Preferred" city names for many ZIP codes. Most people in the US use those preferred city names, so that's what they would enter when searching. Nominatim will not return a result unless the query matches the place name, and it does not consider the preferred name.

Example: "3962 Wilkinson Road, Gaylord, MI" does not match despite the use of the preferred name. It will match if "Gaylord" is replaced with the county name "Otsego County" or if the local community name "Sparr" is used. Unfortunately, most people who live and work there have probably never heard of "Sparr", so they would never think to use it as the city name. If you don't believe me, look for "Sparr, MI" in Google to see what's there._

For a better chance of getting a match for the time being, if we don't get a match on the first try, we try again without the city name. This often will return a match with the local hamlet or town, but that's not the city name normally used.

In the case above, it's highly unlikely that anyone would list an address with "Sparr" as the city. It would almost always be listed with "Gaylord", which is in fact the "Preferred" city ("P") in the U.S. ZIP code database, even though the boundary of Gaylord does not encompass Sparr.

There are other common situations in the U.S. where another city name would be used, and it may be neither the actual local town/hamlet name nor the preferred name, but another name that is also listed for that ZIP code. For example, here in Petoskey, MI, the city annexed a nearby area when a large high-end development was being built. People often use the name of that development, "Bay Harbor", because it "adds class" to their address. The U.S. mail would still correctly deliver to such an address, and it would be common for a person to enter their address that way when doing things online.

I believe the correct solution would be to have Nominatim check the submitted city/town/hamlet name but also fall back to accepting, as a valid match, any other location name associated with the submitted ZIP code.
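The fallback proposed above could be sketched roughly as follows. This is only an illustration of the matching rule, not Nominatim code: `city_is_acceptable` and `ZIP_NAMES` are hypothetical names, and the ZIP-to-name table is hand-written for the example rather than taken from the real ZIP code database.

```python
from typing import Dict, Iterable


def city_is_acceptable(
    submitted_city: str,
    matched_place: str,
    zip_code: str,
    zip_names: Dict[str, Iterable[str]],
) -> bool:
    """Accept the city if it equals the matched place name, or if it is
    one of the names listed for the submitted ZIP code."""
    submitted = submitted_city.strip().casefold()
    if submitted == matched_place.strip().casefold():
        return True
    # Fallback: any other name associated with the ZIP code also counts.
    alternates = {name.strip().casefold() for name in zip_names.get(zip_code, ())}
    return submitted in alternates


# Hypothetical table for illustration only (assumed pairing, not real data).
ZIP_NAMES = {"49735": ["Gaylord"]}

# The geocoder matched the local community "Sparr", the user typed "Gaylord".
accepted = city_is_acceptable("Gaylord", "Sparr", "49735", ZIP_NAMES)
```

With such a rule, "Gaylord" would be accepted for an address whose matched place is "Sparr", while an unrelated city name would still be rejected.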

pilyfond commented 5 years ago

@N8DNX https://www.openstreetmap.org/way/288005714 even has all the necessary information: tiger:zip_left=49735 -> this would be Otsego. Here we have an unheard voice from 2011. Probably TIGER is correct on these.

@lonvia I could create pseudo postal relations for the whole US. But maybe re-adding tiger:zip_left and _right values is more viable?

lonvia commented 5 years ago

The postcode alone is not sufficient to solve the issue. The name that goes with the postcode is needed as well and should be recorded in OSM. How to do that is something that should be discussed within the US OSM community. Try the talk-us mailing list.

GeekNJ commented 6 months ago

I see this as a slightly larger match-condition issue/enhancement. The address lookup is too literal: it either needs to be more flexible or needs an option to be more flexible, such as an additional parameter &exact=0. With exact=1, or with the parameter absent, the request would work as it does today, doing an exact match of what is passed against the source data. With exact=0, the request would first attempt an exact match and, if nothing is found, do a subsequent lookup with adjusted data, such as removing the city name and removing address details (e.g. Apt / Suite / Unit) that prevent an exact match.

Let's look at two examples.

Example 1: 1425 Woodvine Way, Alpharetta GA 30005
https://nominatim.openstreetmap.org/ui/search.html?q=1425+woodvine+way%2C+Alpharetta+GA+30005

In this example, although the address is correct, OSM considers the location to be outside the city of Alpharetta, so the search returns nothing. The same lookup without Alpharetta returns the proper geocoded match:

1425 Woodvine Way, GA 30005
https://nominatim.openstreetmap.org/ui/search.html?q=1425+woodvine+way%2CGA+30005

Example 2: 9850 Richmond Ave. Apt 9104 Houston TX 77042
https://nominatim.openstreetmap.org/search?q=9850+Richmond+Ave.+Apt+9104+Houston+TX+77042&format=jsonv2&limit=1

In this example, although the address is correct, the presence of Apt 9104 prevents an exact match. The same call without Apt 9104 returns a match and a geocoded response:

9850 Richmond Ave. Houston TX 77042
https://nominatim.openstreetmap.org/search?q=9850+Richmond+Ave.+Houston+TX+77042&format=jsonv2&limit=1
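Until something like this exists on the server, the two-stage lookup can be emulated on the client. The following is a minimal sketch under stated assumptions: `exact=0` is only a proposal in this comment, and `lookup_with_fallback` and the stubbed `fake_search` are hypothetical names introduced for illustration. A real client would replace the stub with a request to the search endpoint.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def lookup_with_fallback(
    variants: Iterable[str],
    search: Callable[[str], List[dict]],
) -> Optional[Tuple[str, List[dict]]]:
    """Try each query variant in order and stop at the first one with results.

    `variants` should start with the exact query and continue with
    progressively relaxed ones (city removed, unit label removed, ...).
    """
    for query in variants:
        results = search(query)
        if results:
            return query, results
    return None


# Stub standing in for an HTTP request to a geocoder. The "known" set is
# made up for this example; it mirrors the behavior reported in Example 1.
def fake_search(query: str) -> List[dict]:
    known = {"1425 Woodvine Way, GA 30005"}
    return [{"display_name": query}] if query in known else []


hit = lookup_with_fallback(
    [
        "1425 Woodvine Way, Alpharetta GA 30005",  # exact attempt
        "1425 Woodvine Way, GA 30005",             # city removed
    ],
    fake_search,
)
```

Passing the search function as a parameter keeps the retry order separate from the HTTP details. Note that each fallback costs one extra request, which matters on rate-limited servers.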

What I've coded on my end to accommodate this behavior is to never pass the city, as I always have the five-digit ZIP/postal code, and to strip out unit designators such as Apt, Suite, Unit, etc., which always cause the match to fail. What I haven't figured out is how to strip an unlabeled apartment/unit number at the end of the street address, as in 9850 Richmond Ave. 9104 Houston TX 77042. There are cases where a trailing number is valid, as in 123 Rt 17.
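For reference, the labeled-unit stripping described above might look something like the sketch below. It is a heuristic, not the actual code from my client: `UNIT_RE` and `strip_unit` are names made up for this example, and the label list is an assumption that would need extending for real data. It deliberately leaves unlabeled numbers alone, since it cannot tell an apartment number from a route number.

```python
import re

# Matches a unit label followed by its identifier, e.g. "Apt 9104",
# "Suite #200", "Unit B". The \b after the label avoids matching words
# that merely start with a label, such as "United".
UNIT_RE = re.compile(
    r"\b(?:apt|apartment|suite|unit)\b\.?\s*#?\s*[\w-]+",
    re.IGNORECASE,
)


def strip_unit(address: str) -> str:
    """Remove labeled unit designators and collapse leftover whitespace."""
    cleaned = UNIT_RE.sub(" ", address)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


labeled = strip_unit("9850 Richmond Ave. Apt 9104 Houston TX 77042")
unlabeled = strip_unit("9850 Richmond Ave. 9104 Houston TX 77042")
route = strip_unit("123 Rt 17")
```

Here `labeled` loses "Apt 9104", while `unlabeled` and `route` come back unchanged, which is exactly the remaining ambiguity.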

To make the geocoding option more usable and not require each client to figure out and code this, it should be considered for inclusion in the base web service.