osm-search / Nominatim

Open Source search based on OpenStreetMap data
https://nominatim.org
GNU General Public License v3.0

Word tokens with cardinal abbreviations not being assigned to addresses #3325

Closed jgardynik closed 8 months ago

jgardynik commented 8 months ago

What did you search for?

195 N 500 W, Logan, UT 84321 https://nominatim.openstreetmap.org/ui/search.html?q=195+N+500+W%2C+Logan%2C+UT+84321

What result did you get?

Only road results, but no house

What result did you expect?

I expect it to find the house, the same as if I expand the cardinals. 195 North 500 West, Logan, UT 84321 https://nominatim.openstreetmap.org/ui/search.html?q=195+North+500+West%2C+Logan%2C+UT+84321

Further details

Looking at the debug output, I can see that the word table has a token with the cardinal abbreviations expanded to "north 500 west" (covering "n 500 west", "n 500 w", and "north 500 w"), which it uses to run the search. However, that token is, for whatever reason, not being assigned to the name_vector array in the search_name table for this entry. If I add that token to the name_vector array for the proper tuple in the search_name table, it finds the result exactly as if I had typed out the cardinals. This seems to be a very common problem with addresses in Utah, since the whole state uses a grid system. I suspect it's also a problem with many other addresses that have cardinals.
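To make the variant set concrete, here is a minimal Python sketch of how the four spellings listed above relate to a single full name. It is only an illustration: the `CARDINALS` mapping and the `directional_variants` function are hypothetical names invented for this example and are not taken from Nominatim's tokenizer or its configuration.

```python
from itertools import product

# Assumed mapping for illustration only; not Nominatim's actual abbreviation rules.
CARDINALS = {"north": "n", "south": "s", "east": "e", "west": "w"}


def directional_variants(name: str) -> set[str]:
    """Return every spelling of `name` with each cardinal word either
    written out or abbreviated."""
    words = name.lower().split()
    # Each cardinal word has two spellings; every other word has exactly one.
    options = [(w, CARDINALS[w]) if w in CARDINALS else (w,) for w in words]
    return {" ".join(combo) for combo in product(*options)}


print(sorted(directional_variants("North 500 West")))
# ['n 500 w', 'n 500 west', 'north 500 w', 'north 500 west']
```

A query for "N 500 W" can only match the house if one of these spellings ends up associated with the indexed entry, which is the step that appears to be missing here.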

The reason I can see this being a huge issue is that the USPS will format any addresses with North/South/East/West in them down to their N/S/E/W abbreviations. So most people (at least in Utah, where I live) are more likely to type them in that way compared to writing them out.
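Since the expanded form of the query does find the house, one possible client-side stopgap is to expand the single-letter directionals before sending the query. The sketch below is a naive illustration under that assumption: `expand_cardinals` is a hypothetical helper, not part of Nominatim, and it would mis-expand any address where a standalone N, S, E, or W is not a directional (for example a street literally named "E").

```python
import re

# Assumed expansion table for illustration only.
EXPANSIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}

# Match only standalone uppercase letters, so "UT" or "West" are left alone.
_CARDINAL_RE = re.compile(r"\b([NSEW])\b")


def expand_cardinals(query: str) -> str:
    """Replace standalone N/S/E/W tokens with the written-out direction."""
    return _CARDINAL_RE.sub(lambda m: EXPANSIONS[m.group(1)], query)


print(expand_cardinals("195 N 500 W, Logan, UT 84321"))
# 195 North 500 West, Logan, UT 84321
```

This only works around the problem on the caller's side; it does nothing for users typing the abbreviated form directly into a search box.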

I suspect there are other issues open that this addresses, but I decided to create a new one because I have a more accurate technical reason for why it isn't working. I just don't have enough information on the entire Nominatim system to determine exactly where the token isn't being associated, or I'd look into fixing it myself.

lonvia commented 8 months ago

The full name of 'North 500 West' only shows up in the 'name:full' tag which does not get abbreviation variants for the directions.

This touches on the more general issue that there is no agreed-upon schema in the US for handling directional prefixes and suffixes.

Closing here in favour of the more generic #535. See also recently opened #3280.

jgardynik commented 8 months ago

I could make the argument that the United States Postal Service is the entity in the US most responsible for addressing standards, and they do have a section specifically on grid-style addresses containing double directionals here. They also maintain that abbreviating both pre- and postdirectionals is the preferred syntax here.

However, if Nominatim isn't willing to accept abbreviations on directionals, even when name:prefix is being used, what should we do with the data to ensure that we get accurate results? If we add short_name attributes to every road, e.g. "N 500 W" in my above example, will Nominatim index it properly then?

lonvia commented 8 months ago

The OSM community in the US needs to sit down and agree on how directionals are handled in tagging. The current situation is a mess and not easy to process. The Utah addressing proposal is a good start but needs some wider discussion to see if it is applicable to all of the US. I need to write up a summary of the current situation and the challenges to get the discussion started, but that needs a bit of quiet time.

So the issue is not dead, just one of the bigger projects. #535 remains open.