osm-search / Nominatim

Open Source search based on OpenStreetMap data
https://nominatim.org
GNU General Public License v3.0

Spelling correction #759

Open boxed opened 7 years ago

boxed commented 7 years ago

I've noticed (and then read in the documentation) that there is no spelling correction/guessing in Nominatim. I'd like to do some research into this, but to make any informed decision about whether I've made progress, there would have to be some data set I could compare against.

I was wondering if you guys log the queries from osm.org? More specifically, a log of queries that returned no results would be very interesting to look at. Even a short log covering a few days would probably be really useful for some simple testing. For example, one idea I have: if a search returns 0 results and there's a consonant that is tripled, we could replace it with a double and try again. Seems simple enough and should fix some searches.
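To make the tripled-consonant idea concrete, here is a minimal sketch in Python. Nothing in it is part of Nominatim: `collapse_tripled_consonants`, `search_with_retry`, and the `search` callable are hypothetical names used for illustration only. It also assumes ASCII consonants and treats runs longer than three the same way as a triple.

```python
import re
from typing import Callable, List

# An ASCII consonant repeated three or more times in a row (assumption:
# runs longer than three are also collapsed to a double).
_TRIPLED = re.compile(r"([b-df-hj-np-tv-z])\1{2,}", re.IGNORECASE)


def collapse_tripled_consonants(query: str) -> str:
    """Replace each run of 3+ identical consonants with a double."""
    return _TRIPLED.sub(lambda m: m.group(1) * 2, query)


def search_with_retry(query: str, search: Callable[[str], List[dict]]) -> List[dict]:
    """Run `search`; on zero results, retry once with the corrected query.

    `search` is a hypothetical stand-in for whatever geocoder call is used.
    """
    results = search(query)
    if results:
        return results
    corrected = collapse_tripled_consonants(query)
    if corrected == query:
        # Nothing to correct, so a second request would be pointless.
        return results
    return search(corrected)
```

The retry only fires when the first search returns nothing, so queries that already work would not be affected.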

In general it would be a good idea to have this ticket as a placeholder to discuss this general feature set. I think it would be really nice to have and would improve OSM quite a bit.

freyfogle commented 7 years ago

Hi @boxed

I'm one of the people running the OpenCage geocoder. We use Nominatim (and other geocoders) and have a fairly large volume of queries per day. We could definitely supply you with this kind of data. Let me see about gathering it up.

That said, one of the biggest problems we see that leads to no results is that people are sending garbage as queries. Looking at the queries, we can see they have a database of what they think are addresses, but which is in reality badly crawled HTML or similar. They then blast that at us.

Regardless, great to see you wanting to get involved in the project.

boxed commented 7 years ago

Well that's a bit discouraging! Or maybe it's an opportunity to make some really robust code to try to handle even that really bad data :P

Can you attach such a data dump to this ticket? That would be great, I think. Then someone else can also look at this type of thing if I don't get anywhere, give up, or if life gets in the way.

lonvia commented 7 years ago

The best bet for spelling correction at the moment is likely using an Elasticsearch-based search frontend (like Photon) or looking into libpostal. Or, even better, a combination of both, where addresses created by Nominatim get normalized via libpostal during import into the Elasticsearch index.

whackatracker commented 7 months ago

Has there been any progress on implementing fuzzy search in Nominatim since 2017? For example, I am trying this search, which returns results only if I remove the space between "Take" and "31":

No results: https://nominatim.openstreetmap.org/search?amenity=Take 31&format=json&addressdetails=1&city=City%20of%20New%20York&state=New%20York&country=United%20States

Has results: https://nominatim.openstreetmap.org/search?amenity=Take31&format=json&addressdetails=1&city=City%20of%20New%20York&state=New%20York&country=United%20States
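For anyone wanting to reproduce the two requests above from a script, here is a small Python sketch that builds both URLs with the space percent-encoded. The helper name `build_search_url` is hypothetical; the endpoint and parameter names are copied from the URLs above. It only constructs the URLs and does not send any request.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://nominatim.openstreetmap.org/search"


def build_search_url(amenity: str) -> str:
    """Build the structured search URL used above for a given amenity name."""
    params = {
        "amenity": amenity,
        "format": "json",
        "addressdetails": 1,
        "city": "City of New York",
        "state": "New York",
        "country": "United States",
    }
    # quote_via=quote encodes spaces as %20 rather than "+".
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


with_space = build_search_url("Take 31")
without_space = build_search_url("Take31")
```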

mtmail commented 7 months ago

@whackatracker Updates about this feature request will be posted here. There are currently no updates.