Open: boxed opened this issue 7 years ago
Hi @boxed
I'm one of the people running the OpenCage geocoder. We use Nominatim (and other geocoders) and handle a fairly large volume of queries per day, so we could definitely supply you with this kind of data. Let me see about gathering it up.
That said, one of the biggest causes of empty results we see is people sending garbage as queries. Looking at those queries, it's clear the senders have a database of what they think are addresses but is in reality badly crawled HTML or similar, and they then blast that at us.
Regardless, it's great to see you wanting to get involved in the project.
Well, that's a bit discouraging! Or maybe it's an opportunity to write some really robust code that can handle even that really bad data :P
Can you attach such a data dump to this ticket? I think that would be great: then someone else can also look at this if I don't get anywhere, give up, or life gets in the way.
Has there been any progress on implementing fuzzy search in Nominatim since 2017? For example, the search below returns results only if I remove the space between "Take" and "31":
No results: https://nominatim.openstreetmap.org/search?amenity=Take 31&format=json&addressdetails=1&city=City%20of%20New%20York&state=New%20York&country=United%20States
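The URL above contains a raw space in `amenity=Take 31`. Browsers usually encode that on the fly, but when scripting such tests it is safer to build the query string programmatically. A minimal sketch using only the standard library, reusing the endpoint and parameters from that URL; it only shows correct percent-encoding and makes no claim that encoding is the cause of the empty result:

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the example URL in the comment above.
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(**params: str) -> str:
    """Build a structured-search URL with every value percent-encoded.

    quote_via=quote encodes spaces as %20 (the style used in the original
    URL) instead of urlencode's default '+'.
    """
    return f"{NOMINATIM_SEARCH}?{urlencode(params, quote_via=quote)}"


url = build_search_url(
    amenity="Take 31",
    format="json",
    addressdetails="1",
    city="City of New York",
    state="New York",
    country="United States",
)
print(url)
```

The same helper makes it easy to compare the `Take 31` and `Take31` variants side by side.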
@whackatracker Updates about this feature request will be posted here. There are currently no updates.
I've noticed (and then read in the documentation) that Nominatim does no spelling correction or guessing. I'd like to do some research into this, but to make any informed decision about whether I've made progress, I would need a data set to compare against.
I was wondering if you log the queries from osm.org? More specifically, a log of queries that returned no results would be very interesting to look at. Even a short log covering a few days would be really useful for some simple testing. For example, one idea: if a search returns 0 results and the query contains a tripled consonant, we could replace it with a double and try again. It seems simple enough and should fix some searches.
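The tripled-consonant idea can be sketched as a single retry wrapped around whatever performs the lookup. This is a minimal illustration, not anything Nominatim does: `search` is a hypothetical callable standing in for the real geocoding request, and the sketch collapses runs of three *or more* identical consonants (a slight generalisation of "tripled") so that longer runs are also reduced to a double.

```python
import re
from typing import Callable, Dict, List, Sequence

# Three or more of the same consonant in a row, matched case-insensitively.
TRIPLED_CONSONANT = re.compile(r"([b-df-hj-np-tv-z])\1{2,}", re.IGNORECASE)


def collapse_tripled_consonants(query: str) -> str:
    """Shrink every run of 3+ identical consonants down to a double."""
    return TRIPLED_CONSONANT.sub(r"\1\1", query)


def search_with_fallback(
    query: str, search: Callable[[str], Sequence[Dict]]
) -> Sequence[Dict]:
    """Run `search`; if it finds nothing, retry once with the corrected query.

    `search` is a hypothetical stand-in for the real geocoding call.
    """
    results = search(query)
    if results:
        return results
    corrected = collapse_tripled_consonants(query)
    if corrected != query:
        return search(corrected)
    return results


# Usage with a fake in-memory search function (illustrative data only).
known: Dict[str, List[Dict]] = {"Mississippi": [{"name": "Mississippi"}]}


def fake_search(q: str) -> List[Dict]:
    return known.get(q, [])


print(search_with_fallback("Misssissippi", fake_search))
```

Only consonants are touched, so legitimate vowel runs are left alone, and the retry happens only when the first search came back empty, so queries that already work cost nothing extra.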
In general it would be a good idea to have this ticket as a placeholder to discuss this general feature set. I think it would be really nice to have and would improve OSM quite a bit.