osm-search / Nominatim

Open Source search based on OpenStreetMap data
https://nominatim.org
GNU General Public License v3.0

Allow skipping the - sign used as a connector when specifying the searched name #3578

Open matkoniecz opened 1 week ago

matkoniecz commented 1 week ago

What did you search for?

AudioDesign, Kraków

What result did you get?

none (before I made https://www.openstreetmap.org/changeset/158895309 )

What result did you expect?

When the result is missing completely:

https://www.openstreetmap.org/node/5806046629

Further details

I added a somewhat pointless alt_name tag to this object.

Another live case is at https://nominatim.openstreetmap.org/ui/search.html?q=Automoto%2C+Kraków https://www.openstreetmap.org/way/156530559

And https://nominatim.openstreetmap.org/ui/search.html?q=CycloCentrum%2C+Krak%C3%B3w https://www.openstreetmap.org/way/180878005

At the same time, many cases (see https://overpass-turbo.eu/s/1TPY ) would never be searched for in such a form.

Maybe manually adding such an alt_name is the proper fix? That seems a bit silly, though: it is like adding a transliteration manually.

lonvia commented 2 days ago

This is an awfully difficult problem because it requires concatenating two words into one. There are many names with hyphens. Would we want to do it for all of them or just some specific ones (like shops)? It probably doesn't hurt to add a concatenated alternative for search but it would blow up the index further.

Also, does it really stop with hyphenated names? What about the bike shop "Silver Bike" a bit to the south?
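To make the idea of a "concatenated alternative" concrete, here is a minimal sketch of what generating such variants could look like. It is not Nominatim code: `hyphen_variants` is a hypothetical helper, and the spelling "Audio-Design" is assumed for illustration. It only touches hyphens that sit directly between two word characters, so a name like "Silver Bike" yields nothing.

```python
import re

# Matches a hyphen directly between two word characters, e.g. "Audio-Design".
HYPHEN = re.compile(r"(?<=\w)-(?=\w)")


def hyphen_variants(name: str) -> list[str]:
    """Hypothetical helper: extra spellings for a hyphen-joined name.

    Returns the concatenated form and the space-separated form,
    or an empty list when the name has no intra-word hyphen.
    """
    if not HYPHEN.search(name):
        return []
    return [HYPHEN.sub("", name), HYPHEN.sub(" ", name)]


print(hyphen_variants("Audio-Design"))  # ['AudioDesign', 'Audio Design']
print(hyphen_variants("Silver Bike"))   # []
```

Every hyphenated name gains up to two extra strings this way, which is where the index growth comes from.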

matkoniecz commented 1 day ago

> This is an awfully difficult problem

yeah, I am aware

Just making them findable sounds fine in theory, but you do it for this case and a few others, and then you need to measure the index size in petabytes or something. Repeating the search to also look for entries without the - sounds fine too, until you realize it doubles the processing for minor benefit.

That said, I have no other great idea for solving this, and adding alt_name on a massive scale seems even worse.

Maybe adding it to the index is actually OK? I have not benchmarked this, though.
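A rough way to get a first number before any real benchmark would be to count how many names contain an intra-word hyphen at all. The sketch below is only an assumption about how one might estimate this: `extra_variant_share` is a hypothetical function, the sample names are made up rather than taken from OSM, and it counts name strings, not actual index size.

```python
import re

# Matches a hyphen directly between two word characters.
HYPHEN = re.compile(r"(?<=\w)-(?=\w)")


def extra_variant_share(names, variants_per_name=1):
    """Hypothetical estimate: relative growth in the number of indexed name
    strings if each hyphenated name gets `variants_per_name` extra entries."""
    names = list(names)
    if not names:
        return 0.0
    hyphenated = sum(1 for n in names if HYPHEN.search(n))
    return hyphenated * variants_per_name / len(names)


# Made-up sample, not real data.
sample = ["Audio-Design", "Silver Bike", "Auto-Moto", "Cyclo-Centrum", "Biedronka"]
print(f"{extra_variant_share(sample):.0%}")  # 60%
```

Run against a real list of names (for example an Overpass export), this would show whether the growth is closer to a fraction of a percent or to something that matters.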

> Also, does it really stop with hyphenated names?

You could do more fixups of this kind.

> What about the bike shop "Silver Bike" a bit to the south?

But in my experience, not this kind: a - sits somewhere between a space and no space.