komoot / photon

an open source geocoder for openstreetmap data
Apache License 2.0

Add special phrase searching like nominatim #557

Open JinIgarashi opened 3 years ago

JinIgarashi commented 3 years ago

I just found this article on the Nominatim website.

If you want to be able to search for places by their type through special key phrases you also need to enable these key phrases like this:

./utils/specialphrases.php --wiki-import > specialphrases.sql
psql -d nominatim -f specialphrases.sql

Note that this command downloads the phrases from the wiki link above. You need internet access for the step.

I think it would be useful if photon could search POIs by key, using Nominatim's special phrases. For instance, searching only for places of type bank in a certain city.

I am not sure whether this is technically possible. What do you think?

lonvia commented 3 years ago

It's technically possible but we are a long way off from a practical implementation.

JinIgarashi commented 3 years ago

> It's technically possible but we are a long way off from a practical implementation.

Thanks for the comment. This is just a humble suggestion. It would be great if photon could take particular phrases into account to prioritize search results.

kenseii commented 3 years ago

@lonvia we are thinking about implementing something similar to what is being discussed in this issue.

Basically, when searching or ordering the results, we would like some osm_values, e.g. ['aerodrome', 'station', 'stop'], to be boosted (scored higher) so that they show up at the top of the results ahead of e.g. ['hotel', 'sauna'], even if the OSM importance of the latter is higher.

Any idea or recommendation on how to implement this or where to look?

Is it better to perform this when querying, scoring or returning the data?

Would this be worth submitting as a PR to the photon repo?

Thank you

lonvia commented 3 years ago

@kenseii It sounds like you are looking rather for a static boost by OSM type, i.e. the boost would be independent from the actual query. This issue is more about searching by keywords. 'tokyo station' would boost train stations, 'tokyo hotel' would boost hotels.
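The static variant described here can be sketched as follows. This is only an illustration and not photon code: the `Place` type, the `STATIC_BOOST` table and its factors are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical boost factors per osm_value; the numbers are illustrative only.
STATIC_BOOST = {"aerodrome": 2.0, "station": 2.0, "stop": 2.0}


@dataclass
class Place:
    name: str
    osm_value: str
    importance: float


def static_score(place: Place) -> float:
    """Importance scaled by a fixed per-type factor, independent of the query."""
    return place.importance * STATIC_BOOST.get(place.osm_value, 1.0)


def rank(places: list) -> list:
    """Order places by their statically boosted score, best first."""
    return sorted(places, key=static_score, reverse=True)
```

With these assumed factors a station with importance 0.4 outranks a hotel with importance 0.5, whatever the user typed. A keyword search, in contrast, would only prefer the station when the query asks for one.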

kenseii commented 3 years ago

@lonvia In order to boost the search results dynamically depending on the osm_value implied by the query, I am thinking of using synonyms.

e.g. if I search for Tokyo station, I would like to boost the results that have "station" as the osm_value.

The reason we think synonyms are important is that the search query might contain a useful keyword that is not itself an osm_value.

e.g. searching for Narita airport: Narita airport -> airport -> aerodrome, so we boost the results with aerodrome

e.g. searching for Narita hotel: Narita hotel -> hotel -> hotel, so we boost the results with hotel

We think that boosting on osm_values derived from the query would give us this kind of query-dependent bias. Do you think this is a good approach?
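A minimal sketch of the synonym idea above, assuming a hand-written synonym table. The table, the function names and the boost factor are hypothetical and only mirror the Narita examples; they are not part of photon.

```python
# Hypothetical synonym table: query keyword -> osm_value to boost.
SYNONYMS = {"airport": "aerodrome", "hotel": "hotel", "station": "station"}


def target_osm_values(query: str) -> set:
    """Return the osm_values implied by keywords found in the query."""
    return {SYNONYMS[token] for token in query.lower().split() if token in SYNONYMS}


def dynamic_score(importance: float, osm_value: str, query: str, boost: float = 2.0) -> float:
    """Scale importance only when the place type matches a keyword in the query."""
    if osm_value in target_osm_values(query):
        return importance * boost
    return importance
```

Here "Narita airport" maps to aerodrome, so an aerodrome gets the boost for that query while a hotel keeps its plain importance.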

lonvia commented 3 years ago

If you go down this road, you probably have to remove the word you used as a keyword from the query before matching against the document, because the keyword and its synonyms might not show up in the name at all and then you no longer have a full match. Alternatively, you have to add the OSM key/value as a keyword to the collector, but that has its own disadvantages.
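The first option, stripping the keyword before matching, might look roughly like this. It is a sketch under assumptions: `KEYWORDS` and `split_query` are hypothetical names, and real query parsing would need more than whitespace splitting.

```python
# Hypothetical keyword table: query word -> osm_value it stands for.
KEYWORDS = {"airport": "aerodrome", "hotel": "hotel", "station": "station"}


def split_query(query: str):
    """Separate name terms from category keywords.

    Returns the remaining name part (to be matched against the document)
    and the list of osm_values that should be boosted or filtered on.
    """
    name_terms, osm_values = [], []
    for token in query.split():
        value = KEYWORDS.get(token.lower())
        if value is None:
            name_terms.append(token)
        else:
            osm_values.append(value)
    return " ".join(name_terms), osm_values
```

For "Narita airport" this yields the name part "Narita" plus the osm_value aerodrome, so a document named only "Narita International" can still be a full match. The trade-off is that a place whose name really contains the keyword loses that word from the name match.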

I'm currently in the process of experimenting with this stuff for a project, including experimenting with synonyms. We will see what comes out of this.

kenseii commented 3 years ago

Actually, we are planning to add the osm_key and osm_value to the collector inside a text field which is analyzed by a synonym analyzer.

By doing that we can boost based on whether a query matches that field. What are the disadvantages of adding the osm key/value to the collector?

> I'm currently in the process of experimenting with this stuff for a project, including experimenting with synonyms. We will see what comes out of this.

Glad to hear that, is this going to be open source?

lonvia commented 3 years ago

> What are the disadvantages of adding the osm key/value to the collector?

They are English words that will interfere with searching. But it can work if you add a symbolic replacement instead.
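One way to read "symbolic replacement" is sketched below: index an artificial token for the type instead of the English words, and rewrite query keywords to that token. The `#osm.<key>.<value>` format and all names are assumptions for illustration, not what photon actually indexes.

```python
def symbolic_token(osm_key: str, osm_value: str) -> str:
    """Build an artificial term that cannot collide with words in place names."""
    return f"#osm.{osm_key}.{osm_value}"


# Hypothetical mapping from query keywords to symbolic tokens.
QUERY_SYNONYMS = {
    "airport": symbolic_token("aeroway", "aerodrome"),
    "hotel": symbolic_token("tourism", "hotel"),
}


def collector_terms(name: str, osm_key: str, osm_value: str) -> list:
    """Terms indexed for a document: name words plus the symbolic type token."""
    return name.lower().split() + [symbolic_token(osm_key, osm_value)]


def rewrite_query(query: str) -> list:
    """Replace known keywords in the query with their symbolic token."""
    return [QUERY_SYNONYMS.get(token, token) for token in query.lower().split()]
```

Because the plain word "aerodrome" never enters the index, a search for a street or place that happens to be called "Aerodrome" is not polluted by every airport in the data.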

> Glad to hear that, is this going to be open source?

Yes.

kenseii commented 3 years ago

@lonvia thank you very much for the PR https://github.com/komoot/photon/pull/581. I saw that it doesn't support multiple words or spaces and wanted to ask if there is a reason for that.

I was wondering if a graph token filter would help with synonyms that contain spaces.
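For reference, a graph token filter in Elasticsearch is configured along these lines. This is a sketch that assumes an Elasticsearch-style backend; the filter name, analyzer name and synonym rule are hypothetical and do not come from photon or from the PR. The `synonym_graph` filter type itself is the Elasticsearch filter intended for multi-word synonyms, and it is meant for search-time analyzers only.

```python
# Hypothetical index settings, written as a Python dict that could be sent as JSON.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "poi_synonyms": {  # hypothetical filter name
                    "type": "synonym_graph",
                    # Multi-word phrases on the left map to one term on the right.
                    "synonyms": ["train station, railway station => station"],
                }
            },
            "analyzer": {
                "poi_search": {  # hypothetical analyzer name, used at search time
                    "tokenizer": "standard",
                    "filter": ["lowercase", "poi_synonyms"],
                }
            },
        }
    }
}
```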