Joxit opened this issue 5 years ago.
@Joxit I haven't. I was waiting for some clarifications in the other issue, and I've since started working on other stuff. So sure, go ahead and take it, thanks a ton for doing this! :) 🥂
It seems that, in order to validate this issue, all the importers must support the multi-lang index.
At this time, only OSM supports it.
- WOF will be supported with pelias/whosonfirst#446
- Geonames needs the `alternateNamesV2` file to add multi-lang (do we want that?)
- OpenAddresses and Polylines are unavailable
I think the most important importer is WOF, since city/country search is the most common use case of the geocoder.
@Joxit FWIW there seems to be some level of support for that already: when searching, passing e.g. `lang=en` or `lang=ru` yields the same name, but the city name is translated.
https://pelias.github.io/compare/#/v1/autocomplete%3Flang=en&text=red%20square%20moscow
vs
https://pelias.github.io/compare/#/v1/autocomplete%3Flang=ru&text=red%20square%20moscow
(see `label`)
I thought that data was based on WOF.
Yes, this is done by pelias/placeholder, a middleware that translates Elasticsearch responses for the user (using WOF ids).
This issue is about ES requests (not responses).
That means that when you use `lang=ru` and search for `red square Москва`, you will not find the correct venue (`geonames:venue:6295575`).
The data is present in WOF but not indexed in ES: only the default name and the English variant are currently indexed. That's why I opened pelias/whosonfirst#446 :smile:
Gotcha now, thanks! Related to that, I think we should also return `Кра́сная пло́щадь` if someone searches for `red square` with `lang=ru`, would you agree? I think this should be easier to achieve, building on what you pointed out about the middleware. I can make a PR if so.
I think the API can return the `name.{lang}` index when it's available in OSM, but for Geonames it will be a bit trickier because we do not use it anywhere.
Maybe this could be added to placeholder? But we would have conflicts with WOF data...
I was thinking about it at a higher level. The simplest approach seems to be updating `geojsonify` here: https://github.com/pelias/api/blob/master/helper/geojsonify.js#L55-L60
Instead of going for `default`, prioritize `req.lang`?
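A minimal sketch of that idea, assuming a document's names are an object keyed by language code (`default`, `en`, `ru`) whose values are strings or arrays of strings. `pickName` is a hypothetical helper for illustration, not the actual code in `helper/geojsonify.js`:

```javascript
// Hypothetical helper: choose the display name for a document, preferring
// the requested language and falling back to `default`.
// Assumes `names` looks like { default: 'Moscow', ru: ['Москва'] }.
function pickName(names, lang) {
  // Values may be a single string or an array of strings; take the first.
  const first = (value) => (Array.isArray(value) ? value[0] : value);

  const hasLang = lang && Object.prototype.hasOwnProperty.call(names, lang);
  const localized = hasLang ? first(names[lang]) : undefined;

  // A missing or empty translation falls back to the default name.
  return localized || first(names.default);
}

// Usage
const names = { default: 'Moscow', ru: ['Москва'] };
console.log(pickName(names, 'ru')); // Москва
console.log(pickName(names, 'nl')); // Moscow
```

The fallback to `default` (rather than an empty label) keeps today's behaviour for any language that has no translation.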
Hi everyone! Any update on this one? LMK if I can help some way.
Hi @mihneadb, unfortunately it's me who's the blocker here: I would like to land https://github.com/pelias/api/pull/1287 before merging this (it's a complex change, but I'm planning on doing the final testing and merging next week).
It's really not ideal to hold back another PR, especially a community contribution, but it makes sense for us in this case because the PR I linked is a massive refactoring of how autocomplete queries are generated.
We are sometimes a little over-cautious with merging big PRs, but it's our responsibility to ensure compatibility and reliability for organisations running Pelias in a production environment with user-facing traffic.
Oh, actually I thought this was another PR, but the same still applies to this one ;)
@missinglink thanks for the transparency! Looking forward to using the new parser! :)
@missinglink Any news on this?
I've been sick this week but releasing the new parser is a top priority.
Hi,
I came across an issue related to this today. I was looking for Edo Tokyo Museum and could not find any results. I realized that I had to search for 江戸東京博物館 in order to find it.
Any ETA for this feature?
Hi @bboure, this part of the feature is already live if you run your query with `lang=en`. I found a difference in the ES query between the English version and the Kanji version:
```json
{
  "constant_score": {
    "filter": {
      "multi_match": {
        "type": "cross_fields",
        "query": "Museum",
        "fields": [
          "parent.country.ngram^1",
          "parent.dependency.ngram^1",
          "parent.macroregion.ngram^1",
          "parent.region.ngram^1",
          "parent.county.ngram^1",
          "parent.localadmin.ngram^1",
          "parent.locality.ngram^1",
          "parent.borough.ngram^1",
          "parent.neighbourhood.ngram^1",
          "parent.locality_a.ngram^1",
          "parent.region_a.ngram^1",
          "parent.country_a.ngram^1",
          "name.default^1.5"
        ],
        "analyzer": "peliasQuery"
      }
    }
  }
}
```
In the `must` clause, `name.en^1.5` is missing.
What this issue still lacks is multi-lang support in the parent hierarchy.
@Joxit Thanks for getting back to me.
Adding `lang=en` does not work either, though: the query does not include `name.en^1.5`. Am I doing something wrong?
Interestingly, searching for `Edo Tokyo Museum, Tokyo` works.
It has to do with how the query is built.
`Edo Tokyo Museum, Tokyo`:
```json
{
  "must": [
    {
      "multi_match": {
        "type": "phrase",
        "query": "edo Tokyo Museum",
        "fields": [
          "phrase.default",
          "phrase.en"
        ],
        "analyzer": "peliasQuery",
        "boost": 1,
        "slop": 3
      }
    },
    {
      "multi_match": {
        "type": "cross_fields",
        "query": "Tokyo",
        "fields": [
          "parent.country.ngram^1",
          "parent.dependency.ngram^1",
          "parent.macroregion.ngram^1",
          "parent.region.ngram^1",
          "parent.county.ngram^1",
          "parent.localadmin.ngram^1",
          "parent.locality.ngram^1",
          "parent.borough.ngram^1",
          "parent.neighbourhood.ngram^1",
          "parent.locality_a.ngram^1",
          "parent.region_a.ngram^1",
          "parent.country_a.ngram^1",
          "name.default^1.5"
        ],
        "analyzer": "peliasAdmin"
      }
    }
  ]
}
```
Here, the full text goes through the `peliasQuery` analyzer, and `Tokyo` through `peliasAdmin`.
`Edo Tokyo Museum`:
```json
{
  "must": [
    {
      "multi_match": {
        "type": "phrase",
        "query": "edo Tokyo",
        "fields": [
          "phrase.default",
          "phrase.en"
        ],
        "analyzer": "peliasQuery",
        "boost": 1,
        "slop": 3
      }
    },
    {
      "constant_score": {
        "filter": {
          "multi_match": {
            "type": "cross_fields",
            "query": "Museum",
            "fields": [
              "parent.country.ngram^1",
              "parent.dependency.ngram^1",
              "parent.macroregion.ngram^1",
              "parent.region.ngram^1",
              "parent.county.ngram^1",
              "parent.localadmin.ngram^1",
              "parent.locality.ngram^1",
              "parent.borough.ngram^1",
              "parent.neighbourhood.ngram^1",
              "parent.locality_a.ngram^1",
              "parent.region_a.ngram^1",
              "parent.country_a.ngram^1",
              "name.default^1.5"
            ],
            "analyzer": "peliasQuery"
          }
        }
      }
    }
  ]
}
```
Here, `Museum` is split off into a second clause, which is missing `name.en^1.5`.
Don't worry, I'm working on a fix; I will publish something tonight or tomorrow.
Yes: in autocomplete, the last token can be either part of the subject (the venue) or part of the hierarchy. That's why we use a `cross_fields` query over both `parent.*` and `name.default`.
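To make that behaviour concrete, here is a heavily simplified, hypothetical sketch that mirrors the two `must` clauses quoted above. It is not the pelias/api query builder or pelias/parser logic: text after a comma is treated as hierarchy, and otherwise the last whitespace-separated token is ambiguous and goes into a `cross_fields` filter that also covers the name fields. Adding `name.<lang>^1.5` to that clause is an assumed fix, not confirmed behaviour.

```javascript
// Admin-hierarchy fields, copied from the queries quoted above.
const PARENT_FIELDS = [
  'country', 'dependency', 'macroregion', 'region', 'county', 'localadmin',
  'locality', 'borough', 'neighbourhood', 'locality_a', 'region_a', 'country_a',
].map((layer) => `parent.${layer}.ngram^1`);

// cross_fields clause over the hierarchy plus the name fields.
// ASSUMPTION: adding `name.<lang>^1.5` is the fix for the missing field.
function crossFields(query, analyzer, lang) {
  const fields = [...PARENT_FIELDS, 'name.default^1.5'];
  if (lang) {
    fields.push(`name.${lang}^1.5`);
  }
  return { multi_match: { type: 'cross_fields', query, fields, analyzer } };
}

// Hypothetical, simplified splitter returning the `must` clauses.
function buildMustClauses(text, lang) {
  const phraseFields = lang ? ['phrase.default', `phrase.${lang}`] : ['phrase.default'];
  const [subject, admin] = text.split(',').map((part) => part.trim());

  let phrase;
  let lastClause;
  if (admin) {
    // "Edo Tokyo Museum, Tokyo": the subject is a complete phrase and the
    // text after the comma is matched against the hierarchy.
    phrase = subject;
    lastClause = crossFields(admin, 'peliasAdmin', lang);
  } else {
    // "Edo Tokyo Museum": the last token may belong to the venue name or to
    // the hierarchy, so it is matched with cross_fields inside a filter.
    const tokens = subject.split(/\s+/);
    const last = tokens.pop();
    phrase = tokens.join(' ');
    lastClause = { constant_score: { filter: crossFields(last, 'peliasQuery', lang) } };
  }

  const must = [];
  if (phrase) {
    must.push({
      multi_match: {
        type: 'phrase', query: phrase, fields: phraseFields,
        analyzer: 'peliasQuery', boost: 1, slop: 3,
      },
    });
  }
  must.push(lastClause);
  return must;
}

// Usage: the last clause now lists name.en^1.5 next to name.default^1.5.
const must = buildMustClauses('Edo Tokyo Museum', 'en');
console.log(must[0].multi_match.query); // Edo Tokyo
console.log(must[1].constant_score.filter.multi_match.fields.slice(-2));
// [ 'name.default^1.5', 'name.en^1.5' ]
```

The comma and whitespace splitting is only a stand-in for a real parser; the point is that whichever clause receives the ambiguous last token needs the language-specific name field too.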
Great, thanks!
What is this for?
We want Pelias to respond to queries written in languages other than English. For example, a Dutch speaker looking for `Parijs` (Paris in Dutch) will get `Parijs, Frankrijk`.
What should we do?
- `multi_match` in autocomplete queries (done in #1300)
- `name.$LANG` index with a higher boost
- `name.default` index with the standard boost
- `name.en` index as a fallback with a lower boost (when `$LANG` is not `en`)
- `name.$LANG` when available, `default` otherwise (done in #1301), e.g. `Parijs, Frankrijk`
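The three boost items in the list above amount to an ordering of name fields. A minimal sketch under stated assumptions: the numeric boosts (2, 1, 0.5) are placeholders chosen for illustration, since the issue only fixes their order, and `nameFieldsForLang` is a hypothetical helper, not a pelias/api function.

```javascript
// ASSUMPTION: placeholder boost values; only their ordering comes from the issue.
const BOOSTS = { requested: 2, standard: 1, fallback: 0.5 };

// Hypothetical helper: name fields for a multi_match, ranked as
// name.$LANG (higher) > name.default (standard) > name.en (fallback).
function nameFieldsForLang(lang) {
  const fields = [`name.default^${BOOSTS.standard}`];
  if (!lang) {
    return fields;
  }
  fields.unshift(`name.${lang}^${BOOSTS.requested}`);
  if (lang !== 'en') {
    // English is only added as a fallback when another language was requested.
    fields.push(`name.en^${BOOSTS.fallback}`);
  }
  return fields;
}

console.log(nameFieldsForLang('nl'));
// [ 'name.nl^2', 'name.default^1', 'name.en^0.5' ]
console.log(nameFieldsForLang('en'));
// [ 'name.en^2', 'name.default^1' ]
```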
Some use cases
cc @mihneadb Have you started working on it? I can take the task if you want :smile: