pelias / pelias

Pelias is a modular open-source geocoder using Elasticsearch.
https://pelias.io
MIT License

French address search returns results from other countries #873

Open Joxit opened 4 years ago

Joxit commented 4 years ago

Describe the bug

I found strange behavior when searching for a French address. The input is ZI des Clairances, 86320, Lussac-les-Châteaux, France, and the results are:

0) 5000 Les Clairances, Lussac-les-Châteaux, France
1) Zimbabwe
2) Guangxi Zhuang, China
3) Ziyang, China
4) Zibo, China
5) Zinder, Niger
6) Zigong, China
7) Xinjiang Uyghur, China
8) Zilina, Slovakia
9) Ningxia Hui, China

ZI stands for Zone Industrielle (industrial zone), so it is a kind of place. The text is parsed correctly by pelias/parser :heart::

{
  "subject": "ZI des Clairances",
  "place": "ZI des Clairances",
  "postcode": "86320",
  "locality": "Lussac-les-Châteaux",
  "country": "France",
  "admin": "Lussac-les-Châteaux, France"
}

The first result is correct: the POI is inside the Zone Industrielle. The problem is all the other results. AFAIK, when a city is identified, Pelias uses Placeholder to find its Who's On First (WOF) id, searches only within that city, and returns the city itself as a fallback. But here I get results from Zimbabwe and China :man_shrugging: I suspect the ZI in the subject...
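The suspicion about "ZI" can be illustrated with a toy model. This sketch assumes, purely hypothetically, that the leading token ZI is treated as a name prefix with no admin restriction; prefix_matches and RESULT_NAMES are illustrative (the names come from the result list above), and this is not how Pelias actually matches or scores documents. It only accounts for the names starting with "Zi", not Guangxi Zhuang, Xinjiang Uyghur or Ningxia Hui.

```python
# Hypothetical illustration only: NOT how Pelias matches or scores documents.
# It assumes the leading token "ZI" is treated as a name prefix with no
# admin restriction. Names are taken from the result list in the issue.
RESULT_NAMES = [
    "Lussac-les-Châteaux",
    "Zimbabwe",
    "Ziyang",
    "Zibo",
    "Zinder",
    "Zigong",
    "Zilina",
]


def prefix_matches(token: str, names: list[str]) -> list[str]:
    """Return the names that start with token, ignoring case."""
    prefix = token.casefold()
    return [name for name in names if name.casefold().startswith(prefix)]


first_token = "ZI des Clairances".split()[0]  # "ZI"
print(prefix_matches(first_token, RESULT_NAMES))
# ['Zimbabwe', 'Ziyang', 'Zibo', 'Zinder', 'Zigong', 'Zilina']
```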

Expected behavior

I think that, in this example, only results from Lussac-les-Châteaux, France should be returned, and if nothing matches, the city itself should be returned. For example, ZI des unknown, 86320, Lussac-les-Châteaux, France should return Lussac-les-Châteaux instead of Zimbabwe, but it currently returns:

0) Zimbabwe
1) Zibo, China
2) Zigong, China
3) Zibo, China
4) Guangxi Zhuang, China
5) Xinjiang Uyghur, China
6) Ningxia Hui, China
7) Zinder, Niger
8) Zilina, Slovakia
9) Ziyang, China
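A minimal sketch of the two expected rules: restrict candidates to the city identified by Placeholder, and fall back to the city when nothing matches. Everything here is hypothetical: the Record type, the layer strings, the made-up LUSSAC_ID standing in for a real WOF id, and plain token overlap standing in for Elasticsearch scoring.

```python
from dataclasses import dataclass


# Hypothetical record type: layer names mimic Pelias layers, and locality_id
# is a made-up string standing in for a Who's On First id.
@dataclass
class Record:
    name: str
    layer: str
    locality_id: str


LUSSAC_ID = "wof:lussac"  # placeholder, NOT the real WOF id

RECORDS = [
    Record("Lussac-les-Châteaux", "locality", LUSSAC_ID),
    Record("5000 Les Clairances", "address", LUSSAC_ID),
    Record("Zimbabwe", "country", ""),
]


def tokens(text: str) -> set[str]:
    """Lower-cased whitespace tokens; a crude stand-in for ES text analysis."""
    return set(text.casefold().split())


def search_in_locality(subject: str, locality_id: str, records: list[Record]) -> list[Record]:
    # Rule 1: only consider records inside the locality found by Placeholder.
    in_city = [r for r in records if r.locality_id == locality_id]
    hits = [
        r for r in in_city
        if r.layer != "locality" and tokens(subject) & tokens(r.name)
    ]
    if hits:
        return hits
    # Rule 2: nothing matched, so fall back to the locality itself.
    return [r for r in in_city if r.layer == "locality"]


for subject in ("ZI des Clairances", "ZI des unknown"):
    print(subject, "->", [r.name for r in search_in_locality(subject, LUSSAC_ID, RECORDS)])
# ZI des Clairances -> ['5000 Les Clairances']
# ZI des unknown -> ['Lussac-les-Châteaux']
```

Because candidates are filtered by locality before any matching happens, a record such as Zimbabwe can never appear, whatever the subject text contains.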

Additional context

When libpostal parses the text, the result is (note the query property):

{
  "query": "zi des clairances",
  "postalcode": "86320",
  "city": "lussac-les-châteaux",
  "country": "france"
}

In pelias/api/routes/v1.js#L108-L121, the predicate blacklists the query property, which means that no Elasticsearch (ES) search is run using libpostal and Placeholder. Only the query built from the pelias parser output is used.

When the pelias parser query is used, Placeholder is not consulted, which is why the results are neither restricted to the city nor fall back to it.
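The routing described above can be modeled in a few lines. This is a hypothetical simplification, not the code at pelias/api/routes/v1.js#L108-L121: placeholder_path_runs and the ADMIN_FIELDS tuple are assumptions made for illustration. Fed the libpostal output shown above, it shows how the presence of query alone disables the Placeholder path even though city and country were identified.

```python
# Hypothetical, simplified model of the routing decision; the real predicates
# in pelias/api routes/v1.js are composed differently and check more fields.
ADMIN_FIELDS = ("neighbourhood", "borough", "city", "county", "state", "country")  # assumed list


def placeholder_path_runs(parsed: dict) -> bool:
    """Would the libpostal + Placeholder search run for this parse (in this model)?"""
    if "query" in parsed:
        # The blacklisted property: its presence alone disables this path.
        return False
    return any(field in parsed for field in ADMIN_FIELDS)


# libpostal output from the issue
libpostal_parse = {
    "query": "zi des clairances",
    "postalcode": "86320",
    "city": "lussac-les-châteaux",
    "country": "france",
}

print(placeholder_path_runs(libpostal_parse))  # False
without_query = {k: v for k, v in libpostal_parse.items() if k != "query"}
print(placeholder_path_runs(without_query))  # True
```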

What is the best thing to do here? Create a new ES query that combines Placeholder with the pelias parser? And why is the query property from libpostal blacklisted?

NickStallman commented 4 years ago

I've seen similar cases where the country in the query is completely ignored, along with other admin boundaries. Fortunately, for my use cases I can usually specify boundary.country, which locks in the country and gives far more sensible results.
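For reference, boundary.country is a parameter of the Pelias /v1/search endpoint, alongside text. A sketch of building such a request; the host and port (a self-hosted instance at localhost:4000) and the search_url helper are assumptions, so adjust them to your deployment.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: a self-hosted Pelias API at this address; adjust for your setup.
BASE_URL = "http://localhost:4000/v1/search"


def search_url(text: str, country: Optional[str] = None) -> str:
    """Build a /v1/search URL, optionally restricted with boundary.country."""
    params = {"text": text}
    if country:
        # Restrict results to one country (ISO 3166-1 code, e.g. "FR" or "FRA").
        params["boundary.country"] = country
    return f"{BASE_URL}?{urlencode(params)}"


print(search_url("ZI des Clairances, 86320, Lussac-les-Châteaux, France", country="FR"))
```

This is a hard filter, which works when the caller already knows the country; it does not help when the country has to be inferred from free text.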

It would be good if, when the query clearly mentions high-level admin boundaries, results inside those areas got a heavy score boost (or results outside them were penalised). It can't be an exact filter, as I'm sure there's a lot of fuzziness around borders.
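The boost-rather-than-filter idea could look roughly like this. It is a hypothetical post-processing step, not an existing Pelias feature: the result dicts, scores, parent ids and boost factor are all made up for illustration. Results inside a mentioned area have their score multiplied, while results outside are kept, so a match just across a fuzzy border is demoted rather than dropped.

```python
# Hypothetical re-ranking step, not an existing Pelias feature.
# Results inside any mentioned admin area get their score multiplied by boost;
# results outside are kept (not filtered), so border fuzziness is tolerated.
def rerank(results: list[dict], admin_ids: set[str], boost: float = 5.0) -> list[dict]:
    def adjusted_score(result: dict) -> float:
        inside = bool(admin_ids & set(result["parents"]))
        return result["score"] * (boost if inside else 1.0)

    return sorted(results, key=adjusted_score, reverse=True)


# Made-up scores and parent ids, for illustration only.
results = [
    {"name": "Zimbabwe", "score": 3.0, "parents": ["wof:zimbabwe"]},
    {"name": "Zibo", "score": 2.5, "parents": ["wof:china"]},
    {"name": "5000 Les Clairances", "score": 2.0, "parents": ["wof:france", "wof:lussac"]},
]

print([r["name"] for r in rerank(results, {"wof:france"})])
# ['5000 Les Clairances', 'Zimbabwe', 'Zibo']
```

In a real deployment this would more naturally live inside the Elasticsearch query itself (for example as a should clause or a function_score boost) rather than as post-processing.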