pelias / api

HTTP API for Pelias Geocoder
http://pelias.io
MIT License

Duplicated results - WOF - DiffPlace.js #1071

Open jbgriesner opened 6 years ago

jbgriesner commented 6 years ago

Some searches in France (such as "Lognes", "Sucy en Brie" or "Boissy St Léger") seem to lead to duplicated results.

These queries return, among others, respectively:

So there is apparently a problem with duplicated WOF data between the "locality" and "localadmin" layers, and also with the duplicate check in "middleware/dedup.js".

To fix this, it seems possible either to change the WOF import so that it prevents "locality" and "localadmin" duplicates, or to add another test to the "isDifferent()" function in "helper/diffPlaces.js".

What do you think?

orangejulius commented 6 years ago

Hi @jbgriesner, thanks for providing some very nice test cases. I believe we should solve this in the API deduplication middleware.

If I had to design it right now, I would say that it should operate by looking at pairs of WOF records: if one is a locality, the other is a localadmin, their names are the same, and the localadmin is the parent of the locality, we should consider them duplicates.

Which one to prefer is an interesting question. My intuition is that it should default to the locality. If needed, we could come up with something more complex.
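As a rough illustration of the rule described above, here is a minimal sketch. It is not the code in "middleware/dedup.js" or "helper/diffPlaces.js"; the function names are hypothetical, and the record fields used here (`source`, `layer`, `source_id`, `name.default`, `parent.localadmin_id`) are assumptions about the document shape.

```javascript
// Hypothetical helpers, not part of pelias/api.
// Assumed record shape:
//   { source, layer, source_id, name: { default }, parent: { localadmin_id: [...] } }

// True when one record is a WOF locality, the other is a WOF localadmin,
// both share the same default name, and the localadmin is listed as a
// parent of the locality.
function isLocalityLocaladminDuplicate(a, b) {
  const locality = [a, b].find((r) => r.layer === 'locality');
  const localadmin = [a, b].find((r) => r.layer === 'localadmin');
  if (!locality || !localadmin) return false;

  // The rule only applies to pairs of WOF records.
  if (locality.source !== 'whosonfirst' || localadmin.source !== 'whosonfirst') {
    return false;
  }

  if (locality.name.default !== localadmin.name.default) return false;

  // Compare ids as strings in case one side stores them as numbers.
  const parentIds = (locality.parent && locality.parent.localadmin_id) || [];
  return parentIds.map(String).includes(String(localadmin.source_id));
}

// Given a pair already judged to be duplicates, keep the locality by default.
function preferLocality(a, b) {
  return a.layer === 'locality' ? a : b;
}
```

Keeping the preference in its own function means the "which one to keep" decision can change later without touching the duplicate test.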

missinglink commented 5 years ago

I'm currently in the process of refactoring the dedupe middleware in https://github.com/pelias/api/pull/1222

However, I suspect this issue will be improved by the work the WOF team is currently doing in https://github.com/whosonfirst-data/whosonfirst-data/pull/1343

Deduplicating between the localadmin and locality layers is a UX question. In a lot of cases, these two concepts are different from a legal and administrative point of view but synonymous from a casual user's perspective.

We would need to choose if we want to be technically correct or user-friendly :)

orangejulius commented 5 years ago

Here's another example of administrative area duplication:

/v1/autocomplete?boundary.country=aus&text=gungahlin

Basically, we get a WOF neighbourhood, locality, and localadmin with the same name, plus a Geonames record of the same name. The Geonames record shows as a venue, but is probably an admin area that's incorrectly classified by our importer.

orangejulius commented 5 years ago

All of these examples have now been fixed after https://github.com/pelias/api/pull/1230, except for http://pelias.github.io/compare/#/v1/search%3Ftext=Boissy%20St%20L%C3%A9ger which appears to be failing because of differing diacriticals. We can probably both fix that in WOF data and add code to ignore diacriticals when deduping.
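One possible way to ignore diacriticals when comparing names is sketched below, using only the standard `String.prototype.normalize`. The helper names are hypothetical and this is not the dedupe code in pelias/api; it only strips combining marks in the range U+0300 to U+036F, so characters without a canonical decomposition (for example "ø") are left unchanged.

```javascript
// Hypothetical helpers, not part of pelias/api.

// Decompose accented characters (NFD) and drop the combining marks,
// so that "é" becomes "e".
function stripDiacritics(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Compare two names while ignoring diacritical marks.
function sameNameIgnoringDiacritics(a, b) {
  return stripDiacritics(a) === stripDiacritics(b);
}

console.log(stripDiacritics('Boissy St Léger')); // prints "Boissy St Leger"
console.log(sameNameIgnoringDiacritics('Boissy St Léger', 'Boissy St Leger')); // prints true
```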

bboure commented 4 years ago

Brussels also has several duplicates: https://pelias.github.io/compare/#/v1/autocomplete?layers=locality&text=Bruss&debug=0

Some are in fact part of other localadmins like Dilbeek

It seems like a WOF issue though?