pelias / geonames

Import pipeline for geonames into Pelias
https://pelias.io
MIT License

Duplicate results caused by Geonames records with names like 'City of X' or 'Town of Y' #395

Closed by orangejulius 2 years ago

orangejulius commented 3 years ago

This is a new issue to document an old problem: Geonames records often have prefixes on the names of cities, leading to duplicate results.

Examples

Autocomplete results for "philadelphia" (screenshot taken 2021-03-12): https://pelias.github.io/compare/#/v1/autocomplete?text=philadelphia


Autocomplete results for "new york" (screenshot taken 2021-03-12): https://pelias.github.io/compare/#/v1/autocomplete%3Ftext=new%20york

Solutions

We've already opened two PRs that could solve this problem:

https://github.com/pelias/geonames/pull/372 solves the problem at index time by modifying the names of Geonames records to remove 'City of' and 'Town of' style prefixes.
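For illustration, index-time prefix stripping could look roughly like the sketch below. This is not the code from the PR; the function name `strip_admin_prefix`, the regex, and the sample records are all hypothetical, and the prefix list only covers the 'City of' / 'Town of' cases mentioned above.

```python
import re

# Hypothetical pattern: only the 'City of' / 'Town of' prefixes named in the issue.
PREFIX_PATTERN = re.compile(r"^(?:city|town)\s+of\s+", re.IGNORECASE)


def strip_admin_prefix(name: str) -> str:
    """Remove one leading 'City of ' / 'Town of ' from a place name."""
    stripped = PREFIX_PATTERN.sub("", name, count=1).strip()
    # Never index an empty name: fall back to the original if nothing is left.
    return stripped or name


# Made-up sample records, rewritten before they would be indexed.
records = [
    {"id": 1, "name": "City of Philadelphia"},
    {"id": 2, "name": "Philadelphia"},
    {"id": 3, "name": "City of New York"},
]
for record in records:
    record["name"] = strip_admin_prefix(record["name"])

print([r["name"] for r in records])  # ['Philadelphia', 'Philadelphia', 'New York']
```

The catch is that the stored name is permanently changed, including for places whose official name genuinely starts with such a prefix (the City of London, for example, is distinct from London).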

https://github.com/pelias/api/pull/1371, on the other hand, improves API deduplication to handle these cases at query time.
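A query-time approach can be sketched as follows. This is a hypothetical illustration, not the Pelias API's actual deduplication logic: the result fields (`name`, `layer`, `lat`, `lon`), the helper names, and the 10 km threshold are all assumptions. Two results are treated as duplicates when their names match after ignoring a 'City of' / 'Town of' prefix, they share a layer, and they lie close together; the higher-ranked one is kept.

```python
import math
import re

# Hypothetical pattern, used only for comparison; stored names are never changed.
PREFIX_PATTERN = re.compile(r"^(?:city|town)\s+of\s+", re.IGNORECASE)


def comparison_name(name: str) -> str:
    """Normalized form of a name used only to compare two results."""
    return PREFIX_PATTERN.sub("", name, count=1).strip().lower()


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def dedupe_results(results: list[dict], max_km: float = 10.0) -> list[dict]:
    """Keep the first (highest-ranked) of any results that look like duplicates."""
    kept: list[dict] = []
    for result in results:
        is_duplicate = any(
            comparison_name(result["name"]) == comparison_name(k["name"])
            and result["layer"] == k["layer"]
            and haversine_km(result["lat"], result["lon"], k["lat"], k["lon"]) <= max_km
            for k in kept
        )
        if not is_duplicate:
            kept.append(result)
    return kept


# Made-up sample results with illustrative coordinates, in ranked order.
results = [
    {"name": "Philadelphia", "layer": "locality", "lat": 39.95, "lon": -75.16},
    {"name": "City of Philadelphia", "layer": "locality", "lat": 39.95, "lon": -75.17},
    {"name": "Philadelphia", "layer": "locality", "lat": 32.77, "lon": -89.12},
]
deduped = dedupe_results(results)
print([(r["name"], r["lat"]) for r in deduped])
# [('Philadelphia', 39.95), ('Philadelphia', 32.77)]
```

Because the prefix is ignored only while comparing, the upstream Geonames names stay untouched, and a distant place with the same name still appears as a separate result.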

There are advantages and tradeoffs to both approaches. As @missinglink mentioned in https://github.com/pelias/geonames/pull/372#issuecomment-538939148, a good policy is to avoid making major modifications to data from our upstream datasets where possible. This means query-time deduplication is the preferred solution.