orangejulius closed this 4 years ago.
We definitely want to merge this PR, but be sure to read the discussion at https://github.com/pelias/whosonfirst/pull/511 first. I'd also like to do some testing before rolling it out everywhere.
I realized this could help slightly mitigate the problems from https://github.com/pelias/openstreetmap/issues/507 with OSM venues, so I'm just going to merge it and roll it out everywhere. Hopefully we see some improvements!
https://github.com/pelias/model/pull/118 added support for removing duplicate values from the `name` field. This logic was not also applied to the `phrase` field.

Duplicate values do not affect whether a particular document matches a given query, but they do affect its score.
In some cases, the scoring boost from a token matching twice (once per duplicate) will over-rank a particular result. In other cases, the scoring penalty for the resulting longer field will under-rank a particular result.
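To see how duplicates move a score without changing whether a document matches, here is a minimal sketch of the textbook BM25 contribution of a single query term (BM25 is Elasticsearch's default similarity, with defaults `k1=1.2`, `b=0.75`). It assumes that duplicated values in a multi-valued text field raise both the term frequency and the field length. The `idf` of 1.0 and the average field length of 2 tokens are arbitrary illustrative values. Lucene also quantizes field lengths, and Pelias may configure its own similarity settings, so real scores will differ. Only the direction of each change matters here.

```python
def bm25_term_score(tf: int, field_len: int, avg_field_len: float,
                    idf: float = 1.0, k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25 score contributed by one query term in one field."""
    # Length normalization: fields longer than average are penalized.
    norm = k1 * (1 - b + b * field_len / avg_field_len)
    # Term-frequency saturation: extra occurrences help, with diminishing returns.
    return idf * tf * (k1 + 1) / (tf + norm)


AVG_LEN = 2.0  # assumed average field length, in tokens

# Term appears once in a 2-token field (no duplicates).
deduped = bm25_term_score(tf=1, field_len=2, avg_field_len=AVG_LEN)
# Same value stored twice: tf and length both double, so the term is boosted.
duplicated = bm25_term_score(tf=2, field_len=4, avg_field_len=AVG_LEN)
# Term appears once, but a duplicated *other* value doubles the length: penalized.
padded = bm25_term_score(tf=1, field_len=4, avg_field_len=AVG_LEN)

print(f"{deduped:.3f} {duplicated:.3f} {padded:.3f}")  # 1.000 1.073 0.710
```

The `duplicated` case corresponds to the over-ranking described above, and the `padded` case to the under-ranking. Deduplicating the values returns both to the `deduped` baseline.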
To make sure our scoring is as fair as possible (pending other issues such as https://github.com/pelias/openstreetmap/issues/507), we should apply our current deduplication to both the `name` and `phrase` fields.
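The sketch below shows one possible way to run a single deduplication step over both fields. The document shape (a dict mapping language codes to lists of values) and the helper names `dedupe_values` and `dedupe_fields` are hypothetical and are not the pelias/model API. The sketch also assumes exact string matching, which may not match what https://github.com/pelias/model/pull/118 actually does (for example, with respect to case or whitespace).

```python
from typing import Dict, Iterable, List

# Hypothetical shape: each field maps a language code to a list of values,
# loosely modeled on how `name` and `phrase` are keyed by language.
LangValues = Dict[str, List[str]]


def dedupe_values(values: List[str]) -> List[str]:
    """Drop exact duplicate strings, keeping the first-seen order."""
    # dict preserves insertion order, so the first occurrence wins.
    return list(dict.fromkeys(values))


def dedupe_fields(doc: Dict[str, LangValues],
                  fields: Iterable[str] = ("name", "phrase")) -> Dict[str, LangValues]:
    """Return a copy of doc with each listed field deduplicated per language."""
    result = dict(doc)  # shallow copy; the original doc is left untouched
    for field in fields:
        if field in doc:
            result[field] = {lang: dedupe_values(vals)
                             for lang, vals in doc[field].items()}
    return result


# Illustrative input: the same value stored twice in both fields.
doc = {
    "name": {"default": ["Main Street Cafe", "Main Street Cafe"]},
    "phrase": {"default": ["Main Street Cafe", "Main Street Cafe"]},
}
clean = dedupe_fields(doc)
print(clean["phrase"])  # {'default': ['Main Street Cafe']}
```

Running the same helper over both fields keeps `name` and `phrase` consistent, so neither one carries duplicate tokens into scoring.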