thinkingmachines / linksight-2018

LinkSight is a web app for applying the Philippine Standard Geographic Code to messy and misspelled barangay, municipality, city, and province names.
https://linksight.thinkingmachin.es
GNU General Public License v3.0

Enhancement/scoring logic #269

Closed piafaustino closed 6 years ago

piafaustino commented 6 years ago

Slight improvements to scoring matcher:

I had to restore fuzzy matching on all unique possible matches, rather than only running it on candidates that share a minimum of 3 n-grams with the search terms. We were missing some correct matches because of this threshold.
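To illustrate why the threshold could drop valid candidates, here is a minimal sketch, not the actual LinkSight code. The helper names (`ngrams`, `prefilter`, `best_match`), the candidate list, the n-gram length of 3, and the use of `difflib.SequenceMatcher` as the fuzzy scorer are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased string (no padding)."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def prefilter(query, candidates, n=3, min_common=3):
    """Keep only candidates sharing at least `min_common` n-grams with the query."""
    q = ngrams(query, n)
    return [c for c in candidates if len(q & ngrams(c, n)) >= min_common]


def best_match(query, candidates):
    """Score every candidate with a fuzzy ratio and return the highest-scoring one."""
    return max(
        candidates,
        key=lambda c: SequenceMatcher(None, query.lower(), c.lower()).ratio(),
    )


# Hypothetical candidate list and misspelled query, for illustration only.
candidates = ["Iba", "Ibaan", "Delfin Albano"]
query = "Ibaa"

print(prefilter(query, candidates))   # no candidate shares 3 trigrams with the query
print(best_match(query, candidates))  # scoring all candidates still yields a match
```

With short names, no candidate can reach 3 shared n-grams, so the prefilter returns an empty list even though close matches exist. Scoring every unique candidate avoids that, at the cost of more comparisons.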

We originally added that threshold to improve speed, but matching is already faster now that we use n-grams of length 3 instead of 2. The trade-off is that LinkSight is not effective at matching misspelled location names of 3 or fewer characters. For example, it will not correctly identify the barangay "Aga, Delfin Albano" if it is misspelled as "Agm, Delfin Albano". These cases are few and far between, so we can address them later.
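The short-name limitation can be seen with a small sketch. It assumes unpadded character n-grams and a hypothetical `shared_ngrams` helper; LinkSight's real tokenization may differ.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased string (no padding)."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def shared_ngrams(a, b, n=3):
    """Return the n-grams that two strings have in common."""
    return ngrams(a, n) & ngrams(b, n)


# A 3-character name produces exactly one trigram, so a single typo
# leaves nothing in common with the correct spelling.
print(shared_ngrams("Aga", "Agm"))       # set()

# With bigrams, the same pair still overlaps on "ag".
print(shared_ngrams("Aga", "Agm", n=2))  # {'ag'}
```

Under these assumptions, any one-character error in a name of 3 or fewer characters removes all trigram overlap, which is why such names cannot be matched through trigrams alone.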

Other improvements: