Deduplicate locations provided by the user before matching them (on Matcher Version Flo)

thinkingmachines / linksight-2018

LinkSight is a web app for applying the Philippine Standard Geographic Code to messy and misspelled barangay, municipality, city, and province names.

https://linksight.thinkingmachin.es

GNU General Public License v3.0


Deduplicate locations provided by the user before matching them (on Matcher Version Flo) #183

Closed: clar-reese closed this issue 5 years ago

clar-reese commented 6 years ago

Example: A lot of the rows in the SuySing file had a city of "Quezon City" and a province of "Metro Manila". The first choice given to me (selected by default) was the City of Manila, and the second was Quezon City. I didn't really feel like selecting the second option for hundreds of rows.

Suggestion: Group rows with the same city/municipality and province together, so that we only have to choose once per unique combination.

piafaustino commented 6 years ago

Echoing this. This is what I meant before when I said we need to first deduplicate the locations provided by the client, and _then_ match the unique ones before merging the results back into the original dataset. You don't need to run matching on every row if several rows contain the same location.
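To make the idea concrete, here is a rough sketch of the deduplicate, match, then merge-back flow. It is only an illustration, not LinkSight's actual matcher: the `city` and `province` column names, the `normalize` helper, and the `match_fn` callback are all assumptions made up for this example.

```python
from typing import Callable, Dict, List, Tuple

Location = Tuple[str, str]  # (city/municipality, province), normalized


def normalize(value: str) -> str:
    """Collapse whitespace and case so trivially different spellings share a key."""
    return " ".join(value.split()).casefold()


def match_deduplicated(
    rows: List[Dict[str, str]],
    match_fn: Callable[[str, str], str],  # hypothetical stand-in for the real matcher
) -> List[Dict[str, str]]:
    """Run match_fn once per unique location, then merge results back into every row."""
    results: Dict[Location, str] = {}
    for row in rows:
        key = (normalize(row["city"]), normalize(row["province"]))
        if key not in results:
            # Only the first row with this combination triggers a match.
            results[key] = match_fn(row["city"], row["province"])

    # Merge back: every original row keeps its fields and gains the shared result.
    return [
        {**row, "match": results[(normalize(row["city"]), normalize(row["province"]))]}
        for row in rows
    ]


# Demo with a fake matcher that records how often it is called.
calls = []


def fake_matcher(city: str, province: str) -> str:
    calls.append((city, province))
    return f"{city}, {province}".upper()


rows = [
    {"city": "Quezon City", "province": "Metro Manila"},
    {"city": "quezon city ", "province": "Metro Manila"},
    {"city": "Makati", "province": "Metro Manila"},
]
matched = match_deduplicated(rows, fake_matcher)
print(len(calls))  # 2: one call per unique location, not one per row
```

The same grouping would also cover the manual-review case: a choice made once for a unique city/province combination could be stored under its key and applied to all rows that share it.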

piafaustino commented 6 years ago

This would be a quick fix for improving performance. Let's keep this as a to-do for Release 0.2.

piafaustino commented 5 years ago

Stopping iteration on the original matcher since we're going with the latest Java-based algo by Iman.