thinkingmachines / linksight-2018

LinkSight is a web app for applying the Philippine Standard Geographic Code to messy and misspelled barangay, municipality, city, and province names.
https://linksight.thinkingmachin.es
GNU General Public License v3.0

Suggestion for improving speed: Use filtered/smaller reference files if Municipality/City or Province is selected by the user as the lowest granularity in the source dataset #146

Closed piafaustino closed 6 years ago

piafaustino commented 6 years ago

There are 43k barangays, but only 1600+ cities and municipalities. If the smallest granularity selected by the user in their source file is "municity" and not "barangay", then we can use a much smaller reference file of 1600+ rows only instead of 43k. Hopefully, this should improve performance for those who don't need to match at the barangay level.
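A minimal sketch of the idea, not LinkSight's actual code: filter the reference rows down to the lowest level present in the source file before matching. The row layout, the level names, the placeholder codes, and `filter_reference` are all assumptions made for illustration.

```python
# Hypothetical reference rows: (code, name, level).
# The codes are placeholders, not real PSGC codes.
REFERENCE = [
    ("P-01", "Province A", "province"),
    ("M-01", "Municity A", "municity"),
    ("M-02", "Municity B", "municity"),
    ("B-01", "Barangay A", "barangay"),
    ("B-02", "Barangay B", "barangay"),
    ("B-03", "Barangay C", "barangay"),
]

# Levels ordered from coarsest to finest.
LEVELS = ["province", "municity", "barangay"]


def filter_reference(rows, lowest_level):
    """Keep only rows at or above the lowest level in the source file."""
    keep = set(LEVELS[: LEVELS.index(lowest_level) + 1])
    return [row for row in rows if row[2] in keep]


# A user whose finest column is "municity" never needs the barangay rows.
municity_reference = filter_reference(REFERENCE, "municity")
print(len(REFERENCE), len(municity_reference))
```

With the real file, the same filter would shrink roughly 43k rows to the 1600+ city and municipality rows plus the provinces.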

marksteve commented 6 years ago

We sort of do this now in the algorithm. Each column is only compared to its respective subset of the reference file.
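Roughly, per-column matching against level-specific subsets could look like the sketch below. This is an illustration only: the `REFERENCE_BY_LEVEL` table and `match_column` are hypothetical, and `difflib.get_close_matches` stands in for whatever fuzzy matcher the algorithm really uses.

```python
from difflib import get_close_matches

# Hypothetical reference names, already split by level (illustrative sample).
REFERENCE_BY_LEVEL = {
    "province": ["Cebu", "Bohol"],
    "municity": ["Cebu City", "Tagbilaran City"],
    "barangay": ["Lahug", "Cogon"],
}


def match_column(values, level, cutoff=0.6):
    """Match each value only against the reference names for its own level."""
    candidates = REFERENCE_BY_LEVEL[level]
    result = {}
    for value in values:
        hits = get_close_matches(value, candidates, n=1, cutoff=cutoff)
        result[value] = hits[0] if hits else None
    return result


# The municity column is never compared against barangay or province names.
print(match_column(["Cebu Cty"], "municity"))
```

If the source file has no barangay column, the barangay subset is simply never consulted.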