thinkingmachines / linksight-2018

LinkSight is a web app for applying the Philippine Standard Geographic Code to messy and misspelled barangay, municipality, city, and province names.
https://linksight.thinkingmachin.es
GNU General Public License v3.0

improve handling of metro manila, canonical names in reference file #249

Closed piafaustino closed 6 years ago

piafaustino commented 6 years ago

The canonical location names in the reference file didn't follow a single standard format. For example, one canonical name would use Arabic numerals ("Barangay 1, Tondo, Manila, Metro Manila") while another would use Roman numerals and an alias for one component ("Barangay II, Tondo, Manila, NCR").

I've updated the reference file so that the canonical names are the original terms found in the PSGC file from the Philippine Statistics Authority website.

I've also fixed the way Metro Manila locations are organized. We used to include the districts in the candidate terms. For example, this item used to appear in our reference file:

"Fort Bonifacio, Taguig, N District, Metro Manila"

Because of this, a search for just "Fort Bonifacio, Taguig" wouldn't return an exact match on this item. However, few people include districts when referring to places in Metro Manila, and Metro Manila technically isn't a province.

We've updated the reference file so that even if your search terms contain only the barangay and municipality/city for a location in Metro Manila, you still get an exact match.
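To illustrate the idea, here is a minimal sketch of how candidate terms could be generated so that Metro Manila locations match without a district or province. This is not the actual LinkSight code: the row layout, the function names (`candidate_terms`, `build_index`, `exact_match`), and the normalization rule are all assumptions made for the example, and the non-Metro Manila row uses placeholder names.

```python
METRO_MANILA = "Metro Manila"

# Hypothetical reference rows; the field names are assumptions for illustration.
REFERENCE_ROWS = [
    {"barangay": "Fort Bonifacio", "city": "Taguig",
     "district": "N District", "province": METRO_MANILA},
    # Placeholder names, not real PSGC entries.
    {"barangay": "Barangay A", "city": "City B",
     "district": None, "province": "Province C"},
]


def normalize(term):
    """Casefold each comma-separated component and trim stray whitespace."""
    parts = [part.strip().casefold() for part in term.split(",")]
    return ", ".join(part for part in parts if part)


def candidate_terms(row):
    """Return the strings that should count as an exact match for a row.

    The district is never part of a candidate term. For Metro Manila rows,
    a term without the province component is also accepted.
    """
    base = [row["barangay"], row["city"]]
    candidates = [", ".join(base + [row["province"]])]
    if row["province"] == METRO_MANILA:
        candidates.append(", ".join(base))
    return candidates


def build_index(rows):
    """Map every normalized candidate term to its reference row."""
    index = {}
    for row in rows:
        for term in candidate_terms(row):
            # Keep the first row seen if two rows produce the same term.
            index.setdefault(normalize(term), row)
    return index


def exact_match(query, index):
    """Return the matching reference row, or None if there is no exact match."""
    return index.get(normalize(query))


index = build_index(REFERENCE_ROWS)
match = exact_match("Fort Bonifacio, Taguig", index)
print(match["barangay"] if match else "no exact match")
```

In this sketch, locations outside Metro Manila still need their province to match exactly, so only Metro Manila rows get the shorter barangay and city form.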