Closed clar-reese closed 5 years ago
Echoing this. This is what I meant before when I said we need to first deduplicate the unique locations provided by the client, _then_ match those before merging the results back into the original dataset. You don't need to run matching on every row if several rows contain the same location.
A quick fix for improving performance. Let's keep this as a to-do for Release 0.2.
Stopping iteration on the original matcher, since we're going with the latest Java-based algorithm by Iman.
Example: many of the rows in the SuySing file had a city of "Quezon City" and a province of "Metro Manila"; the first choice presented to me (selected by default) was the City of Manila, and the second was Quezon City. I didn't want to manually pick the second option for hundreds of rows.
Suggestion: Group rows with the same city/municipality and province together, so that we only have to choose once per unique combination
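A minimal sketch of the deduplicate-then-match idea. All names here are hypothetical (`match_location` stands in for whatever matcher the pipeline actually uses, interactive or otherwise); the point is just that the matcher runs once per unique (city, province) pair and the result is mapped back to every row.

```python
def match_location(city, province):
    # Placeholder for the real matcher; in the actual pipeline this is the
    # expensive (possibly interactive) step we want to run as few times as possible.
    return f"{city}, {province} (matched)"

def match_rows(rows):
    """Run the matcher once per unique (city, province) pair, then merge back."""
    cache = {}
    results = []
    for row in rows:
        key = (row["city"], row["province"])
        if key not in cache:
            # Only unseen combinations hit the matcher.
            cache[key] = match_location(*key)
        results.append({**row, "matched": cache[key]})
    return results

rows = [
    {"city": "Quezon City", "province": "Metro Manila"},
    {"city": "Quezon City", "province": "Metro Manila"},
    {"city": "Cebu City", "province": "Cebu"},
]
matched = match_rows(rows)
```

With this grouping, hundreds of identical "Quezon City / Metro Manila" rows would cost a single matching decision instead of one per row.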