thinkingmachines / linksight-2018

LinkSight is a web app for applying the Philippine Standard Geographic Code to messy and misspelled barangay, municipality, city, and province names.
https://linksight.thinkingmachin.es
GNU General Public License v3.0

Add test for exact matches #165

ghost closed 6 years ago

ghost commented 6 years ago

Need feedback on why the test is failing.

marksteve commented 6 years ago

@syk0saje Will look into this tomorrow!

stuckoverflo commented 6 years ago

@syk0saje, you will get more than one row in the results. The matcher returns each interlevel match as a separate row. So in this case, if it finds a match (or multiple matches) for every column, it will return three rows: one each for barangay, municity (municipality/city), and province.

For your test dataset, since the names are all pretty close and we get matches for all fields, we'll get four sets of rows like this one, one set per row of the test dataset:

            code  score      location interlevel province_code city_municipality_code
index
0      012800000    100  ILOCOS NORTE       Prov     012800000              012800000
0      012801000    100         ADAMS        Mun     012800000              012801000
0      012801001    100  ADAMS (POB.)        Bgy     012800000              012801000

After 0.2, I'm planning to refactor the code to decouple the matching algorithm from the app itself, so it'll be easier for us to test other matching algorithms that we can think of.

ghost commented 6 years ago

I see. This does not fulfill the following spec though: "If it's an exact string match, do not include near matches". How do we fix this so that it only returns the 3 exact matches?

stuckoverflo commented 6 years ago

There are four rows in the test dataset, so it returns matches for all of those rows: three result rows for each row of the test dataset, since we were able to match them all with no multiple matches. If the test dataset had just the first row, we'd get just the three rows above.
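
To make the row counting concrete, here is a rough sketch of how a test could check the results per input row instead of expecting three rows in total. It is only an illustration under assumptions: `fake_matcher_output` and `exact_matches_only` are hypothetical helpers, not functions from the LinkSight codebase, the results are modeled as plain dicts rather than the DataFrame the matcher returns, and the three rows shown for index 0 are simply repeated as placeholders for the other test rows.

```python
from collections import defaultdict

# The three result rows shown above for input row 0.
SAMPLE_ROWS = [
    {"code": "012800000", "score": 100, "location": "ILOCOS NORTE", "interlevel": "Prov"},
    {"code": "012801000", "score": 100, "location": "ADAMS", "interlevel": "Mun"},
    {"code": "012801001", "score": 100, "location": "ADAMS (POB.)", "interlevel": "Bgy"},
]


def fake_matcher_output(n_input_rows):
    """Hypothetical stand-in for the matcher: repeats the sample rows
    once per input row, tagging each result with its input row index."""
    results = []
    for index in range(n_input_rows):
        for row in SAMPLE_ROWS:
            results.append(dict(row, index=index))
    return results


def exact_matches_only(results, levels=("Prov", "Mun", "Bgy")):
    """Return True if every input row has exactly one result per
    interlevel and every result has a score of 100."""
    by_index = defaultdict(list)
    for row in results:
        by_index[row["index"]].append(row)
    for rows in by_index.values():
        if sorted(r["interlevel"] for r in rows) != sorted(levels):
            return False
        if any(r["score"] != 100 for r in rows):
            return False
    return True


results = fake_matcher_output(4)
print(len(results))                 # 4 input rows x 3 interlevels
print(exact_matches_only(results))
```

Grouping by the input row index means the assertion still holds with four test rows, and it fails as soon as a near match (an extra row or a score below 100) shows up for any single input row.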

ghost commented 6 years ago

Ahh, got it. Thanks. Will revise and make a new pull request.