I just ran the command-line dedupe tool over a dataset passing the --name-only flag and got this in the results (Note that I am parsing the JSON and dumping them to screen for quick reviewing so the format is different):
Osceola High
1111 Oak Ridge Dr, Osceola, WI 54020
SAME
>>> Osceola Middle
>>> 1029 Oak Ridge Dr, Osceola, WI 54020
>>> 1.0
>>> Osceola Elementary
>>> 250 10th Ave E, Osceola, WI 54020
>>> 1.0
>>> Osceola Intermediate
>>> 949 Education Ave, Osceola, WI 54020
>>> 0.9999840824
I wonder if you could help me understand why these four names which seem substantially different at a glance, get such high similarity scores from the deduper?
Hello,
I just ran the command-line dedupe tool over a dataset passing the --name-only flag and got this in the results (Note that I am parsing the JSON and dumping them to screen for quick reviewing so the format is different):
I wonder if you could help me understand why these four names which seem substantially different at a glance, get such high similarity scores from the deduper?
Thanks!