openvenues / lieu

Dedupe/batch geocode addresses and venues around the world with libpostal
MIT License

Question about --name-only matching #10

Open jesseclark opened 6 years ago

jesseclark commented 6 years ago

Hello,

I just ran the command-line dedupe tool over a dataset with the --name-only flag and got the following in the results (note that I am parsing the JSON and printing it to the screen for quick review, so the format differs from the raw output):

Osceola High
1111 Oak Ridge Dr, Osceola, WI 54020

SAME
>>> Osceola Middle
>>> 1029 Oak Ridge Dr, Osceola, WI 54020
>>> 1.0
>>> Osceola Elementary
>>> 250 10th Ave E, Osceola, WI 54020
>>> 1.0
>>> Osceola Intermediate
>>> 949 Education Ave, Osceola, WI 54020
>>> 0.9999840824

Could you help me understand why these four names, which seem substantially different at a glance, get such high similarity scores from the deduper?

Thanks!