larsga / Duke

Duke is a fast and flexible deduplication engine written in Java
Apache License 2.0
613 stars 194 forks

bug fix #248

Closed dipplestix closed 6 years ago

dipplestix commented 6 years ago

This PR, done in collaboration with Yuefeng Zhang and Carolyn Phillips, fixes two bugs in the implementation of the Jaro-Winkler similarity metric. One made the metric order-dependent, so that JW(a, b) != JW(b, a); the other caused some common characters to be counted multiple times.
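For readers who want to see what the two properties look like in code, here is a minimal, self-contained sketch of Jaro-Winkler. It is not the code from this PR or from Duke: the class name `JaroWinklerSketch` and its methods are hypothetical, and the prefix weight of 0.1 with a prefix cap of 4 are the commonly used defaults, assumed here. The match window is derived from the longer string so it does not depend on argument order, and a `bMatched` array ensures each character of `b` is matched at most once.

```java
// Illustrative sketch only; NOT Duke's implementation. All names are hypothetical.
final class JaroWinklerSketch {
    private JaroWinklerSketch() {}

    /** Jaro similarity in [0, 1]. */
    static double jaro(String a, String b) {
        if (a.equals(b)) return 1.0;
        if (a.isEmpty() || b.isEmpty()) return 0.0;

        // Window derived from the longer string, so it is the same for (a, b) and (b, a).
        int window = Math.max(0, Math.max(a.length(), b.length()) / 2 - 1);
        boolean[] aMatched = new boolean[a.length()];
        boolean[] bMatched = new boolean[b.length()];

        int matches = 0;
        for (int i = 0; i < a.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(b.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                // Skip characters of b that are already paired: no multiple counting.
                if (!bMatched[j] && a.charAt(i) == b.charAt(j)) {
                    aMatched[i] = true;
                    bMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;

        // Walk the matched characters of both strings in order and count
        // positions where they differ; half of that is the transposition count.
        int outOfOrder = 0;
        int j = 0;
        for (int i = 0; i < a.length(); i++) {
            if (!aMatched[i]) continue;
            while (!bMatched[j]) j++;
            if (a.charAt(i) != b.charAt(j)) outOfOrder++;
            j++;
        }

        double m = matches;
        double t = outOfOrder / 2.0;
        return (m / a.length() + m / b.length() + (m - t) / m) / 3.0;
    }

    /** Jaro-Winkler with assumed defaults: prefix weight 0.1, prefix length capped at 4. */
    static double jaroWinkler(String a, String b) {
        double jaro = jaro(a, b);
        int limit = Math.min(4, Math.min(a.length(), b.length()));
        int prefix = 0;
        while (prefix < limit && a.charAt(prefix) == b.charAt(prefix)) prefix++;
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }
}
```

Under these assumptions, `jaroWinkler("DIXON", "DICKSONX")` and `jaroWinkler("DICKSONX", "DIXON")` give the same score, and `jaro("aaaa", "ab")` counts the single shared `a` once rather than once per `a` in the first string.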

larsga commented 6 years ago

Thank you! This is excellent work, and very welcome. I'm going to merge it now.

You write that there were two bugs, but you only add one test case. Do you have tests for the other bug? If so, that would be very welcome.