larsga / Duke

Duke is a fast and flexible deduplication engine written in Java
Apache License 2.0
614 stars, 194 forks

Levenshtein distances Bug #268

Open ibuda opened 5 years ago

ibuda commented 5 years ago

I found a bug in the Levenshtein and WeightedLevenshtein distance implementations. Specifically, the following calls return wrong results:

- Levenshtein.distance("abc", "a"): Expected: 2, Actual: 1
- Levenshtein.distance("a", "abc"): Expected: 2, Actual: 1
- WeightedLevenshtein.distance("a2c3e", "1b1d1", e): Expected: 8, Actual:
- WeightedLevenshtein.distance("a", "abc", e): Expected: 2, Actual: 1

This error is caused by a mistake in the implementation of the Levenshtein distances. Having studied the list of open issues, I found that this bug also causes #239 and #244.
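For comparison, here is a minimal sketch of the textbook dynamic-programming (Wagner-Fischer) edit distance, which yields the expected values for the two unweighted cases above. It is not Duke's code and does not show where Duke's implementation goes wrong. The class name ReferenceLevenshtein is hypothetical, and the weighted variant is left out because it depends on the weight estimator e.

```java
// Reference sketch only: NOT Duke's Levenshtein class.
// ReferenceLevenshtein is a hypothetical name used for illustration.
final class ReferenceLevenshtein {

    private ReferenceLevenshtein() {}

    // Classic two-row dynamic programming: the cost of turning s1 into s2
    // using insertions, deletions and substitutions, each costing 1.
    static int distance(String s1, String s2) {
        int[] prev = new int[s2.length() + 1];
        int[] curr = new int[s2.length() + 1];

        // Turning the empty prefix of s1 into the first j chars of s2
        // takes j insertions.
        for (int j = 0; j <= s2.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= s1.length(); i++) {
            // Turning the first i chars of s1 into the empty string
            // takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= s2.length(); j++) {
                int substitution = (s1.charAt(i - 1) == s2.charAt(j - 1)) ? 0 : 1;
                int deleteCost = prev[j] + 1;
                int insertCost = curr[j - 1] + 1;
                int replaceCost = prev[j - 1] + substitution;
                curr[j] = Math.min(Math.min(deleteCost, insertCost), replaceCost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        // After the final swap, prev holds the last computed row.
        return prev[s2.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("abc", "a")); // prints 2
        System.out.println(distance("a", "abc")); // prints 2
    }
}
```

A check like this could serve as a regression test once the fix is in: any correct unweighted implementation should agree with it on these inputs, including when the two strings differ in length.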