larsga / Duke

Duke is a fast and flexible deduplication engine written in Java
Apache License 2.0
613 stars, 194 forks

This fixes Classical and Weighted Levenshtein distances. #269

Open ibuda opened 5 years ago

ibuda commented 5 years ago

I spotted a mistake in the implementation of the Levenshtein.distance and WeightLevenshtein.distance methods. The errors described in #268, #239 and #244 come from using the wrong indexing into the "matrix" array. In addition, both methods return the wrong cell: the result should be the last value in the one-dimensional "matrix" array, which corresponds to the bottom-right cell of the two-dimensional memoization matrix.
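
To illustrate the indexing described above, here is a minimal sketch of the classical (unit-cost) Levenshtein distance memoized in a flat, row-major array. It is not the Duke code or the patch in this pull request: the class name `LevenshteinSketch`, the variable names, and the row-major layout are assumptions made for the example. Cell (i, j) of the conceptual two-dimensional matrix is stored at `i * cols + j`, and the result is read from the last element of the array.

```java
public final class LevenshteinSketch {
    private LevenshteinSketch() {}

    /** Unit-cost edit distance, memoized in a flat row-major array (illustrative only). */
    public static int distance(String s1, String s2) {
        int rows = s1.length() + 1;
        int cols = s2.length() + 1;
        // Cell (i, j) of the conceptual 2-D matrix lives at index i * cols + j.
        int[] matrix = new int[rows * cols];

        // First column: cost of deleting the first i characters of s1.
        for (int i = 0; i < rows; i++) {
            matrix[i * cols] = i;
        }
        // First row: cost of inserting the first j characters of s2.
        for (int j = 0; j < cols; j++) {
            matrix[j] = j;
        }

        for (int i = 1; i < rows; i++) {
            for (int j = 1; j < cols; j++) {
                int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                int deletion = matrix[(i - 1) * cols + j] + 1;
                int insertion = matrix[i * cols + (j - 1)] + 1;
                int substitution = matrix[(i - 1) * cols + (j - 1)] + cost;
                matrix[i * cols + j] = Math.min(Math.min(deletion, insertion), substitution);
            }
        }

        // The answer is the bottom-right cell, i.e. the last element of the flat array.
        return matrix[rows * cols - 1];
    }
}
```

Mixing up `rows` and `cols` in the index expression, or returning any cell other than the last one, gives wrong distances whenever the two strings differ in length. A weighted variant would follow the same layout, with the three `+ 1` / `+ cost` terms replaced by per-operation weights.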

By the way, thank you for the tremendous amount of work you have put into Duke.