xdrop / fuzzywuzzy

Java fuzzy string matching implementation of the well known Python's fuzzywuzzy algorithm. Fuzzy search for Java
GNU General Public License v2.0
822 stars 118 forks source link

Incorrect levenshtein distance for completely edited strings #76

Closed dstibbe closed 5 years ago

dstibbe commented 5 years ago

When I calculate the ratio of "abcdef" - "fedcba" , it results in 17, even though I expected 0.

The ratio calculation is as I understand it: r = ( 1 - d/L)*100 , with d being the Levenshtein distance and L the sum of the two compared strings.

In this library the levenshtein distance is valued with 1 for each insert/delete and 2 for each replace.

The levenshtein distance in this library, for these two strings should be 12 (2 for each replace), resulting in a ratio = (1 - 12/12)*100 = 0

However, in your library, the ratio results in 17, instead of 0. This is because the distance it calculates is 10 instead of 12, resulting in (1-10/12)*100=17 .

This seems to be the case for string of any length, whith 100% replacements, as if 1 replacement is missed.

dstibbe commented 5 years ago

Never mind, it naturally calculates the minimum distance:

a b c d e f
  f e d c b a
1 2 2 0 2 2 1  = 10 

It does calculate a distance of 12 in abcdef -> uvwxyz . .

Can close this 'issue'