ztane / python-Levenshtein

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity
GNU General Public License v2.0
1.26k stars, 155 forks

editops' result does not match ratio's result #51

Open · brealisty opened 4 years ago

brealisty commented 4 years ago

str1 = 'AB1010', str2 = '1010AB' (combined length 6 + 6 = 12).

ratio's result is 0.6666, which means there are 4 edit steps (2 deletions and 2 insertions): (12 - 4) / 12.
editops' result is [('replace', 0, 0), ('replace', 1, 1), ('replace', 4, 4), ('replace', 5, 5)]. This answer clearly does not match (12 - 4) / 12. Counting each replace as 2 steps, it corresponds to (12 - 8) / 12 instead.
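The arithmetic in the report can be reproduced with a minimal pure-Python sketch. It does not call python-Levenshtein; the helper names `indel_distance`, `uniform_levenshtein` and `ratio_from_distance` are hypothetical, and the ratio formula (lensum - distance) / lensum is assumed from the numbers quoted above.

```python
def indel_distance(a: str, b: str) -> int:
    """Insertions/deletions only: len(a) + len(b) - 2 * LCS(a, b)."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return len(a) + len(b) - 2 * prev[-1]


def uniform_levenshtein(a: str, b: str) -> int:
    """Standard Levenshtein distance: insert, delete and replace all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # keep or replace
        prev = cur
    return prev[-1]


def ratio_from_distance(a: str, b: str, dist: int) -> float:
    """Similarity as (lensum - dist) / lensum, where lensum = len(a) + len(b)."""
    lensum = len(a) + len(b)
    return (lensum - dist) / lensum


s1, s2 = "AB1010", "1010AB"
print(indel_distance(s1, s2))                                        # 4
print(round(ratio_from_distance(s1, s2, indel_distance(s1, s2)), 4))  # 0.6667
print(uniform_levenshtein(s1, s2))                                   # 4 (four replaces)
# Counting each of the 4 replaces as 2 steps gives (12 - 8) / 12:
print(round(ratio_from_distance(s1, s2, 2 * uniform_levenshtein(s1, s2)), 4))  # 0.3333
```

Both distances happen to be 4 here, but they count different operations. Four replacements only reproduce the reported ratio if each one costs a single step.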

Is there some difference in how these two functions compute the edit distance?

BobLd commented 4 years ago

Not sure, but this could be linked to an issue in the editops_from_cost_matrix function. A possible solution is described here: https://github.com/ztane/python-Levenshtein/issues/16#issuecomment-613626787

maxbachmann commented 3 years ago

Editops uses the standard uniform-cost Levenshtein distance, while ratio uses the InDel distance (insertions and deletions only, no substitutions). In this implementation, that is achieved by giving substitutions a weight of 2, which costs the same as an insertion plus a deletion.
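A small dynamic-programming sketch can illustrate how the substitution weight changes the result. It is not the library's C implementation: `weighted_editops` and its `sub_cost` parameter are hypothetical. It returns (tag, source_pos, dest_pos) tuples shaped like the ones quoted in the report, and it backtracks through the cost matrix preferring the diagonal move.

```python
from typing import List, Tuple


def weighted_editops(a: str, b: str, sub_cost: int) -> Tuple[int, List[Tuple[str, int, int]]]:
    """Edit distance with insert/delete cost 1 and a configurable substitution
    cost, plus one optimal operation list recovered by backtracking."""
    n, m = len(a), len(b)
    # d[i][j] = cheapest cost of turning a[:i] into b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = d[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else sub_cost)
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, diag)

    # Walk back from the bottom-right corner, preferring match/replace moves.
    ops: List[Tuple[str, int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = a[i - 1] == b[j - 1]
            if d[i][j] == d[i - 1][j - 1] + (0 if same else sub_cost):
                if not same:
                    ops.append(("replace", i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("delete", i - 1, j))
            i -= 1
        else:
            ops.append(("insert", i, j - 1))
            j -= 1
    ops.reverse()
    return d[n][m], ops


s1, s2 = "AB1010", "1010AB"
print(weighted_editops(s1, s2, sub_cost=1))
# (4, [('replace', 0, 0), ('replace', 1, 1), ('replace', 4, 4), ('replace', 5, 5)])
dist, ops = weighted_editops(s1, s2, sub_cost=2)
print(dist, ops)
# 4 [('delete', 0, 0), ('delete', 1, 0), ('insert', 6, 4), ('insert', 6, 5)]
print(round((len(s1) + len(s2) - dist) / (len(s1) + len(s2)), 4))  # 0.6667
```

With `sub_cost=1` this sketch produces the same four replacements that editops reported. With `sub_cost=2` a replacement is never cheaper than a deletion plus an insertion, so the cheapest path for this pair uses two deletions and two insertions. That matches ratio's (12 - 4) / 12.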