Open sillybun opened 3 years ago
This is a known issue in python-Levenshtein: https://github.com/seatgeek/fuzzywuzzy/issues/79
In your case, for the comparison of
"prod" <-> "random"
the following alignment is used:
"prod" <-> "ndom"
which has a similarity of 25. However, the optimal alignment would be:
"prod" <-> "rand"
which has a similarity of 50. In FuzzyWuzzy you get the correct result when the slower, difflib-based implementation is used:
>>> from fuzzywuzzy import fuzz
>>> from difflib import SequenceMatcher
>>> # Replace the python-Levenshtein matcher with the difflib one
>>> fuzz.SequenceMatcher = SequenceMatcher
>>> fuzz.partial_ratio("prod", "random")
50
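To see where the 25 and the 50 come from, here is a minimal sketch that scores "prod" against every 4-character window of "random" using only the standard-library difflib. The helper name `window_scores` is made up for illustration and is not part of FuzzyWuzzy; it brute-forces all windows instead of reproducing FuzzyWuzzy's matching-block logic, so treat it as an explanation of the scores rather than the library's actual algorithm.

```python
from difflib import SequenceMatcher


def window_scores(short, long_):
    """Score `short` against each len(short)-sized window of `long_` (0-100).

    Hypothetical helper for illustration only, not a FuzzyWuzzy function.
    """
    n = len(short)
    return {
        long_[i:i + n]: round(100 * SequenceMatcher(None, short, long_[i:i + n]).ratio())
        for i in range(len(long_) - n + 1)
    }


scores = window_scores("prod", "random")
best = max(scores.values())

print(scores)  # {'rand': 50, 'ando': 25, 'ndom': 25}
print(best)    # 50
```

The window "rand" shares two characters in order with "prod" ("r" and "d"), while "ndom" shares only one, which is why picking "ndom" as the alignment gives the lower score.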
Why is "pred" more similar to "random" than "prod"?