Weird behavior of partial_ratio

seatgeek / fuzzywuzzy

Fuzzy String Matching in Python

GNU General Public License v2.0

In [47]: fuzzywuzzy.fuzz.partial_ratio("red", "random")
Out[47]: 33

In [48]: fuzzywuzzy.fuzz.partial_ratio("rod", "random")
Out[48]: 33

In [49]: fuzzywuzzy.fuzz.partial_ratio("prod", "random")
Out[49]: 25

In [50]: fuzzywuzzy.fuzz.partial_ratio("pred", "random")
Out[50]: 50

This is a known issue in python-Levenshtein: https://github.com/seatgeek/fuzzywuzzy/issues/79

In your case, for the comparison of

"prod" <-> "random"

the following alignment is used:

"prod" <-> "ndom"

which has a similarity of 25. However, the optimal alignment would be:

"prod" <-> "rand"

which has a similarity of 50. FuzzyWuzzy returns the correct result when the slower difflib-based implementation is used:

>>> from fuzzywuzzy import fuzz
>>> from difflib import SequenceMatcher
>>> # Replace the matcher that fuzz selected at import time with difflib's
>>> fuzz.SequenceMatcher = SequenceMatcher
>>> fuzz.partial_ratio("prod", "random")
50
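
To see why "rand" is the better alignment, here is a minimal sketch that scores every window of the longer string against the shorter one using only difflib. It is not how FuzzyWuzzy picks its candidate windows; the function name `brute_force_partial_ratio` is hypothetical and exists only for this illustration.

```python
from difflib import SequenceMatcher


def brute_force_partial_ratio(s1, s2):
    """Best difflib ratio (0-100) of the shorter string against every
    equally long window of the longer string.

    Hypothetical helper for illustration; not part of fuzzywuzzy.
    """
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if not shorter:
        return 0

    best = 0.0
    # Slide a window of len(shorter) across the longer string.
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        score = SequenceMatcher(None, shorter, window).ratio()
        best = max(best, score)
    return int(round(100 * best))


if __name__ == "__main__":
    # Windows of "random" with length 4: "rand", "ando", "ndom"
    print(brute_force_partial_ratio("prod", "random"))  # 50, from "rand"
```

Because every window is checked, the result does not depend on which matching blocks the underlying matcher happens to report, at the cost of more comparisons per call.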

Weird behavior of partial_ratio #313