seatgeek / fuzzywuzzy

Fuzzy String Matching in Python
http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
GNU General Public License v2.0

Add "Results could differ" in warning message when using slow pure-python SequenceMatcher #298

Open ivsanro1 opened 3 years ago

ivsanro1 commented 3 years ago

We just found that fuzz.WRatio() gives different results depending on whether python-Levenshtein is installed.

This is the warning shown when importing fuzzywuzzy.fuzz without python-Levenshtein installed:

/usr/local/lib/python3.6/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
  warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')

From this message, users will assume the difference is purely one of speed, which it is not.

Versions used:

fuzzywuzzy==0.18.0
python-Levenshtein==0.12.0

Example of score differences (first session without python-Levenshtein, second session with it installed):

>>> from fuzzywuzzy import fuzz
/usr/local/lib/python3.6/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
  warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
>>> fuzz.WRatio('Copia', 'electronica')
50
>>> from fuzzywuzzy import fuzz
>>> fuzz.WRatio('Copia', 'electronica')
54
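
Until the message changes, one way to avoid silently getting different scores is to make the fallback fail loudly. The following is a minimal sketch and not part of fuzzywuzzy: the helper names `levenshtein_available` and `require_consistent_backend` are hypothetical, and the sketch assumes python-Levenshtein is importable as the `Levenshtein` module and that the warning text starts as quoted above.

```python
import importlib.util
import warnings


def levenshtein_available() -> bool:
    """Return True if the `Levenshtein` module (python-Levenshtein) can be found."""
    return importlib.util.find_spec("Levenshtein") is not None


def require_consistent_backend() -> None:
    """Turn the fallback warning into an error.

    The `message` argument is a regex matched against the start of the
    warning text, so it covers the current message and the proposed one.
    """
    warnings.filterwarnings(
        "error",
        message="Using slow pure-python SequenceMatcher",
        category=UserWarning,
    )
```

For this to have any effect, `require_consistent_backend()` would need to be called before `fuzzywuzzy.fuzz` is imported for the first time, since the warning is emitted at import time.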

We strongly suggest stating in the warning message that results can differ between the pure-python SequenceMatcher and the python-Levenshtein version:

warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning. Results can be different between SequenceMatchers')
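
As a rough illustration of where such a message would sit, here is a self-contained sketch of an import fallback that emits the proposed text. It is not fuzzywuzzy's actual code: the function `pick_backend` and its `module_name` parameter are hypothetical, introduced only so that the fallback path can be exercised without uninstalling anything.

```python
import importlib
import warnings

PROPOSED_MESSAGE = (
    "Using slow pure-python SequenceMatcher. "
    "Install python-Levenshtein to remove this warning. "
    "Results can be different between SequenceMatchers"
)


def pick_backend(module_name: str = "Levenshtein") -> str:
    """Return the name of the backend that would be used.

    Falls back to "difflib" and emits the proposed warning when the
    requested module cannot be imported.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        warnings.warn(PROPOSED_MESSAGE, UserWarning, stacklevel=2)
        return "difflib"
    return module_name
```

Calling `pick_backend("some_missing_module")` takes the fallback branch and shows the extended warning, which is the only behavioral change being requested here.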