seatgeek / fuzzywuzzy

Fuzzy String Matching in Python
http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
GNU General Public License v2.0

process.extract broken in fuzzywuzzy=0.13 #314

Open spirit1317 opened 3 years ago

spirit1317 commented 3 years ago

From version 0.13 onward, there's a mismatch between the scores returned by process.extract(scorer=fuzz.ratio) and by fuzz.ratio itself.

#fuzzywuzzy==0.12
from fuzzywuzzy import process, fuzz

process.extract('OdCeny', ['producent'], scorer=fuzz.ratio)
fuzz.ratio('producent', 'OdCeny')

prints:

[('producent', 40)]
40

But with 0.13:

#fuzzywuzzy==0.13
from fuzzywuzzy import process, fuzz

process.extract('OdCeny', ['producent'], scorer=fuzz.ratio)
fuzz.ratio('producent', 'OdCeny')

prints:

[('producent', 67)]
40

Please let me know if this is a feature or a bug.

spirit1317 commented 3 years ago

Also, swapping the query and the choice gives a different score:

process.extract('OdCeny', ['producent'], scorer=fuzz.ratio)
#[('OdCeny', 40)]

process.extract('producent', ['OdCeny'], scorer=fuzz.ratio)
#[('OdCeny', 67)]

Azrael1 commented 3 years ago

@spirit1317 see this comment: https://github.com/seatgeek/fuzzywuzzy/issues/288#issuecomment-720078842

maxbachmann commented 3 years ago

Use:

process.extract('producent', ['OdCeny'], scorer=fuzz.ratio, processor=None)
#[('OdCeny', 40)]

to get the same result as calling fuzz.ratio directly.
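
For context, here is a minimal sketch of why processor=None changes the score. It assumes that the default processor of process.extract lowercases both strings before scoring, and that fuzz.ratio falls back to difflib.SequenceMatcher when python-Levenshtein is not installed. ratio_sketch is a hypothetical stand-in written with the standard library only, not fuzzywuzzy's own code.

```python
from difflib import SequenceMatcher


def ratio_sketch(s1, s2):
    """Hypothetical stand-in for fuzz.ratio: similarity as an integer 0-100."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


query, choice = "OdCeny", "producent"

# Raw strings, as with a direct fuzz.ratio call or processor=None:
# only "d", "e", "n" match, so 2 * 3 / 15 = 0.40.
raw_score = ratio_sketch(choice, query)

# Lowercased strings, assumed here to mimic the default processor:
# "o", "d", "c", "e", "n" match, so 2 * 5 / 15 = 0.67.
processed_score = ratio_sketch(choice.lower(), query.lower())

print(raw_score, processed_score)  # 40 67
```

If that assumption holds, the 40 vs 67 gap comes from case handling alone: "O" and "C" only match "o" and "c" once both strings are lowercased.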