utils.full_process executed when processor=None

seatgeek / fuzzywuzzy

Fuzzy String Matching in Python

GNU General Public License v2.0


Great and very helpful tool! Thank you!

One thing I noticed is that even when process.extractOne (and the other functions in process) is called with processor set to None, utils.full_process is still executed several times. This is probably caused by the following line:

https://github.com/seatgeek/fuzzywuzzy/blob/88951621d081095359f37fbf6f282f6e54336a14/fuzzywuzzy/process.py#L100

The following script prints the same output twice:

from fuzzywuzzy import process

query = "123   ....  "
choices = ["123", query]

print(process.extract(query, choices))
print(process.extract(query, choices, processor=None))

Output:

[('123', 100), ('123   ....  ', 100)]
[('123', 100), ('123   ....  ', 100)]

I would expect that, without a processor, the exact 1:1 match scores higher than the other choice. So something like this:

[('123', 100), ('123   ....  ', 100)]
[('123   ....  ', 100), ('123', 90)]
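To illustrate why the two choices tie when preprocessing is applied, here is a minimal standard-library sketch. It does not use fuzzywuzzy itself: normalize is a hypothetical stand-in that only approximates the kind of cleanup utils.full_process performs (non-alphanumeric characters replaced by spaces, lowercasing, stripping), and simple_ratio is an assumed, simplified scorer built on difflib, not the scorer that process.extract uses by default.

```python
import re
from difflib import SequenceMatcher


def normalize(s):
    # Hypothetical approximation of the preprocessing discussed above:
    # replace runs of non-alphanumeric characters with a space,
    # lowercase the result, and strip surrounding whitespace.
    return re.sub(r"[^0-9a-zA-Z]+", " ", s).lower().strip()


def simple_ratio(a, b):
    # Simplified similarity score in the range 0..100 (an assumption,
    # not fuzzywuzzy's default scorer).
    return round(100 * SequenceMatcher(None, a, b).ratio())


query = "123   ....  "
choices = ["123", query]

for choice in choices:
    raw = simple_ratio(query, choice)
    processed = simple_ratio(normalize(query), normalize(choice))
    # With preprocessing both choices collapse to the same string,
    # so they tie; on the raw strings only the exact match is perfect.
    print(repr(choice), "raw:", raw, "processed:", processed)
```

In this sketch the raw comparison ranks the exact match above "123", while the preprocessed comparison cannot tell the two choices apart. That is the behaviour one would expect from processor=None if no further preprocessing were applied.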

utils.full_process executed when processor=None #319