seatgeek / fuzzywuzzy

Fuzzy String Matching in Python
http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
GNU General Public License v2.0

`process.dedupe()` gives IndexError: list index out of range because of bug in `process.extractWithoutOrder()` #307

Open Thijsvandepoll opened 3 years ago

Thijsvandepoll commented 3 years ago

Hi all,

I found a bug in `process.extractWithoutOrder()` that causes `process.dedupe()` to fail unexpectedly. For example:

process.dedupe(["BRITT JEFFREY S", "BRITT JEFFREY S.", "WIEDEMAN SCOTT", "WIEDERMANN SCOTT", "斯科特·维德曼", "杰弗里·S·布里特"])

which results in:

IndexError: list index out of range

The expected result here is:

dict_keys(['BRITT JEFFREY S.', 'WIEDERMANN SCOTT', '斯科特·维德曼', '杰弗里·S·布里特'])

I looked into the source code and I believe the bug is in `process.extractWithoutOrder()`, which applies a different (pre)processor to the query than to the choices. I will create a merge request to fix this issue.
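
To illustrate the kind of failure I mean, here is a minimal standalone sketch. It does not use fuzzywuzzy's actual code: `ascii_only`, `keep_unicode`, `score`, and `matches_above` are hypothetical helpers, and the ASCII-stripping behaviour is an assumption made for the example. It only shows how preprocessing the query differently from the choices can leave the list of matches empty, so that indexing into it raises `IndexError`.

```python
from difflib import SequenceMatcher


def ascii_only(s):
    """Hypothetical preprocessor: lowercase and drop every non-ASCII character."""
    return "".join(ch for ch in s.lower() if ord(ch) < 128).strip()


def keep_unicode(s):
    """Hypothetical preprocessor: lowercase but keep all characters."""
    return s.lower().strip()


def score(a, b):
    """Similarity from 0 to 100; an empty input scores 0 (assumed convention)."""
    if not a or not b:
        return 0
    return round(100 * SequenceMatcher(None, a, b).ratio())


def matches_above(query, choices, query_proc, choice_proc, threshold=70):
    """Return the choices whose score against the query exceeds the threshold."""
    processed_query = query_proc(query)
    return [c for c in choices if score(processed_query, choice_proc(c)) > threshold]


names = ["WIEDEMAN SCOTT", "斯科特·维德曼"]

# Same preprocessor on both sides: the query at least matches itself.
consistent = matches_above("斯科特·维德曼", names, keep_unicode, keep_unicode)

# Different preprocessors: the query is reduced to an empty string,
# so nothing clears the threshold and the result is an empty list.
mismatched = matches_above("斯科特·维德曼", names, ascii_only, keep_unicode)

# Code that assumes at least one match then fails when it takes the first element.
try:
    mismatched[0]
except IndexError as exc:
    error = str(exc)
```

With consistent preprocessing every string matches at least itself, so taking the first match is safe. With mismatched preprocessing that assumption no longer holds.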