seatgeek / fuzzywuzzy

Fuzzy String Matching in Python
http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
GNU General Public License v2.0
9.2k stars 878 forks source link

_set_token_ratio now keeps tokenization. #300

Open MWLever opened 3 years ago

MWLever commented 3 years ago

Previously, _settoken ratio, through a mixture of join and split and strip concatenated all tokens together with no whitespace. This allowed for partial matches across token boundaries. This can occur in practice when a human enters a search, but is rare.

Change: Implement Levenshtein's setratio() scoring and preserve tokenization in fuzz._set_token_ratio

Now fails 2 tests due to score changes, which should be expected.

testTokenSetRatio: score improves testWithCutOff: score improves to above 50