seatgeek / fuzzywuzzy

Fuzzy String Matching in Python
http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
GNU General Public License v2.0

Unexpected results from token_set_ratio() #290

Closed: timdagit closed this issue 3 years ago

timdagit commented 3 years ago

I've been playing with the library today and am a bit confused by the behaviour of token_set_ratio(). Regardless of the token manipulation, I would only expect a result of 100 if both strings were identical, but the example below also returns 100:

from fuzzywuzzy import fuzz

result = fuzz.token_set_ratio("word1 word2 word3", "word1 word2")
# 100

I would have expected that from partial_token_set_ratio(), but not here, unless I've missed something.

maxbachmann commented 3 years ago

It also returns 100 when all the words of one of the two strings appear in the other string.
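
That rule can be illustrated with a simplified re-implementation of the token-set idea. This is only a sketch, not fuzzywuzzy's code: it assumes difflib.SequenceMatcher as a stand-in for fuzz.ratio, only lowercases and splits on whitespace (the library does more preprocessing), and the names simple_ratio and token_set_ratio_sketch are hypothetical.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    # Hypothetical stand-in for fuzz.ratio: difflib similarity scaled to 0..100.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_ratio_sketch(s1: str, s2: str) -> int:
    # Hypothetical, simplified version of the token-set idea (not the library code).
    tokens1 = set(s1.lower().split())
    tokens2 = set(s2.lower().split())
    if not tokens1 or not tokens2:
        return 0

    # Shared words, plus the words unique to each side, sorted for stability.
    sect = " ".join(sorted(tokens1 & tokens2))
    diff1 = " ".join(sorted(tokens1 - tokens2))
    diff2 = " ".join(sorted(tokens2 - tokens1))

    # Rebuild each side as "shared words + its own extra words".
    combined1 = f"{sect} {diff1}".strip()
    combined2 = f"{sect} {diff2}".strip()

    # Keep the best of three pairwise scores. When every word of s2 also
    # appears in s1, diff2 is empty, combined2 equals sect, and the second
    # comparison is a perfect 100 no matter how many extra words s1 has.
    return max(
        simple_ratio(sect, combined1),
        simple_ratio(sect, combined2),
        simple_ratio(combined1, combined2),
    )


print(token_set_ratio_sketch("word1 word2 word3", "word1 word2"))  # 100
```

The extra word3 only ever lands in the "unique to s1" part, so one of the three comparisons is between two identical strings.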

timdagit commented 3 years ago

Ah, thanks. How does that differ from partial_token_set_ratio()?

maxbachmann commented 3 years ago

partial_token_set_ratio is based on partial_ratio instead of ratio, so it already returns 100 when the two strings share a single word:

fuzz.token_set_ratio("word1 word2 word3", "word1 word4")
# 71
fuzz.partial_token_set_ratio("word1 word2 word3", "word1 word4")
# 100
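
The difference can be sketched by keeping the same simplified token-set logic and swapping only the pairwise scorer. All names here are hypothetical: simple_ratio uses difflib.SequenceMatcher as a stand-in for fuzz.ratio, and simple_partial_ratio is a brute-force substitute for fuzz.partial_ratio (it scores the shorter string against every equal-length window of the longer one), not the library's actual implementation.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    # Hypothetical stand-in for fuzz.ratio: whole-string similarity, 0..100.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def simple_partial_ratio(a: str, b: str) -> int:
    # Hypothetical stand-in for fuzz.partial_ratio: score the shorter string
    # against every equal-length window of the longer one and keep the best.
    if not a or not b:
        return 0
    shorter, longer = sorted((a, b), key=len)
    n = len(shorter)
    return max(
        simple_ratio(shorter, longer[i:i + n])
        for i in range(len(longer) - n + 1)
    )


def token_set_score(s1: str, s2: str, ratio_func) -> int:
    # Same simplified token-set logic, with the pairwise scorer passed in.
    tokens1 = set(s1.lower().split())
    tokens2 = set(s2.lower().split())
    if not tokens1 or not tokens2:
        return 0
    sect = " ".join(sorted(tokens1 & tokens2))
    diff1 = " ".join(sorted(tokens1 - tokens2))
    diff2 = " ".join(sorted(tokens2 - tokens1))
    combined1 = f"{sect} {diff1}".strip()
    combined2 = f"{sect} {diff2}".strip()
    return max(
        ratio_func(sect, combined1),
        ratio_func(sect, combined2),
        ratio_func(combined1, combined2),
    )


a, b = "word1 word2 word3", "word1 word4"
print(token_set_score(a, b, simple_ratio))          # 71
print(token_set_score(a, b, simple_partial_ratio))  # 100
```

With the partial scorer, the shared words form a prefix of each rebuilt string, so a single exact word in common is enough for 100; with the whole-string scorer, the unmatched word2, word3 and word4 pull the score down.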

timdagit commented 3 years ago

Thanks for clarifying.