rapidfuzz / RapidFuzz

Rapid fuzzy string matching in Python using various string metrics
https://rapidfuzz.github.io/RapidFuzz/
MIT License

Adds examples to token_set_ratio #377

Closed: jlb52 closed this 3 months ago

jlb52 commented 4 months ago

I've added two examples to the fuzz.token_set_ratio docs. The first shows that if the token set of one string is a subset of the other's, the function returns 100.0. The second shows the score dropping below 100.0 when each string contains tokens the other lacks.

When I looked at the existing example:

>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")    
100.0

It wasn't clear to me whether the score of 100.0 came only from the two tokenized strings producing equal sets (the repeated "fuzzy" is eliminated in the set) or from one set being a subset of the other, so I added these examples to make this explicit.
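To illustrate the distinction, here is a minimal pure-Python sketch of the token-set idea. It assumes the FuzzyWuzzy-style algorithm (split into token sets, short-circuit to 100 when one set contains the other, otherwise take the best ratio between the sorted intersection and each intersection-plus-remainder string). The helper names `lcs_length`, `simple_ratio` and `token_set_sketch` are hypothetical. This is not RapidFuzz's optimized implementation, and scores may differ in edge cases.

```python
# Hypothetical sketch of the token-set comparison; NOT RapidFuzz's own code.


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def simple_ratio(a: str, b: str) -> float:
    """Normalized Indel similarity in [0, 100]: 100 * 2 * LCS / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0
    return 100.0 * 2 * lcs_length(a, b) / total


def token_set_sketch(s1: str, s2: str) -> float:
    """Set-based comparison: duplicates vanish, and a subset scores 100."""
    tokens_a, tokens_b = set(s1.split()), set(s2.split())
    if not tokens_a or not tokens_b:
        return 0.0
    intersect = tokens_a & tokens_b
    diff_ab = tokens_a - tokens_b
    diff_ba = tokens_b - tokens_a
    # Short-circuit: the strings share tokens and one token set contains the other.
    if intersect and (not diff_ab or not diff_ba):
        return 100.0
    sect = " ".join(sorted(intersect))
    ab = " ".join(filter(None, [sect, " ".join(sorted(diff_ab))]))
    ba = " ".join(filter(None, [sect, " ".join(sorted(diff_ba))]))
    return max(simple_ratio(sect, ab), simple_ratio(sect, ba), simple_ratio(ab, ba))


print(token_set_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100.0 (equal sets)
print(token_set_sketch("fuzzy was a bear", "fuzzy was a big bear"))    # 100.0 (subset)
print(token_set_sketch("fuzzy was a bear", "fuzzy was a dog"))         # about 84.6 (neither is a subset)
```

With this sketch, both the equal-set case and the strict-subset case hit the same short-circuit. Only when each side has a token the other lacks ("bear" vs "dog") does the score fall below 100.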

Happy to make any adjustments if need be!

maxbachmann commented 3 months ago

Thanks for the contribution :+1: