I've added two examples to the fuzz.token_set_ratio docs: one showing that if one string's token set is a subset of the other's, the score is 100.0, and another showing a score below 100.0 when the strings disagree.
When I looked at the existing example:
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
100.0
It wasn't clear to me whether the score of 100.0 came from the two tokenized strings producing equivalent sets (the repeated "fuzzy" being eliminated when the tokens are put into a set) or from one set being a subset of the other, so I added these examples to make the distinction explicit.
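To illustrate the distinction, here is a minimal stdlib sketch of the token-set idea (not the library's actual implementation, which uses a Levenshtein-based similarity rather than difflib). It shows why a subset scores 100.0: the intersection of the token sets equals one of the combined strings, so that comparison is a perfect match.

```python
from difflib import SequenceMatcher

def ratio(a: str, b: str) -> float:
    """String similarity as a percentage (difflib stand-in for the real scorer)."""
    return SequenceMatcher(None, a, b).ratio() * 100

def token_set_ratio(s1: str, s2: str) -> float:
    """Sketch of the token-set approach: compare the shared tokens against each
    string's shared-plus-unique tokens and take the best of the three scores."""
    t1, t2 = set(s1.split()), set(s2.split())
    inter = " ".join(sorted(t1 & t2))
    combined1 = (inter + " " + " ".join(sorted(t1 - t2))).strip()
    combined2 = (inter + " " + " ".join(sorted(t2 - t1))).strip()
    return max(ratio(inter, combined1),
               ratio(inter, combined2),
               ratio(combined1, combined2))

# Subset case: every token of the first string appears in the second, so the
# intersection equals combined1 and the best score is 100.0.
print(token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # → 100.0

# Disagreement: "dog" vs "bear" keeps every comparison below a perfect match.
print(token_set_ratio("fuzzy was a bear", "fuzzy was a dog"))  # < 100.0
```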
Happy to make any adjustments if need be!