Referring to the description of token_set_ratio in the original blog post: if the SORTED_INTERSECTION is a strict subset of STRING2 and already contains every token of STRING1 (so STRING1 has no remainder), the resulting ratio will be 100. E.g.,
fuzz.token_set_ratio("Deep Learning", "Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2")
yields 100. This is patently incorrect, and does not uphold the purported intuition ("because the SORTED_INTERSECTION component is always exactly the same, the scores increase when (a) that makes up a larger percentage of the full string, and (b) the string remainders are more similar").
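To make the failure mode concrete, here is a minimal standard-library sketch of the three-way comparison described in the blog post. It is not fuzzywuzzy's code: it assumes `difflib.SequenceMatcher` as the ratio function and a simplified tokenizer, and the names `tokens`, `ratio` and `token_set_sketch` are hypothetical. Exact scores for other inputs may differ from the library's.

```python
import re
from difflib import SequenceMatcher


def tokens(s):
    """Lowercase, turn runs of non-alphanumerics into spaces, return the token set.

    Assumption: a simplified stand-in for the library's preprocessing.
    """
    return set(re.sub(r"[^0-9a-z]+", " ", s.lower()).split())


def ratio(a, b):
    """Similarity of two strings on a 0-100 scale (difflib-based assumption)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_sketch(s1, s2):
    """Hypothetical re-creation of the comparison described in the blog post."""
    t1, t2 = tokens(s1), tokens(s2)
    sect = " ".join(sorted(t1 & t2))   # SORTED_INTERSECTION
    rest1 = " ".join(sorted(t1 - t2))  # remainder of STRING1
    rest2 = " ".join(sorted(t2 - t1))  # remainder of STRING2
    combined1 = (sect + " " + rest1).strip()
    combined2 = (sect + " " + rest2).strip()
    # When rest1 is empty, combined1 is identical to sect, so the first
    # comparison is a string against itself and the maximum is always 100.
    return max(
        ratio(sect, combined1),
        ratio(sect, combined2),
        ratio(combined1, combined2),
    )


short = "Deep Learning"
long_title = ("Python Machine Learning: Machine Learning and Deep Learning "
              "with Python, scikit-learn, and TensorFlow 2")
print(token_set_sketch(short, long_title))
```

In this sketch the short string contributes no remainder, so the intersection ends up being compared with itself, regardless of how much unrelated text the longer string contains.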
Looking at fuzz._token_set, we see that it returns
It appears the assumption is that the string remainder will never be empty. Perhaps something like this is more appropriate: