Closed by Pathcharnee 1 year ago
fuzz.ratio is based on the normalized Indel similarity. The Indel distance allows only insertions and deletions, so it behaves like the Levenshtein distance with substitutions weighted as 2 (one deletion plus one insertion). For your example:
>>> from rapidfuzz.distance import Indel
>>> from rapidfuzz import fuzz
# only one insertion of !
>>> Indel.distance("this is a test", "this is a test!")
1
# distance / maximum, with maximum = len(s1) + len(s2) = 14 + 15 = 29
>>> Indel.normalized_distance("this is a test", "this is a test!")
0.034482758620689655
# 1.0 - normalized_distance
>>> Indel.normalized_similarity("this is a test", "this is a test!")
0.9655172413793104
# normalized_similarity * 100
>>> fuzz.ratio("this is a test", "this is a test!")
96.55172413793103
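To make the arithmetic easier to follow, here is a minimal pure-Python sketch of the same calculation. It assumes the Indel distance can be obtained from the longest common subsequence (LCS) as len(s1) + len(s2) - 2 * LCS. The helper names lcs_length, indel_distance and ratio are hypothetical, made up for this illustration. This is not how rapidfuzz implements it internally, only a way to check the numbers by hand.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                # Matching characters extend the common subsequence by one.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise skip a character from one of the two strings.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(s1, s2):
    """Number of insertions plus deletions needed to turn s1 into s2."""
    # Every character outside the LCS has to be deleted or inserted.
    return len(s1) + len(s2) - 2 * lcs_length(s1, s2)


def ratio(s1, s2):
    """Normalized Indel similarity scaled to the range 0..100."""
    maximum = len(s1) + len(s2)
    if maximum == 0:
        # Assumption of this sketch: two empty strings count as identical.
        return 100.0
    return (1 - indel_distance(s1, s2) / maximum) * 100


a, b = "this is a test", "this is a test!"
print(indel_distance(a, b))  # one insertion of "!"
print(ratio(a, b))           # (1 - 1/29) * 100

# A substitution costs 2 here: delete "c", then insert "d".
print(indel_distance("abc", "abd"))
```

For the two strings above the LCS has length 14, so the distance is 29 - 28 = 1 and the score is (1 - 1/29) * 100, roughly 96.55.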
I've seen this example, but I don't understand how the result is derived. Can anyone help demonstrate how the similarity score is calculated?