Closed rocke2020 closed 1 month ago
https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html#rapidfuzz.fuzz.partial_ratio I did read this doc, but I still don't understand how 57.14 is obtained.
print(fuzz.partial_ratio('34cdef16z', '09cdef78'))
fuzz.partial_ratio slides a window the size of the shorter string over the longer string. For each window it calculates fuzz.ratio against the shorter string, and it returns the score of the alignment with the highest similarity. These substrings/windows in the longer string can never be longer than the shorter string. However, they may be shorter if they are placed at the start or end of the longer string.
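The window search described above can be sketched in plain Python. This is a simplified illustration, not RapidFuzz's actual implementation: the helper names lcs_length, ratio and partial_ratio_sketch are made up for this example, ratio is assumed to be the normalized Indel similarity (2 * LCS length / combined length), and edge cases such as empty strings may be handled differently by the library.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence (row-by-row DP)."""
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ratio(s1, s2):
    """Normalized Indel similarity in percent (assumed to match fuzz.ratio)."""
    total = len(s1) + len(s2)
    if total == 0:
        # Guard against division by zero; the library's choice may differ.
        return 100.0
    return 100.0 * 2 * lcs_length(s1, s2) / total


def partial_ratio_sketch(s1, s2):
    """Return (score, start, end) for the best window longer[start:end]."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    n, m = len(shorter), len(longer)

    windows = []
    # Truncated windows hanging over the start of the longer string.
    windows += [(0, end) for end in range(1, n)]
    # Full-size windows, as long as the shorter string.
    windows += [(start, start + n) for start in range(0, m - n + 1)]
    # Truncated windows hanging over the end of the longer string.
    windows += [(start, m) for start in range(m - n + 1, m)]

    best = (0.0, 0, 0)
    for start, end in windows:
        score = ratio(longer[start:end], shorter)
        if score > best[0]:
            best = (score, start, end)
    return best


score, start, end = partial_ratio_sketch('34cdef16z', '09cdef78')
print(round(score, 2), start, end)  # 57.14 0 6
```

For the strings in this issue the best window is the truncated one at the start, '34cdef'. A window such as 'cdef' in the middle of the longer string is never considered, because only windows touching the start or end may be shorter than the shorter string.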
fuzz.partial_ratio_alignment returns the alignment that was used, which helps in understanding the score.
In your example this returns:
>>> fuzz.partial_ratio_alignment(a, c)
ScoreAlignment(score=57.14285714285714, src_start=0, src_end=6, dest_start=0, dest_end=8)
So the alignment used is:
>>> fuzz.ratio(a[0:6], c)
57.14285714285714
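To see where the number comes from, the arithmetic can be spelled out by hand. This is a sketch that assumes fuzz.ratio is the normalized Indel similarity (only insertions and deletions count); the variable names are just for illustration.

```python
window = '34cdef'    # a[0:6], the window chosen by the alignment
other = '09cdef78'   # c

lcs_len = len('cdef')                  # longest common subsequence: 4
total_len = len(window) + len(other)   # 6 + 8 = 14

# Indel distance: delete '34' from the window, insert '09' and '78'.
indel_distance = total_len - 2 * lcs_len            # 14 - 8 = 6

score = 100 * (1 - indel_distance / total_len)      # 100 * 8 / 14
print(score)
```

So the score is 8/14, about 57.14, rather than 4/8 = 0.5: the denominator is the combined length of the window and the other string, not the length of the shorter input.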
rapidfuzz 3.9.6, python 3.10
a = '34cdef16z'
c = '09cdef78'
The common part of the two strings is "cdef". I thought the partial ratio might be the length of "cdef" divided by the length of the shorter input string, which would give 0.5. Instead the result is 57.14. Could you explain how and why 57.14 is calculated? Thanks!
I know the Jaccard similarity ratio, but I need a partial edit distance, so I prefer "partial ratio" from rapidfuzz.
from icecream import ic
from rapidfuzz import fuzz

a = '34cdef16z'
c = '09cdef78'
ic(fuzz.partial_ratio(a, c))
ic(fuzz.partial_token_sort_ratio(a, c))
ic(fuzz.partial_token_ratio(a, c))
ic(fuzz.partial_ratio_alignment(a, c))