rapidfuzz / RapidFuzz

Rapid fuzzy string matching in Python using various string metrics
https://rapidfuzz.github.io/RapidFuzz/
MIT License

partial_ratio_alignment giving wrong index values #309

Closed by sroy-forr 1 year ago

sroy-forr commented 1 year ago

partial_ratio_alignment gives wrong index values for the source when the destination match is longer than the source match.

    from rapidfuzz import fuzz

    s1 = "a certain string"
    s2 = "certainly"
    res = fuzz.partial_ratio_alignment(s1, s2)
    print(res)

Output:

    ScoreAlignment(score=77.77777777777779, src_start=0, src_end=9, dest_start=0, dest_end=9)

Notice that the src range is (0, 9) instead of (1, 8).

Is this expected behaviour?

maxbachmann commented 1 year ago

fuzz.partial_ratio slides a window of len(shorter_string) over the longer string to determine the optimal alignment. fuzz.partial_ratio_alignment returns the indexes of one alignment that yields the optimal score. In your example, the optimal alignment found this way is "a certain" <-> "certainly".
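
To illustrate the sliding-window idea, here is a minimal pure-Python sketch. It is not RapidFuzz's actual implementation, which is optimized and may also consider shorter windows at the string boundaries. The helper names (lcs_length, indel_ratio, sliding_window_alignment) are hypothetical, and the score is assumed to be the normalized Indel similarity, 2 * LCS / (len1 + len2) * 100, which is consistent with the 77.78 reported above.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (classic DP, one row at a time)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, 1):
            if ch == other:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Assumed scoring: normalized Indel similarity scaled to 0..100."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0
    return 100.0 * 2 * lcs_length(a, b) / total


def sliding_window_alignment(src: str, dest: str):
    """Brute-force sketch: slide a window of len(shorter) over the longer string.

    Returns (score, src_start, src_end, dest_start, dest_end).
    Only full-length windows are tried, and the first best window wins.
    """
    if len(src) < len(dest):
        # Slide over dest instead, then swap the index pairs back.
        score, d0, d1, s0, s1 = sliding_window_alignment(dest, src)
        return score, s0, s1, d0, d1

    window = len(dest)
    best_score, best_start = -1.0, 0
    for start in range(len(src) - window + 1):
        score = indel_ratio(src[start:start + window], dest)
        if score > best_score:  # strict '>' keeps the earliest of tied windows
            best_score, best_start = score, start
    return best_score, best_start, best_start + window, 0, len(dest)


result = sliding_window_alignment("a certain string", "certainly")
print(result)
```

In this sketch the windows starting at offsets 0, 1 and 2 ("a certain", " certain " and "certain s") all contain "certain" and therefore tie on score. Every candidate window has the same length as the shorter string, so the reported source range always spans 9 characters here and cannot shrink to just the matching part.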