jamesturk / jellyfish

🪼 a python library for doing approximate and phonetic matching of strings.
https://jamesturk.github.io/jellyfish/
MIT License

Possible confusing section in the docs #173

Closed AntoineRondelet closed 1 year ago

AntoineRondelet commented 1 year ago

The docs currently state:

Jaro-Winkler Similarity

def jaro_winkler_similarity(s1: str, s2: str)

Compute the Jaro-Winkler distance between s1 and s2. Jaro-Winkler is a modification/improvement to Jaro distance, like Jaro it gives a floating point response in [0,1] where 0 represents two completely dissimilar strings and 1 represents identical strings. See the Jaro-Winkler distance article at Wikipedia for more details.

As explained in the linked Wikipedia article, the distance is defined as distance = 1 − similarity. So a similarity of 1 yields a distance of 0 (identical strings), and a similarity of 0 yields a distance of 1 (completely different strings).
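To make the relationship concrete, here is a minimal pure-Python sketch of the textbook Jaro-Winkler similarity and the distance derived from it. This is not jellyfish's implementation: the names jw_similarity and jw_distance, the scaling factor p=0.1, and the 4-character prefix cap are assumptions taken from the usual textbook definition, and jellyfish's own results may differ in edge cases.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1] (illustrative, not jellyfish's code)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # number of transpositions
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jw_similarity(s1: str, s2: str, p: float = 0.1) -> float:
    """Hypothetical helper: Jaro similarity boosted by a common prefix (max 4 chars)."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)


def jw_distance(s1: str, s2: str) -> float:
    """Hypothetical helper: distance as defined in the Wikipedia article."""
    return 1 - jw_similarity(s1, s2)


print(jw_similarity("abc", "abc"), jw_distance("abc", "abc"))  # 1.0 0.0
print(jw_similarity("abc", "xyz"), jw_distance("abc", "xyz"))  # 0.0 1.0
```

The two printed pairs show why the wording matters: identical strings score 1 on similarity but 0 on distance, so a doc that says "distance" for a function returning 1 on identical strings reads as inverted.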

The interchangeable use of "similarity" and "distance" in the documentation is confusing. The function appears to compute the similarity (as its name indicates), not the distance (as stated in the line following the function definition: Compute the Jaro-Winkler distance between s1 and s2.).

If I didn't miss anything, the fix is rather simple: Compute the Jaro-Winkler distance between s1 and s2. -> Compute the Jaro-Winkler similarity between s1 and s2.