jamesturk / jellyfish

🪼 a python library for doing approximate and phonetic matching of strings.
https://jamesturk.github.io/jellyfish/
MIT License
2.04k stars, 157 forks

Python and C versions of jaro(winkler) give different results #124

Closed: jpweytjens closed this issue 4 years ago

jpweytjens commented 4 years ago

The Python and C implementations of the Jaro (and Jaro-Winkler) similarity give slightly different results, as the jaro_distance example below shows. To the best of my knowledge, the C implementation is correct. The same discrepancy was reported in an issue on the textdistance package, and the textdistance maintainer has provided a fix in this PR.

>>> import jellyfish
>>> from jellyfish import _jellyfish  # pure-Python implementation
>>> l = "Sint-Pietersplein 6, 9000 Gent"
>>> r = "Test 10, 1010 Brussel"
>>> jellyfish.jaro_distance(l, r)  # uses the C extension when it is built
0.5182539682539683
>>> _jellyfish.jaro_distance(l, r)
0.5043650793650793
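When two implementations disagree, a third, independent computation can help decide which one is right. The sketch below implements the textbook Jaro and Jaro-Winkler definitions in plain Python. It is not jellyfish's code or textdistance's fix. The names `jaro` and `jaro_winkler` are illustrative, and the conventions are assumptions: a match window of `max(len1, len2) // 2 - 1`, transpositions counted as half the out-of-order matched characters, a prefix scale `p = 0.1` capped at four characters, no 0.7 boost threshold (some implementations add one), and two empty strings treated as identical.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity (illustrative sketch, not jellyfish's code)."""
    len1, len2 = len(s1), len(s2)
    if len1 == 0 and len2 == 0:
        return 1.0  # assumption: two empty strings count as identical
    if len1 == 0 or len2 == 0:
        return 0.0

    # Two characters match only if they are equal and at most `window` apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Compare the matched characters in order; each transposition
    # accounts for two out-of-order positions, hence the division by 2.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (assumed p=0.1, max 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


if __name__ == "__main__":
    # The strings from the issue; compare the result with both values above.
    l = "Sint-Pietersplein 6, 9000 Gent"
    r = "Test 10, 1010 Brussel"
    print(jaro(l, r))
```

On the classic test pair MARTHA/MARHTA this sketch gives 17/18 (about 0.944) for Jaro and about 0.961 for Jaro-Winkler, which makes it a reasonable sanity check before comparing it with the C and Python values for the strings in this issue.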