Closed: vitalie-cracan closed this 1 year ago
Rounding down to 1 (integer halving) is pretty common among implementations, and it is the route we chose. It is by design.
If you have time, can you mention the reasons why this route was chosen? Thanks.
I'm not sure I recall the entire reasoning at the time; presumably it was to match an existing implementation or two. It's also nicer working with integers than floats :)
Someone a while ago suggested allowing custom weights, which I'd be glad to accept a PR for, so long as it doesn't break backwards compatibility.
On Thu, Jun 22, 2023, at 1:25 AM, vitalie-cracan wrote:
If you have time, can you mention the reasons why this route was chosen? Thanks.
The computation of half-transpositions seems to be implemented differently from the original paper (I could not find a free copy of Jaro's paper, but here is one from Winkler that is free: https://www.researchgate.net/publication/245534659_Advanced_Methods_For_Record_Linkage). It is certainly implemented differently in Java's Apache Commons Text: https://github.com/apache/commons-text/blob/master/src/main/java/org/apache/commons/text/similarity/JaroWinklerSimilarity.java#L163
The difference is that the halving there is floating-point division, not integer division. So 3 half-transpositions equal 1.5 transpositions, not 1.
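To make the difference concrete, here is a minimal Python sketch of Jaro similarity with a switch between the two halving rules. It is not the jellyfish or Commons Text source: the function name `jaro_sketch`, the `integer_halving` flag, and the match-window formula are assumptions chosen for illustration, and the input pair was picked because it yields an odd count of 3 out-of-order matches.

```python
def jaro_sketch(s1, s2, integer_halving=True):
    """Illustrative Jaro similarity (hypothetical helper, not library code)."""
    if not s1 or not s2:
        return 0.0
    # Assumed match window: half the longer length, minus one, never negative.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    flags1 = [False] * len(s1)
    flags2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not flags2[j] and s2[j] == ch:
                flags1[i] = flags2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters of each string, in their original order.
    m1 = [c for c, f in zip(s1, flags1) if f]
    m2 = [c for c, f in zip(s2, flags2) if f]
    # Count positions where the matched sequences disagree.
    mismatched = sum(a != b for a, b in zip(m1, m2))
    # The only point of divergence: floor division vs. true division.
    t = mismatched // 2 if integer_halving else mismatched / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


# "abcdef" vs "bcadef": 6 matches, 3 of them out of order.
print(jaro_sketch("abcdef", "bcadef", integer_halving=True))   # 17/18, about 0.9444
print(jaro_sketch("abcdef", "bcadef", integer_halving=False))  # 11/12, about 0.9167
```

With integer halving the count of 3 becomes 1, giving (1 + 1 + 5/6) / 3; with float halving it becomes 1.5, giving (1 + 1 + 4.5/6) / 3. The two rules only disagree when the out-of-order count is odd.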