tonytonyjan / jaro_winkler

Ruby & C implementation of the Jaro-Winkler distance algorithm, with support for UTF-8 strings.
MIT License

The calculation result of transposition is different from the original implementation #9

Closed by yuki24 9 years ago

yuki24 commented 9 years ago

The calculated value of t (half the number of transpositions) differs from the one produced by the original implementation. As a result, the distance for certain pairs of strings is incorrect.

| strings | this gem | original |
| --- | --- | --- |
| necessary and nessecary | 3 | 2 |
| does_exist and doesnt_exist | 0 | 3 |
| 12345678 and 12345687 | 1 | 1 |
| 12345678 and 12345867 | 1 | 1 |
| 12345678 and 12348567 | 1 | 2 |

I'm not sure which one is mathematically better as a string metric, but at least the algorithm should be consistent.
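For reference, here is a minimal Ruby sketch of how t can be computed under the textbook Jaro definition. It assumes a match window of `max(len) / 2 - 1`, greedy left-to-right matching, and a halved count of out-of-order matched characters, rounded down. The method name `jaro_half_transpositions` is hypothetical and is not part of this gem's API, and "original implementation" is assumed here to mean that textbook procedure.

```ruby
# Hypothetical helper, not part of the jaro_winkler gem.
# Returns t, half the number of transpositions, under the textbook Jaro definition.
def jaro_half_transpositions(s1, s2)
  a = s1.chars
  b = s2.chars
  return 0 if a.empty? || b.empty?

  # Characters match only if they are equal and at most `window` positions apart.
  window = [[a.length, b.length].max / 2 - 1, 0].max
  a_matched = Array.new(a.length, false)
  b_matched = Array.new(b.length, false)

  # Greedy pass: each character of s1 takes the first unmatched equal
  # character of s2 that lies inside the window.
  a.each_with_index do |ch, i|
    lo = [i - window, 0].max
    hi = [i + window, b.length - 1].min
    (lo..hi).each do |j|
      next if b_matched[j] || b[j] != ch

      a_matched[i] = true
      b_matched[j] = true
      break
    end
  end

  # Matched characters, each kept in the order of its own string.
  a_seq = a.each_index.select { |i| a_matched[i] }.map { |i| a[i] }
  b_seq = b.each_index.select { |j| b_matched[j] }.map { |j| b[j] }

  # Positions where the two matched sequences disagree, halved (integer division).
  a_seq.zip(b_seq).count { |x, y| x != y } / 2
end

pairs = [
  %w[necessary nessecary],
  %w[does_exist doesnt_exist],
  %w[12345678 12345687],
  %w[12345678 12345867],
  %w[12345678 12348567]
]
pairs.each { |x, y| puts "#{x} and #{y}: t = #{jaro_half_transpositions(x, y)}" }
```

Under those assumptions, this sketch yields 2, 3, 1, 1, and 2 for the five pairs, which agrees with the "original" column above.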

tonytonyjan commented 9 years ago

@yuki24 I've rewritten the algorithm, and it should now be correct in the issues/9 branch; however, the adjusting table is not implemented yet. :stuck_out_tongue:

Unfortunately, I'm leaving for a trip to Okinawa tomorrow, so I'll continue working on this after I get back (in 1 week). Thanks for reporting this issue :+1:

yuki24 commented 9 years ago

@tonytonyjan Thanks! Have fun in Okinawa :sunny: :ocean: :surfer:

tonytonyjan commented 8 years ago

@yuki24 I'm going to attend your talk tomorrow. See you at RubyKaigi 2015 :)