tonytonyjan closed this issue 10 years ago.
Why do you think 0.9667 is the correct answer instead of 0.9722? I tested the same inputs with the Perl version of Jaro-Winkler. [https://github.com/naoya/perl-text-jarowinkler]
#!/usr/bin/env perl
use Text::JaroWinkler qw/distance/;
my $d = distance( "henka", "henkan" );
printf( "result = %f\n", $d );
result = 0.972222
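A worked check of both numbers. It assumes the standard Jaro-Winkler definition, with scaling factor p = 0.1 and Winkler's cap of four characters on the common-prefix length ℓ. All five characters of "henka" match "henkan" in order (m = 5, no transpositions), and the two strings share the five-character prefix "henka", so the two results differ only in whether ℓ is capped at 4:

```latex
\begin{aligned}
m &= 5,\quad t = 0,\quad |s_1| = 5,\quad |s_2| = 6 \\
d_j &= \tfrac{1}{3}\left(\tfrac{m}{|s_1|} + \tfrac{m}{|s_2|} + \tfrac{m - t}{m}\right)
     = \tfrac{1}{3}\left(\tfrac{5}{5} + \tfrac{5}{6} + \tfrac{5}{5}\right) = \tfrac{17}{18} \approx 0.944444 \\
d_w &= d_j + \ell\,p\,(1 - d_j), \qquad p = 0.1 \\
\ell = 4 \text{ (capped)}:\quad d_w &= \tfrac{17}{18} + 0.4 \cdot \tfrac{1}{18} = \tfrac{17.4}{18} \approx 0.966667 \\
\ell = 5 \text{ (uncapped)}:\quad d_w &= \tfrac{17}{18} + 0.5 \cdot \tfrac{1}{18} = \tfrac{17.5}{18} \approx 0.972222
\end{aligned}
```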
I tested this using the original C implementation by the author of the algorithm. ref: http://web.archive.org/web/20100227020019/http://www.census.gov/geo/msb/stand/strcmp.c
Below is the table I built before, FYI.
I ported the Apache Lucene implementation to my fuzzy-string-match, so I tested the same inputs with Apache Lucene version 2.9.4. Ref: https://github.com/kiyoka/fuzzy-string-match/blob/master/original/LuceneSample/src/LuceneSample.java
The results are:
str1[henka] str2[henkan] d=0.972222
str1[al] str2[al] d=1.000000
str1[martha] str2[marhta] d=0.961111
str1[jones] str2[johnson] d=0.832381
str1[abcvwxyz] str2[cabvwxyz] d=0.958333
str1[dwayne] str2[duane] d=0.840000
str1[dixon] str2[dicksonx] d=0.813333
str1[fvie] str2[ten] d=0.000000
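The table above can be reproduced with a small Python sketch. It rests on a hypothesis that is consistent with every row but was not read from the Lucene source: the Lucene version computes ordinary Jaro similarity plus the usual 0.1 prefix bonus, but does not cap the common prefix at four characters. The names `jaro`, `jaro_winkler`, `max_prefix` and `PAIRS` are hypothetical, chosen only for this illustration.

```python
from typing import Optional


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity with the usual matching window of max(len)/2 - 1."""
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    m = 0
    for i, c in enumerate(s1):
        # Greedily pair c with the first unused equal character inside the window.
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Compare matched characters in order; t is half the mismatches,
    # truncated by integer division (this reproduces the abcvwxyz row).
    seq1 = [c for c, u in zip(s1, used1) if u]
    seq2 = [c for c, u in zip(s2, used2) if u]
    t = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, max_prefix: Optional[int] = 4, p: float = 0.1) -> float:
    """Jaro-Winkler with a prefix bonus.

    max_prefix=4 is Winkler's cap; max_prefix=None leaves the common prefix
    uncapped (assumption: this is the only difference in the Lucene port).
    """
    d = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or (max_prefix is not None and prefix == max_prefix):
            break
        prefix += 1
    return d + prefix * p * (1 - d)


PAIRS = [
    ("henka", "henkan"), ("al", "al"), ("martha", "marhta"),
    ("jones", "johnson"), ("abcvwxyz", "cabvwxyz"), ("dwayne", "duane"),
    ("dixon", "dicksonx"), ("fvie", "ten"),
]

if __name__ == "__main__":
    for a, b in PAIRS:
        capped = jaro_winkler(a, b)
        uncapped = jaro_winkler(a, b, max_prefix=None)
        print(f"str1[{a}] str2[{b}] capped={capped:.6f} uncapped={uncapped:.6f}")
```

Only henka/henkan shares a prefix longer than four characters, so under this hypothesis it is the only pair whose capped and uncapped values differ (0.966667 vs 0.972222); every other row comes out the same under both conventions.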
I don't know what the difference is between the original C code and Apache Lucene...
Interesting... I haven't read the Lucene source code yet.
Thank you for your information.
This 'fuzzy-string-match' project is a port of Apache Lucene, and 'fuzzy-string-match' gives the same results as Apache Lucene. So I decided to close this issue.
In my submission, this line is wrong. It should be 0.9667 instead of 0.9722.