kiyoka / fuzzy-string-match

fuzzy string matching library for ruby
Apache License 2.0

wrong spec #11

Closed tonytonyjan closed 10 years ago

tonytonyjan commented 10 years ago

In my submission, this line is wrong. It should be 0.9667 instead of 0.9722.

kiyoka commented 10 years ago

Why do you think 0.9667 is the correct answer rather than 0.9722? I tested these parameters with the Perl version of JaroWinkler: https://github.com/naoya/perl-text-jarowinkler

    #!/usr/bin/env perl
    use Text::JaroWinkler qw/distance/;

    my $d = distance( "henka",     "henkan" );
    printf( "result = %f\n", $d );

result = 0.972222

tonytonyjan commented 10 years ago

I tested this using the original C implementation by the author of the algorithm. ref: http://web.archive.org/web/20100227020019/http://www.census.gov/geo/msb/stand/strcmp.c

tonytonyjan commented 10 years ago

Below is the table I built before, FYI.

kiyoka commented 10 years ago

My fuzzystringmatch is a port of the Apache Lucene implementation, so I tested the same parameters with Apache Lucene version 2.9.4. ref: https://github.com/kiyoka/fuzzy-string-match/blob/master/original/LuceneSample/src/LuceneSample.java

The result is:

    str1[henka] str2[henkan] d=0.972222
    str1[al] str2[al] d=1.000000
    str1[martha] str2[marhta] d=0.961111
    str1[jones] str2[johnson] d=0.832381
    str1[abcvwxyz] str2[cabvwxyz] d=0.958333
    str1[dwayne] str2[duane] d=0.840000
    str1[dixon] str2[dicksonx] d=0.813333
    str1[fvie] str2[ten] d=0.000000

I don't know what the difference is between the original C code and Apache Lucene...

tonytonyjan commented 10 years ago

Interesting... I haven't read the Lucene source code yet.

kiyoka commented 10 years ago

Thank you for the information.

The 'fuzzy-string-match' project is a port from Apache Lucene, and it gives the same results as Apache Lucene. So, I have decided to close this issue.
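
For readers comparing the two numbers, here is a minimal Ruby sketch of Jaro-Winkler that reproduces both values for "henka" / "henkan". It is not the code of fuzzy-string-match, Lucene, or strcmp.c. The `max_prefix` parameter and both method names are hypothetical, and the idea that the two results differ only in whether the common prefix is capped at four characters is an assumption that this thread does not confirm.

```ruby
# Plain Jaro similarity (illustrative sketch, not the library's code).
def jaro(s1, s2)
  return 0.0 if s1.empty? || s2.empty?
  a = s1.chars
  b = s2.chars
  # Characters match only if they are equal and within this distance.
  window = [[a.size, b.size].max / 2 - 1, 0].max
  b_used = Array.new(b.size, false)
  a_matched = []
  a.each_with_index do |ch, i|
    lo = [i - window, 0].max
    hi = [i + window, b.size - 1].min
    (lo..hi).each do |j|
      next if b_used[j] || b[j] != ch
      b_used[j] = true
      a_matched << ch
      break
    end
  end
  m = a_matched.size
  return 0.0 if m.zero?
  b_matched = b.each_index.select { |j| b_used[j] }.map { |j| b[j] }
  # Half the number of matched characters that appear in a different order.
  t = a_matched.zip(b_matched).count { |x, y| x != y } / 2
  (m.to_f / a.size + m.to_f / b.size + (m - t).to_f / m) / 3
end

# Winkler prefix bonus. `max_prefix` is a hypothetical knob:
# 4 is the usual cap, nil disables the cap (assumed here to be the
# source of the 0.9722 result). With no cap and a shared prefix longer
# than 10 characters this sketch can exceed 1.0; it does not guard that.
def jaro_winkler(s1, s2, max_prefix: 4, scale: 0.1)
  j = jaro(s1, s2)
  prefix = s1.chars.zip(s2.chars).take_while { |x, y| x == y }.size
  prefix = [prefix, max_prefix].min if max_prefix
  j + prefix * scale * (1 - j)
end

puts format('%.6f', jaro_winkler('henka', 'henkan'))                  # 0.966667
puts format('%.6f', jaro_winkler('henka', 'henkan', max_prefix: nil)) # 0.972222
```

The Jaro part is 0.944444 in both cases (5 matches, no transpositions). The shared prefix "henka" is 5 characters long, so the bonus is 0.4 × 0.055556 with the cap and 0.5 × 0.055556 without it. The other pairs listed above share fewer than five leading characters, so the cap would not change them.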