tonytonyjan / jaro_winkler

Ruby & C implementation of Jaro-Winkler distance algorithm which supports UTF-8 string.
MIT License
193 stars 29 forks source link

Build error on Windows #1

Closed Arcovion closed 9 years ago

Arcovion commented 9 years ago
λ gem install jaro_winkler
Temporarily enhancing PATH to include DevKit...
Building native extensions.  This could take a while...
ERROR:  Error installing jaro_winkler:
        ERROR: Failed to build gem native extension.

    C:/Ruby200/bin/ruby.exe extconf.rb
creating Makefile

make "DESTDIR=" clean

make "DESTDIR="
generating jaro_winkler-i386-mingw32.def
compiling jaro_winkler.c
jaro_winkler.c: In function 'distance':
jaro_winkler.c:8:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
jaro_winkler.c:12:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
jaro_winkler.c:15:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
jaro_winkler.c:21:5: error: 'for' loop initial declarations are only allowed in C99 mode
jaro_winkler.c:21:5: note: use option -std=c99 or -std=gnu99 to compile your code
make: *** [jaro_winkler.o] Error 1

make failed, exit code 2

Gem files will remain installed in C:/Ruby200/lib/ruby/gems/2.0.0/gems/jaro_winkler-1.1.1 for inspection.
Results logged to C:/Ruby200/lib/ruby/gems/2.0.0/extensions/x86-mingw32/2.0.0/jaro_winkler-1.1.1/gem_make.out

Sheer coincidence that this gem was released just hours before I needed a Jaro-Winkler implementation! I had the same thought process about other gems (s1.split // -> s1.chars etc) and this looks excellent.

However it doesn't install for me, seems like you need to add a flag (-std=gnu99) for GCC.

tonytonyjan commented 9 years ago

Thanks for your feedback! It seems to be caused by for loop style. I guess replacing for(int i = 0; ... ) by int i; for(i = 0; ...) in distance.c should solve this issue. But I have to say sorry that this gem does not support Windows currently since I don't have a Windows environment right now :cry:. (It runs on Mac and Unix, and fallback to pure Ruby version in JRuby.)

I need some time to get a windows, and I also appreciate you can send a pull request :).

Arcovion commented 9 years ago

I added $CFLAGS << ' -std=gnu99' to the 2nd line of extconf.rb and the gem installed successfully. Requiring the gem shows this error though:

C:/Ruby200/lib/ruby/site_ruby/2.0.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- jaro_winkler/jaro_winkler.so (LoadError)

The .so file is actually there, but isn't found. This might be my fault compiling the gem incorrectly, in any case $CFLAGS << ' -std=gnu99' is worth adding, probably with an if statement so it's only applied for Ruby on Windows.

I'll keep using AMatch for now, maybe you should add hotwater and amatch benchmark comparisons in future too. =]

Edit: This line shows you may have encountered that problem already, if you can apply that fix to Windows and add the C flag it should work just like JRuby. Didn't test that yet though.

tonytonyjan commented 9 years ago

I didn't knowhotwater and amatch before, thanks for noticing. I've added them to benchmark.

user system total real
jaro_winkler 0.370000 0.000000 0.370000 ( 0.370106)
fuzzystringmatch 0.140000 0.000000 0.140000 ( 0.147716)
hotwater 0.280000 0.000000 0.280000 ( 0.284125)
amatch 0.950000 0.000000 0.950000 ( 0.958045)

As the table shown, my work is slower than native version of fuzzaystringmatch and hotwater. After roughly reading their source code, it seems that they doesn't support UTF-8 string, in my submission, there is something to do with it because I spend more time to compute UTF-8 byte length. I'll try to enhance my project to be faster and still support UTF-8 string in the future.

Finally, I've made a patch to fallback to pure Ruby on windows, it supposed to solve this issue temporarily.

tonytonyjan commented 9 years ago

@Arcovion I wonder if you can help me trying this branch?

Arcovion commented 9 years ago

Tested that branch and all I had to change was if RUBY_PLATFORM['win']if Gem.win_platform? Seems to work well!

Rehearsal --------------------------------------------------
jaro_winkler_r  14.570000   0.000000  14.570000 ( 14.756844)
jaro_winkler_c   0.374000   0.000000   0.374000 (  0.376021)
---------------------------------------- total: 14.944000sec

                     user     system      total        real
jaro_winkler_r  14.446000   0.000000  14.446000 ( 14.510830)
jaro_winkler_c   0.374000   0.000000   0.374000 (  0.378022)
tonytonyjan commented 9 years ago

version 1.2.7 released. Thanks for your help! @Arcovion :beer:

Arcovion commented 9 years ago

Great, 1.2.7 installs fine. One more thing: maybe change :case_match to :ignore_case?. I think case match is a bit ambiguous and ignore case is more commonly used with regex etc.

tonytonyjan commented 9 years ago

You are right, :ignore_case is better naming. I've fixed in 1.2.8.

Arcovion commented 9 years ago

Thanks, good luck with this gem. :thumbsup:

Arcovion commented 9 years ago

BTW I found that these:

s1, s2 = 'henka', 'henkan'
p Amatch::JaroWinkler.new(s1).tap{ |m| m.scaling_factor = 0.1 }.match s2
p JaroWinkler.distance s1, s2, ignore_case: true, threshold: 0.5, weight: 0.1

are the same, AMatch (seems to) use 0.5 threshold and doesn't compare case by default.

tonytonyjan commented 9 years ago

@Arcovion I have no idea how amatch was implemented on account of its lack of readability of source code. In my opinion there is no threshold in amatch: ref.