tonytonyjan / jaro_winkler

Ruby & C implementation of Jaro-Winkler distance algorithm which supports UTF-8 string.
MIT License
192 stars 29 forks source link

C extension: memory leak on InvalidWeightError #20

Closed Freaky closed 5 years ago

Freaky commented 5 years ago

In ext/jaro_winkler/jaro_winkler.c distance(), these two calls allocate (hopefully, it never actually checks):

 codepoints_init(&cp1, s1);
 codepoints_init(&cp2, s2);

They're freed at the end of the function:

codepoints_free(&cp1);
codepoints_free(&cp2);

However, in between these two points an exception can be raised which will bypass these calls:

if (c_opt.weight > 0.25)
  rb_raise(rb_eInvalidWeightError, "Scaling factor should not exceed 0.25, "

Looks like they could be moved to just before the call to distance_fn.

tonytonyjan commented 5 years ago

@Freaky Thank you for pointing out this and making this gem better.

I confirmed the latest commit can fix the bug:

require 'shellwords'

def cmd(weight:)
  Shellwords.escape <<~CMD
    1000000.times do
      JaroWinkler.distance('tonytonyjan', 'tonytonyjam', weight: #{weight}) rescue nil
    end
    system "ps -o rss -p \#{Process.pid}"
  CMD
end

puts `ruby -I lib -r jaro_winkler/jaro_winkler_ext -e #{cmd(weight: 0.25)}`
puts `ruby -I lib -r jaro_winkler/jaro_winkler_ext -e #{cmd(weight: 0.26)}`

before

   RSS
 16152
   RSS
117532

after

  RSS
16696
  RSS
13336

cheers 🍻