tonytonyjan / jaro_winkler

Ruby & C implementation of the Jaro-Winkler distance algorithm, which supports UTF-8 strings.
MIT License

Code assumes UTF-8 encoding #7

Closed: tepperly closed this issue 6 years ago

tepperly commented 9 years ago

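The C code assumes that both arguments are UTF-8 encoded. In the session below, the same character (è) encoded once as ISO-8859-1 and once as UTF-8 yields a distance of 0.0: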

$ irb
irb(main):001:0> a = "\xe8".force_encoding("iso8859-1")
=> "\xE8"
irb(main):002:0> b = a.encode("utf-8")
=> "è"
irb(main):003:0> JaroWinkler.distance(a, b)
NameError: uninitialized constant JaroWinkler
        from (irb):3
        from /usr/local/bin/irb:11:in `<main>'
irb(main):004:0> require 'jaro_winkler'
=> true
irb(main):005:0> JaroWinkler.distance(a, b)
=> 0.0
irb(main):006:0> a.encoding
=> #<Encoding:ISO-8859-1>
irb(main):007:0> b.encoding
=> #<Encoding:UTF-8>
irb(main):008:0>
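
A possible caller-side workaround, sketched here under assumptions rather than taken from the gem: transcode both strings to UTF-8 before handing them to the C code. The helper name `to_utf8` is hypothetical and not part of jaro_winkler; it relies only on Ruby's built-in `String#encode`, which raises if the string contains bytes that are invalid in its declared encoding.

```ruby
# Hypothetical helper (not part of the jaro_winkler gem): returns the
# string as UTF-8, transcoding only when it is tagged with another encoding.
def to_utf8(str)
  return str if str.encoding == Encoding::UTF_8
  str.encode(Encoding::UTF_8)
end

a = "\xe8".force_encoding("ISO-8859-1") # one byte: 0xE8
b = a.encode("UTF-8")                   # two bytes: 0xC3 0xA8

# The raw bytes differ, which is why a byte-level comparison sees no match.
puts a.bytes.inspect
puts b.bytes.inspect

# After normalizing, both strings hold the same UTF-8 bytes.
puts to_utf8(a) == to_utf8(b)

# With the gem loaded, the call would then be (assumption, not run here):
#   JaroWinkler.distance(to_utf8(a), to_utf8(b))
```

This keeps the fix outside the library; whether the gem itself should transcode or reject non-UTF-8 input is the question the issue raises.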