postmodern / ffi-hunspell

Ruby FFI bindings for Hunspell.
MIT License

Can't get it to work with Russian #7

Open houshuang opened 11 years ago

houshuang commented 11 years ago

Can't get this to work, not sure if it's a UTF8 issue or what.

require 'ffi/hunspell'
c = FFI::Hunspell.dict('ru_RU')
p c.stem("рассчитывал") #-> []

Command line using the hunspell binary:

$ echo рассчитывал | hunspell -d ru_RU -s
рассчитывал рассчитывать

nkrot commented 8 years ago

Works for me (my locale is UTF-8)

require 'ffi/hunspell'
dict = FFI::Hunspell.dict('ru_RU')

dict.valid? "рассчитывал"
#=> true 

dict.encoding
#=> #<Encoding:UTF-8> 

dict.stem "рассчитывал"
#=> ["рассчитывать"]

postmodern commented 7 years ago

@houshuang what does __ENCODING__ return in irb? What is the output of the locale command?
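
For anyone else gathering this information, here is a rough sketch that collects the same details from inside Ruby. The method name encoding_report is made up for illustration and is not part of ffi-hunspell; it only reads Ruby's own encoding settings and the usual locale environment variables.

```ruby
# Hypothetical diagnostic helper (not part of ffi-hunspell).
# Collects the encoding settings that are relevant when a UTF-8 string
# is passed to a dictionary that may use a different encoding.
def encoding_report(env = ENV)
  locale_keys = %w[LANG LC_ALL LC_CTYPE]
  {
    # Encoding of string literals in this source file
    source: __ENCODING__.name,
    # Encoding Ruby assumes for external data (usually derived from the locale)
    default_external: Encoding.default_external.name,
    # Usually nil unless set explicitly
    default_internal: Encoding.default_internal&.name,
    # Locale variables as seen by the process
    locale_vars: locale_keys.map { |k| [k, env[k]] }.to_h
  }
end

p encoding_report
```

Comparing this output with dict.encoding should show whether the strings and the dictionary disagree.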

Envek commented 7 years ago

Yeah, this is an encoding problem:

On Ubuntu 17.04 (hunspell 1.4.1-2build1):

dict = FFI::Hunspell.dict('ru_RU')
dict.encoding
# => #<Encoding:KOI8-R (autoload)>

dict.suggest('ощибка')
# => []

dict.suggest('ощибка'.encode(dict.encoding)).map { |s| s.encode(__ENCODING__) }
# => ["ощипка", "ошибка"]
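
The workaround above can be wrapped in a small helper so that every call is transcoded the same way. This is only a sketch: with_dict_encoding is a hypothetical name, not something ffi-hunspell provides. It assumes the dictionary object responds to encoding and that returned strings are already tagged with that encoding, as the snippet above suggests.

```ruby
# Hypothetical helper (not part of ffi-hunspell).
# Transcodes `word` into the dictionary's encoding, yields it to the block,
# and transcodes every string the block returns into `result_encoding`.
def with_dict_encoding(dict, word, result_encoding = Encoding::UTF_8)
  native = word.encode(dict.encoding)
  # Array() lets the block return a single string, an array, or nil
  Array(yield(native)).map { |s| s.encode(result_encoding) }
end
```

With the dictionary from this thread, usage would look like with_dict_encoding(dict, 'ощибка') { |w| dict.suggest(w) }, and the same wrapper should work for dict.stem. A word containing characters that the dictionary's encoding cannot represent will make String#encode raise Encoding::UndefinedConversionError, so callers may want to rescue that.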