postmodern / ffi-hunspell

Ruby FFI bindings for Hunspell.
MIT License

Does not work for input in UTF-8 if dictionary encoding is ISO-8859-1 #15

Open: nkrot opened this issue 8 years ago

nkrot commented 8 years ago

It looks like the gem expects the user to know the encoding of the dictionaries.

require 'ffi/hunspell'
dict = FFI::Hunspell.dict('de_DE')

# kinda surprising
dict.valid?("Vergnügungseinrichtungen")
#=> false

# works as expected
dict.valid?("Vergnügungseinrichtungen".encode("iso-8859-1"))
#=> true

dict.encoding
#=> #<Encoding:ISO-8859-1>
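The behavior comes down to bytes. A Ruby string carries an encoding tag, and "ü" is stored as two bytes in UTF-8 but as one byte in ISO-8859-1, so a UTF-8 word presumably reaches the ISO-8859-1 dictionary as a byte sequence it does not contain. This plain-Ruby snippet (no gem required) shows the difference:

```ruby
word   = "Vergnügungseinrichtungen"   # string literals are UTF-8 by default (Ruby 2.0+)
latin1 = word.encode("ISO-8859-1")    # transcoded copy in the dictionary's encoding

puts word.encoding     # UTF-8
puts latin1.encoding   # ISO-8859-1
puts word.length       # 24 characters
puts word.bytesize     # 25 bytes: "ü" takes two bytes in UTF-8
puts latin1.bytesize   # 24 bytes: "ü" takes one byte in ISO-8859-1

# Hex dump of the same character in each encoding
puts "ü".unpack("H*").first                       # c3bc
puts "ü".encode("ISO-8859-1").unpack("H*").first  # fc
```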

To make it work, I have to paranoidly recode each word like this:

dict.valid?("Vergnügungseinrichtungen".encode(dict.encoding))
#=> true
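Rather than repeating the recoding at every call site, it can be wrapped in a small helper. This is only a sketch: spell_ok? is a hypothetical name, not an ffi-hunspell method, and it assumes nothing beyond the encoding and valid? calls shown above. A word containing characters that the dictionary encoding cannot represent is treated as misspelled instead of raising. FakeDict is an invented stand-in so the snippet runs without Hunspell installed.

```ruby
# Hypothetical helper: transcode a word to the dictionary's encoding before checking it.
# Assumes `dict` responds to #encoding and #valid? (as the dictionary above does).
def spell_ok?(dict, word)
  dict.valid?(word.encode(dict.encoding))
rescue EncodingError
  # e.g. "€" has no ISO-8859-1 equivalent, so it cannot be in a Latin-1 dictionary
  false
end

# Invented stand-in dictionary: stores its words in ISO-8859-1 and compares raw bytes.
FakeDict = Struct.new(:encoding, :words) do
  def valid?(word)
    words.any? { |w| w.b == word.b }
  end
end

latin1 = Encoding::ISO_8859_1
dict = FakeDict.new(latin1, ["Vergnügungseinrichtungen".encode(latin1)])

p dict.valid?("Vergnügungseinrichtungen")      # false: UTF-8 bytes do not match
p spell_ok?(dict, "Vergnügungseinrichtungen")  # true
p spell_ok?(dict, "Euro€")                     # false: not representable in ISO-8859-1
```

With a real dictionary the call would look like spell_ok?(FFI::Hunspell.dict('de_DE'), word).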

Contrast this with the hunspell command-line utility (tested in the en_US.utf8 and de_DE.utf8 locales), which sorts out the encoding itself:

> echo Vergnügungseinrichtungen | hunspell -d de_DE
Hunspell 1.3.3
-

Maybe it makes sense to fix the issue in the gem, or at least to mention it in the documentation.

Tested on Debian jessie with Ruby 2.2.4p230.
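If the gem were to handle the encoding itself, one possible shape is a wrapper that transcodes words on the way in and suggestions on the way out. EncodingAwareDict is a hypothetical name, not part of ffi-hunspell. The sketch assumes the wrapped dictionary responds to encoding, valid? and suggest, and that suggestions come back as bytes in the dictionary's encoding. FakeLatin1Dict is an invented stand-in used only to make the example runnable.

```ruby
# Hypothetical wrapper (not part of ffi-hunspell): callers pass words in any
# Ruby encoding and get suggestions back in the encoding they used.
class EncodingAwareDict
  def initialize(dict)
    @dict = dict # assumed to respond to #encoding, #valid? and #suggest
  end

  def valid?(word)
    @dict.valid?(word.encode(@dict.encoding))
  rescue EncodingError
    false # the dictionary's encoding cannot represent this word
  end

  def suggest(word)
    begin
      query = word.encode(@dict.encoding)
    rescue EncodingError
      return [] # no dictionary entry can match an unrepresentable word
    end
    @dict.suggest(query).map do |s|
      # Assumption: suggestions are raw bytes in the dictionary's encoding.
      s.dup.force_encoding(@dict.encoding).encode(word.encoding, invalid: :replace, undef: :replace)
    end
  end
end

# Invented stand-in: stores ISO-8859-1 words and returns untagged (binary) strings.
class FakeLatin1Dict
  WORDS = ["Vergnügung", "Vergnügungen"].map { |w| w.encode("ISO-8859-1") }

  def encoding
    Encoding::ISO_8859_1
  end

  def valid?(word)
    WORDS.any? { |w| w.b == word.b }
  end

  def suggest(word)
    WORDS.select { |w| w.b.start_with?(word.b) }.map(&:b)
  end
end

dict = EncodingAwareDict.new(FakeLatin1Dict.new)
p dict.valid?("Vergnügung")                  # true
p dict.valid?("Euro€")                       # false
puts dict.suggest("Vergnüg").first.encoding  # UTF-8
```

Replacing unconvertible characters in suggestions (undef: :replace) keeps one odd entry from breaking the whole list; raising instead would be an equally defensible design choice.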