postmodern / ffi-hunspell

Ruby FFI bindings for Hunspell.
MIT License

Add extra dictionary support #17

Closed cdchapman closed 5 years ago

cdchapman commented 8 years ago

Requires at least Hunspell version 1.3.4; see the commit where this interface was introduced. I don't know whether there is a way to attach the function only for specific versions of Hunspell.

cdchapman commented 8 years ago

A similar situation and a workaround came up in an early ruby-ffi discussion. I'll add another commit to use this approach.

postmodern commented 7 years ago

When FFI attempts to bind a function that doesn't exist, it raises an FFI::NotFoundError exception. You could catch that when attaching the additional functions, and fail silently?

Envek commented 7 years ago

@cdchapman, thank you, this is exactly the functionality I need! @postmodern, what still needs to be done in this PR to get it merged?

One more idea for ease of use: make add_dic search for dictionaries in Hunspell.directories, as the FFI::Hunspell::Dict.open method does, so that adding dictionaries for other languages is easier.
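A rough sketch of what such a lookup could look like. The helper resolve_dic_path is hypothetical and not part of ffi-hunspell; it takes the directory list as a parameter, so it makes no assumptions about how the library stores its search paths:

```ruby
# Hypothetical helper, not part of ffi-hunspell: resolve a dictionary name
# such as "en_US" to the first matching .dic file in a list of directories.
def resolve_dic_path(name, directories)
  # An explicit path to an existing file is returned untouched.
  return name if File.file?(name)

  directories.each do |dir|
    candidate = File.join(dir, "#{name}.dic")
    return candidate if File.file?(candidate)
  end

  raise ArgumentError, "no dictionary named #{name.inspect} in #{directories.inspect}"
end
```

With something like this, the call above could become dict.add_dic(resolve_dic_path('en_US', FFI::Hunspell.directories)), assuming the directory list is passed in that way.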

My use case: I want to check words against Russian, English, and my custom dictionary (with my domain words).

For now, with the code from this pull request, I can do it like this:

require 'ffi/hunspell'
dict = FFI::Hunspell.dict('ru_RU')
dict.add_dic('/usr/share/hunspell/en_US.dic') # ← I want this to be just en_US; I don't want to depend on any one path (which differs between distros, etc.)
dict.add_dic('/full/path/to/custom.dic') # dxg/MS is here

# Usage:
dict.check?('собака'.encode(dict.encoding)) # => true
dict.check?('dog'.encode(dict.encoding)) # => true
dict.check?('dxg'.encode(dict.encoding)) # => true

I can continue work on this PR with your permission. WDYT?

postmodern commented 7 years ago

Sorry for the delay. Only one minor issue, but this looks ready to be merged.

cdchapman commented 6 years ago

Hi @Envek. Sorry for the delay. It would not make sense to use the Russian affix file with the en_US dictionary, for example, because affixes mean different things in different languages. Extra dictionaries reuse the affix file of the main dictionary. See hunspell/hunspell#348.

A better approach would be the following:

require 'ffi/hunspell'
dict = FFI::Hunspell.dict('ru_RU')
dict.add_dic('/full/path/to/extra_russian_words.dic') # medical terms are here

english_dict = FFI::Hunspell.dict('en_US')
english_dict.add_dic('/full/path/to/custom.dic') # dxg/MS is here

# Usage:
dict.check?('собака'.encode(dict.encoding)) # => true
dict.check?('полиглактином'.encode(dict.encoding)) # => true
english_dict.check?('dog'.encode(english_dict.encoding)) # => true
english_dict.check?('dxg'.encode(english_dict.encoding)) # => true

In any case, it is useful to know which language is being checked. An application could guess the language using some sort of heuristic, but it should be explicit about the guessing, because it may choose the wrong dictionary.
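As an illustration only, here is one very naive heuristic that makes the guess explicit: it looks at the script of the word and refuses to answer when the result is ambiguous. The helper guess_language is hypothetical, and it assumes that Russian and English are the only candidate languages (Cyrillic text could just as well be another language):

```ruby
# Hypothetical helper: guess which dictionary to use from the script of a word.
# Assumes only two candidates: Russian (Cyrillic) and English (Latin).
def guess_language(word)
  has_cyrillic = word.match?(/\p{Cyrillic}/)
  has_latin    = word.match?(/\p{Latin}/)

  return :ru if has_cyrillic && !has_latin
  return :en if has_latin && !has_cyrillic

  nil # mixed or unknown script: refuse to guess
end
```

Returning nil instead of a default forces the caller to handle the uncertain case, for example by asking the user or by checking the word against both dictionaries.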

Envek commented 6 years ago

Yes, I figured that out, and in the end I did exactly this: two main dictionaries (one per language) and two additional dictionaries (one for each language).

But anyway this PR is still needed.

postmodern commented 5 years ago

Been busy at work. Finally got around to working on ffi-hunspell. Merged and will be in 0.6.0.

wynksaiddestroy commented 4 years ago

Is it possible by chance to release a new version to offer this new feature?

postmodern commented 3 years ago

FYI, this feature shipped in 0.6.0, which was released last November. https://github.com/postmodern/ffi-hunspell/blob/master/ChangeLog.md#060--2020-11-28