languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

Remove hunspell dependency #199

Open danielnaber opened 9 years ago

danielnaber commented 9 years ago

To provide 100% pure Java software, we'd like to switch from Hunspell native code to Morfologik. See http://wiki.languagetool.org/hunspell-support for how to build the Morfologik dictionaries.

Find all affected languages by looking for usages of Hunspell and its subclasses (like HunspellNoSuggestionRule):

The following might not be trivial to port to Morfologik:

danielnaber commented 9 years ago

Also see the discussion at http://www.mail-archive.com/languagetool-devel@lists.sourceforge.net/msg04435.html

danielnaber commented 9 years ago

For the recursive tags, Laszlo suggests using the output of the first unmunch as the dictionary for a second unmunch. I haven't been able to make that work in a sensible way yet (tried with Galician).

ghost commented 9 years ago

Compound support would be needed for a lot of languages. The upside is that it accepts newly invented words; the downside is that it also accepts some wrong words that are technically valid compounds. For all those languages, using the Hunspell dictionary as a filter on a huge word list would yield a workable word list. I guess 99.99% coverage would be good enough. With a large corpus as input, it can be done.
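A rough sketch of that filtering idea, in case it helps the discussion. Everything here is hypothetical: `CorpusFilterSketch`, `filterAccepted` and `toyCompoundSpeller` are made-up names, and the toy speller (accept a known stem or a concatenation of two known stems) only stands in for a real Hunspell check, which would have to be plugged in as the `Predicate<String>`. It is not LanguageTool or Morfologik code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical sketch: build a flat word list by filtering corpus tokens
// through a speller that understands compounds.
class CorpusFilterSketch {

    // Keep every distinct, non-empty corpus token that the speller accepts.
    // The result is sorted so it can be fed to a dictionary build step.
    static List<String> filterAccepted(Iterable<String> corpusTokens,
                                       Predicate<String> spellerAccepts) {
        Set<String> accepted = new TreeSet<>();
        for (String token : corpusTokens) {
            if (!token.isEmpty() && spellerAccepts.test(token)) {
                accepted.add(token);
            }
        }
        return new ArrayList<>(accepted);
    }

    // Toy stand-in for a compound-aware speller (assumption, not Hunspell):
    // accepts a known stem or any concatenation of exactly two known stems.
    static Predicate<String> toyCompoundSpeller(Set<String> stems) {
        return word -> {
            if (stems.contains(word)) {
                return true;
            }
            for (int i = 1; i < word.length(); i++) {
                if (stems.contains(word.substring(0, i))
                        && stems.contains(word.substring(i))) {
                    return true;
                }
            }
            return false;
        };
    }

    public static void main(String[] args) {
        Set<String> stems = new HashSet<>(Arrays.asList("book", "shelf", "case"));
        // Pretend these tokens came from a large corpus.
        List<String> corpus = Arrays.asList(
                "bookshelf", "book", "bokshelf", "bookcase", "book");
        // The misspelling "bokshelf" is dropped; duplicates collapse.
        System.out.println(filterAccepted(corpus, toyCompoundSpeller(stems)));
    }
}
```

Only compounds that actually occur in the corpus end up in the list, so the theoretically valid but wrong compounds are mostly left out, at the cost of missing rare or newly coined ones. That is where the coverage figure comes in.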