morfologik / morfologik-stemming

Tools for finite state automata construction and dictionary-based morphological dictionaries. Includes Polish stemming dictionary.
BSD 3-Clause "New" or "Revised" License

Update Speller.java #45

Closed. Mility closed this 9 years ago.

Mility commented 9 years ago

Check words that were split apart (e.g., "together" misspelled as "to gether").

milekpl commented 9 years ago

You must mean something altogether different, namely replacing "to gether" with "together", right? Why don't you add a property for the configuration file to make this configurable by the dictionary author?

Mility commented 9 years ago

Yes, I want to add this function to the speller.

Mility commented 9 years ago

But I don't know what you mean.

milekpl commented 9 years ago

Well, our dictionaries come with property files. Using those, you can choose, for example, whether to ignore diacritic characters or to split words written together (run-on words). See here for the list of constants in the property files:

http://wiki.languagetool.org/hunspell-support
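To illustrate how such boolean options might be read from a dictionary's property file, here is a minimal sketch using only `java.util.Properties`. It is not morfologik's own loader. The key `fsa.dict.speller.runon-words` is an assumption about the name of the existing run-on option (check the wiki page above for the exact constant), and `fsa.dict.speller.joinwords` is only the hypothetical key proposed in this thread. The class and method names are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class SpellerFlags {
    // Assumed key name for the existing run-on (word splitting) option.
    public static final String RUNON_WORDS = "fsa.dict.speller.runon-words";
    // Hypothetical key proposed in this discussion for joining split words.
    public static final String JOIN_WORDS = "fsa.dict.speller.joinwords";

    /** Parses property-file text, e.g. the contents of a dictionary's property file. */
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // StringReader does not actually fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** True only if the key is present and its value is "true" (case-insensitive); absent means off. */
    public static boolean isEnabled(Properties props, String key) {
        return Boolean.parseBoolean(props.getProperty(key, "false").trim());
    }

    public static void main(String[] args) {
        Properties props = parse(RUNON_WORDS + "=true\n" + JOIN_WORDS + "=false\n");
        System.out.println(isEnabled(props, RUNON_WORDS)); // true
        System.out.println(isEnabled(props, JOIN_WORDS));  // false
    }
}
```

Defaulting a missing key to `false` keeps existing dictionaries unaffected when a new option is introduced.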

We should have another property, for example, "fsa.dict.speller.joinwords". You can see in the code that if you enable run-on words, the speller automatically tries to split words. Now it could also try to join them.
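The split-versus-join idea can be sketched as follows. This is a hypothetical illustration, not morfologik's `Speller` code: a plain `Set<String>` stands in for the FSA dictionary lookup, and `splitCandidates` and `joinCandidate` are invented names. Splitting proposes every split point where both halves are dictionary words; joining proposes the concatenation of two adjacent tokens when that concatenation is a dictionary word. As an assumed design choice, joining is attempted only when the pair is not already two valid words, so that "any one" is not flagged just because "anyone" also exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class JoinWordsSketch {

    /** Run-on splitting: every split point where both halves are dictionary words. */
    public static List<String> splitCandidates(String word, Set<String> dictionary) {
        List<String> result = new ArrayList<>();
        for (int i = 1; i < word.length(); i++) {
            String left = word.substring(0, i);
            String right = word.substring(i);
            if (dictionary.contains(left) && dictionary.contains(right)) {
                result.add(left + " " + right);
            }
        }
        return result;
    }

    /**
     * Proposed joining: if two adjacent tokens are not both valid words,
     * suggest their concatenation when it is a dictionary word.
     */
    public static Optional<String> joinCandidate(String left, String right, Set<String> dictionary) {
        boolean bothValid = dictionary.contains(left) && dictionary.contains(right);
        String joined = left + right;
        if (!bothValid && dictionary.contains(joined)) {
            return Optional.of(joined);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Set.of requires Java 9 or later.
        Set<String> dict = Set.of("to", "together", "the", "any", "one", "anyone");
        System.out.println(joinCandidate("to", "gether", dict)); // Optional[together]
        System.out.println(joinCandidate("any", "one", dict));   // Optional.empty
        System.out.println(splitCandidates("tothe", dict));      // [to the]
    }
}
```

In a real speller the join check would presumably run only when the proposed property is enabled, mirroring how splitting is gated by the run-on option.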

Hope this helps.

Mility commented 9 years ago

Thanks.

Mility commented 9 years ago

https://github.com/morfologik/morfologik-stemming/pull/47
https://github.com/morfologik/morfologik-stemming/pull/48
https://github.com/morfologik/morfologik-stemming/pull/49
I don't know how to merge those pull requests, and I haven't tested them.