morfologik / morfologik-stemming

Tools for finite state automata construction and dictionary-based morphological dictionaries. Includes Polish stemming dictionary.
BSD 3-Clause "New" or "Revised" License

finding words similar to words already in dictionary #106

Closed jaumeortola closed 3 years ago

jaumeortola commented 3 years ago

I would like to use the Morfologik speller to find words similar to words that are already in the dictionary. The only change needed would be to remove the condition && !isInDictionary(word) here: https://github.com/morfologik/morfologik-stemming/blob/master/morfologik-speller/src/main/java/morfologik/speller/Speller.java#L410

I am already doing this by adding a diacritic to the word (i.e., introducing a spelling error into the correct word). It works as expected most of the time, but not always (for example, when a transposition is involved).
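One possible explanation for the transposition case, sketched under assumptions that are not stated in this thread: if candidates are limited to edit distance 1 and a transposition counts as a single edit, then injecting an extra error pushes a transposed neighbour out to distance 2. The class below is a hypothetical, standalone illustration. It does not use the Morfologik API, and the words "from" and "form" are made-up examples.

```java
// Hypothetical illustration only; not part of morfologik-speller.
final class TranspositionDemo {

    // Optimal string alignment distance: insertions, deletions,
    // substitutions and adjacent transpositions each cost 1.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "ro" <-> "or".
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        String correct = "from";
        String withDiacritic = "fr\u00F2m"; // "from" with an injected diacritic

        // Querying the correct word directly: the transposed neighbour is one edit away.
        System.out.println(distance(correct, "form"));        // 1
        // Querying the damaged word: the original is one edit away ...
        System.out.println(distance(withDiacritic, "from"));  // 1
        // ... but the transposed neighbour is now two edits away.
        System.out.println(distance(withDiacritic, "form"));  // 2
    }
}
```

Under those assumptions, a search limited to distance 1 around the damaged word would return "from" but miss "form", which matches the behaviour described above.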

We need to add something like: public ArrayList<CandidateData> findReplacementCandidates(String word, boolean evenIfWordIsInDictonary) {

Is that okay, @dweiss? I will provide a pull-request.

dweiss commented 3 years ago

I'm not sure I understand what you'd like to add, but I think it'd be better to create a different method than to modify the existing one. This would keep the API compatible, and perhaps the name of the method could reflect its true purpose?

jaumeortola commented 3 years ago

> but I think it'd be better to create a different method than to modify the existing one. This would keep the API compatible, and perhaps the name of the method could reflect its true purpose?

Thanks for the answer. The API will be compatible, of course. The new method could be named findSimilarWords(String word).
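A minimal sketch of how the two methods could sit side by side, so that existing callers are unaffected. This is not the Morfologik implementation: ToySpeller, its constructor, the linear scan over a word set, and the List<String> return type are all assumptions made for illustration (the signature proposed above returns ArrayList<CandidateData>). Only the names findReplacementCandidates, findSimilarWords and isInDictionary come from this thread. Excluding the query word itself from the result is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical toy class; the real Speller works on an FSA dictionary.
final class ToySpeller {
    private final TreeSet<String> dictionary;
    private final int maxDistance;

    ToySpeller(Collection<String> words, int maxDistance) {
        this.dictionary = new TreeSet<>(words);
        this.maxDistance = maxDistance;
    }

    boolean isInDictionary(String word) {
        return dictionary.contains(word);
    }

    // Existing behaviour, unchanged: no suggestions for correct words.
    List<String> findReplacementCandidates(String word) {
        if (isInDictionary(word)) {
            return new ArrayList<>();
        }
        return findSimilarWords(word);
    }

    // New method: same search, without the dictionary check.
    List<String> findSimilarWords(String word) {
        List<String> result = new ArrayList<>();
        for (String entry : dictionary) {
            if (!entry.equals(word) && distance(word, entry) <= maxDistance) {
                result.add(entry);
            }
        }
        return result;
    }

    // Optimal string alignment distance (adjacent transposition costs 1).
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        ToySpeller speller =
                new ToySpeller(Arrays.asList("form", "from", "farm", "frog"), 1);
        System.out.println(speller.findReplacementCandidates("from")); // []
        System.out.println(speller.findSimilarWords("from"));          // [form, frog]
        System.out.println(speller.findReplacementCandidates("frmo")); // [from]
    }
}
```

Keeping the dictionary check in the existing method and delegating to the new one means the old method's results do not change, which is the compatibility point raised above.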

dweiss commented 3 years ago

Sure, why not then.