gaffkins closed this issue 7 years ago
For Polish, yes. This is the main purpose of the morfologik-polish subproject.
With what method? lookup returns only the base word. I need all varieties for a given base word. For example, I enter "pies" and what I expect is psy, psu, psem, psie...
The short answer is: the same method, but a different dictionary. https://github.com/morfologik/morfologik-stemming/blob/master/morfologik-stemming/src/test/java/morfologik/stemming/DictionaryLookupTest.java#L164-L176
Morfologik doesn't ship with a dictionary for synthesis -- you'll have to invert the tagging dictionary or get the polish_synth dictionary from LanguageTool. See polish.README.Polish.txt
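To make "invert the tagging dictionary" concrete, here is a minimal sketch of the idea only. It uses plain Java collections rather than any Morfologik class, and the `Entry` type and the `invert` method are hypothetical names made up for illustration. A tagging entry maps an inflected form to a base form and a tag; the synthesis direction is keyed by `base|tag` and yields the inflected form. A real synthesis dictionary would still have to be compiled into an automaton with the Morfologik tools, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy illustration: turn tagging entries into synthesis entries. Not Morfologik API. */
final class InvertTaggingSketch {

    /** One tagging entry: inflected form -> (base form, tag). Hypothetical type. */
    static final class Entry {
        final String inflected;
        final String base;
        final String tag;

        Entry(String inflected, String base, String tag) {
            this.inflected = inflected;
            this.base = base;
            this.tag = tag;
        }
    }

    /** Builds a map keyed by "base|tag" whose values are the matching inflected forms. */
    static Map<String, List<String>> invert(List<Entry> tagging) {
        Map<String, List<String>> synth = new TreeMap<>();
        for (Entry e : tagging) {
            String key = e.base + "|" + e.tag;
            synth.computeIfAbsent(key, k -> new ArrayList<>()).add(e.inflected);
        }
        return synth;
    }

    public static void main(String[] args) {
        // Forms and tags taken from the examples in this thread.
        List<Entry> tagging = Arrays.asList(
                new Entry("niemiecku", "niemiecki", "adjp"),
                new Entry("niemiecko", "niemiecki", "adja"));
        System.out.println(invert(tagging));
    }
}
```

The sorted map is a deliberate choice: with keys ordered, all entries for one base form sit next to each other, which is what makes a "give me everything for this base" query cheap later on.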
Hey, this question is also relevant for me. At the moment I'm using the polish_synth dict from LanguageTool and the IStemmer.lookup method like this:
iStemmer.lookup("<word>|<tag>")
e.g.
iStemmer.lookup("niemiecki|adjp")
will result in "niemiecku"; if "adja" is passed as the tag, it will return "niemiecko", etc. Is there a way to retrieve a list of all possible varieties with a single request to the lookup method?
You can look up a node corresponding to "niemiecki|" in the automaton and collect all the leaves starting from there. There are utilities to do this in a pretty simple way -- look at unit tests and grep the code, please.
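The "find the node for the prefix, then collect everything below it" idea can be sketched with an ordinary trie standing in for the automaton. This is not Morfologik code: `PrefixCollectSketch`, `add`, and `suffixesOf` are hypothetical names, and the `+` between key and inflected form is only an assumed separator for the example. Real dictionaries also encode the stored form more compactly, so treat this as a picture of the traversal and nothing more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy trie showing prefix lookup followed by collecting all sequences below that node. */
final class PrefixCollectSketch {

    /** Trie node; terminal marks the end of a stored sequence. */
    static final class Node {
        final Map<Character, Node> next = new TreeMap<>();
        boolean terminal;
    }

    private final Node root = new Node();

    /** Stores one full sequence, e.g. "niemiecki|adjp+niemiecku". */
    void add(String sequence) {
        Node n = root;
        for (char c : sequence.toCharArray()) {
            n = n.next.computeIfAbsent(c, k -> new Node());
        }
        n.terminal = true;
    }

    /** Walks to the node for the prefix, then returns every suffix reachable from it. */
    List<String> suffixesOf(String prefix) {
        Node n = root;
        for (char c : prefix.toCharArray()) {
            n = n.next.get(c);
            if (n == null) {
                return new ArrayList<>(); // prefix is not in the trie
            }
        }
        List<String> out = new ArrayList<>();
        collect(n, new StringBuilder(), out);
        return out;
    }

    /** Depth-first walk that records the path at each terminal node. */
    private static void collect(Node n, StringBuilder path, List<String> out) {
        if (n.terminal) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : n.next.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        PrefixCollectSketch trie = new PrefixCollectSketch();
        trie.add("niemiecki|adjp+niemiecku");
        trie.add("niemiecki|adja+niemiecko");
        // One traversal returns every tag and form stored under the base word.
        System.out.println(trie.suffixesOf("niemiecki|"));
    }
}
```

Each returned suffix still carries both the tag and the form, so splitting on the separator gives the list of varieties for the base word in one pass instead of one lookup per tag.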
Can I get all word varieties by word base?