Closed Rzulf closed 11 years ago
The polish-stemmer package does not fix diacritics, and it is not supposed to: that is the speller's job. You could use the stemmer dictionary for spelling, but this is discouraged, because stemmer dictionaries usually contain more words than speller dictionaries. Some rare words should not be accepted by a speller if they are easily confused with common ones, e.g., Polish 'sie' is confused with the much more frequent 'się'.
Thanks for the reply :) PS. Transforming 'sie' to 'się' is exactly what I want to do, since 'się' is much more frequent.
But to do this, you need a speller, not a stemmer. You could simply run a Speller on the words to be stemmed and take the first suggestion. The Polish spelling dictionary to use is available in the LanguageTool repository: http://languagetool.svn.sourceforge.net/viewvc/languagetool/trunk/languagetool/languagetool-language-modules/pl/src/main/resources/org/languagetool/resource/pl/hunspell/ (take the .dict and .info files).
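A minimal sketch of the "take the first suggestion" step described above. It does not use the real speller API: the speller is passed in as a plain function from a word to a list of suggestions, and `FirstSuggestionDemo`, `firstSuggestion`, and `toySpeller` are hypothetical names invented for this illustration. The toy suggestion table stands in for a speller loaded from the .dict and .info files.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustration only: the speller is modelled as a function returning suggestions.
// A real speller object would be wrapped in such a function (assumption).
final class FirstSuggestionDemo {

    // Hypothetical suggestion table standing in for a real spelling dictionary.
    static final Map<String, List<String>> TOY_SUGGESTIONS =
            Map.of("sie", List.of("się"));

    // Toy speller: returns known suggestions, or an empty list for any other word.
    static List<String> toySpeller(String word) {
        return TOY_SUGGESTIONS.getOrDefault(word, List.of());
    }

    // Returns the speller's first suggestion, or the word unchanged
    // when the speller has nothing to suggest.
    static String firstSuggestion(String word, Function<String, List<String>> speller) {
        List<String> suggestions = speller.apply(word);
        return suggestions.isEmpty() ? word : suggestions.get(0);
    }

    public static void main(String[] args) {
        // Correct each word first; the result is what would be handed to the stemmer.
        for (String word : List.of("sie", "dom")) {
            System.out.println(word + " -> " + firstSuggestion(word, FirstSuggestionDemo::toySpeller));
        }
    }
}
```

The corrected word, not the original input, is then passed to the stemmer. With a real speller it probably makes sense to ask for suggestions only for words the speller reports as misspelled, so that valid words are left untouched.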
I used the new (1.7.0) polish-stemmer package from Maven and noticed that it doesn't fix diacritics, even though these options are true by default.
Here are the simple unit tests I wrote: http://pastebin.com/jwHSVecU
Another question: why is "ą" not replaced by "a" by default, the way "Ł" is replaced by "L"?