Closed redguy666 closed 11 years ago
I'm sure it'd be possible to prepare such dictionaries. Marcin Miłkowski may have more insight into this. I'll let him know.
Hi. Marcin says those dictionaries are in fact available as part of the LanguageTool project -- perhaps you can just take a look in there and reuse them?
I thought LanguageTool was a grammar checker rather than a stemming library. Also, there is no Czech support, only Slovak (http://www.languagetool.org/languages/). Could you provide more information on how to use LanguageTool for stemming?
Check out the source code -- there are FSA dictionaries for multiple languages (including Czech in one of the older versions I think). Marcin Miłkowski will know more details.
Hi,
To create a dictionary with Morfologik, you will need to look at this:
http://wiki.languagetool.org/developing-a-tagger-dictionary
To see how they use Morfologik stemming, you will need to look at the LT (LanguageTool) code itself. It uses the Morfologik DictionaryLookup and IStemmer classes.
You can also ask in the LT lists.
Cheers,
Rodrigo
On Fri, May 17, 2013 at 10:56 AM, Maciej Lizewski notifications@github.com wrote:
I thought LanguageTool is rather a grammar checker, not stemming library... also - there is no Czech support, only Slovak (http://www.languagetool.org/languages/). Could you provide more information on how to use LanguageTool for stemming?
Thanks for your hints. I will try that out.
@redguy666: there is a Czech dictionary, although support for Czech is not advertised. The reason is that we only have a dictionary, and a big one at that. I'm not sure where the file is in our Maven repo right now, but here's the old location:
It worked :) at least for the Slovak language (there is no Czech dictionary in LanguageTool). I created a universal MorfologikStemmer filter for Solr. It accepts a dictionary name as a parameter instead of a DICTIONARY enumeration element, so you can use it with any dictionary from LanguageTool.
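The filter itself is not shown in this thread, so the following is only a rough sketch of the idea of selecting a dictionary by name rather than by a fixed enumeration value. It deliberately uses a plain in-memory map instead of Morfologik's FSA dictionaries or any Solr classes. The class `NamedDictionaryStemmer`, its `register` method, and the tab-separated "inflected form, base form" input lines are all hypothetical names and assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical stand-in for a dictionary-backed stemmer chosen by name.
 * A real filter would load a Morfologik dictionary file instead of a HashMap.
 */
class NamedDictionaryStemmer {
    // Registry: dictionary name (e.g. "sk") -> (inflected form -> base form).
    private static final Map<String, Map<String, String>> REGISTRY = new HashMap<>();

    /** Registers a dictionary from lines of the assumed form "inflected<TAB>base". */
    static void register(String name, String... lines) {
        Map<String, String> forms = new HashMap<>();
        for (String line : lines) {
            String[] cols = line.split("\t");
            if (cols.length >= 2) {
                forms.put(cols[0].toLowerCase(Locale.ROOT), cols[1]);
            }
        }
        REGISTRY.put(name, forms);
    }

    private final Map<String, String> forms;

    /** Looks the dictionary up by name, so new languages need no code change. */
    NamedDictionaryStemmer(String dictionaryName) {
        Map<String, String> found = REGISTRY.get(dictionaryName);
        if (found == null) {
            throw new IllegalArgumentException("Unknown dictionary: " + dictionaryName);
        }
        this.forms = found;
    }

    /** Returns the base form if the word is known, otherwise the word unchanged. */
    String stem(String word) {
        return forms.getOrDefault(word.toLowerCase(Locale.ROOT), word);
    }

    public static void main(String[] args) {
        register("sk", "mestom\tmesto", "meste\tmesto");
        NamedDictionaryStemmer stemmer = new NamedDictionaryStemmer("sk");
        System.out.println(stemmer.stem("mestom")); // prints "mesto"
        System.out.println(stemmer.stem("Solr"));   // unknown word, prints "Solr"
    }
}
```

Passing the name as a string means a new dictionary can be added without touching the filter code, at the cost of failing at runtime rather than compile time when the name is misspelled.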
@milekpl - sorry, I missed your last comment. Thanks for the link to the Czech dictionary!
Hi,
I have noticed that the Czech and Slovak languages have quite poor stemming support in Solr: only some basic heuristics and Hunspell, which is very slow in Solr 4.x. Would it be possible to prepare dictionaries for those languages similar to the Polish one, based for example on the OpenOffice dictionaries? If so, how can that be achieved?