Closed by avitale 8 years ago
I wouldn't worry about it. The word vectors MITIE uses dynamically generate word morphology features when you train wordrep, so the stemming isn't very important. The main thing is to have a tokenizer that makes sense for your language. You can also apply any reasonable token normalization at that same tokenization stage.
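To make that concrete, here is a minimal sketch of a tokenizer that also normalizes tokens. The function names, the regex, and the choice of NFKC plus case folding are all assumptions for illustration, not anything MITIE provides; a real tokenizer should follow the conventions of your language (and you may want to keep case if it carries meaning for your task).

```python
import re
import unicodedata

# Hypothetical token pattern: runs of word characters, or any single
# non-space symbol (punctuation). Adjust this for your language.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def normalize_token(token):
    """Example normalization: Unicode NFKC followed by case folding."""
    return unicodedata.normalize("NFKC", token).casefold()


def tokenize_and_normalize(text):
    """Split text into tokens and normalize each one."""
    # Normalize first so compatibility characters (e.g. ligatures)
    # are expanded before the regex sees them.
    text = unicodedata.normalize("NFKC", text)
    return [normalize_token(t) for t in _TOKEN_RE.findall(text)]


if __name__ == "__main__":
    print(tokenize_and_normalize("Hello, World!"))
```

The resulting token list is what you would then hand to whatever training or prediction step you use, in place of the output of the default English-oriented tokenizer.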
ok, thanks!
Hi Davis,
Looking at the code, it seems to me that everything is language-agnostic apart from the English stemmer used in text categorization.
What would be the best way to replace the stemmer with another one or, even better, to support multiple stemmers for different languages?
Thank you very much!