mimno / Mallet

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
https://mimno.github.io/Mallet/

Lemmatization #192

Closed ECMGit closed 3 years ago

ECMGit commented 3 years ago

Hi, I am wondering whether Mallet supports lemmatization when building the pipeline. For example, I want to map: services -> service, workers -> worker

mimno commented 3 years ago

At least for English this is much harder to do well than most people expect: https://mimno.infosci.cornell.edu/papers/schofield_tacl_2016.pdf

It's often really a display issue, not a model issue. Applying stemming as a post-processing step, to the model's output rather than to its input, may be more useful.
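
To illustrate the post-processing suggestion, here is a minimal sketch in plain Java that merges a topic's top words by stem after training, so that "services" and "service" are displayed as a single entry. It does not use any Mallet API. The class `TopicWordConflation`, the methods `naiveStem` and `mergeByStem`, and the sample words and weights are all hypothetical. The plural stripper is deliberately naive and only stands in for a real stemmer or lemmatizer.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopicWordConflation {

    // Hypothetical, deliberately naive English plural stripper.
    // It is a placeholder for a real stemmer and will mishandle many words.
    static String naiveStem(String word) {
        if (word.length() > 4 && word.endsWith("ies")) {
            // e.g. "parties" -> "party"
            return word.substring(0, word.length() - 3) + "y";
        }
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            // e.g. "workers" -> "worker"; words ending in "ss" are left alone
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Merge the top words of one topic by stem, summing their weights.
    // Insertion order is preserved, so the first-seen stem keeps its position.
    static Map<String, Double> mergeByStem(List<String> words, List<Double> weights) {
        Map<String, Double> merged = new LinkedHashMap<>();
        for (int i = 0; i < words.size(); i++) {
            merged.merge(naiveStem(words.get(i)), weights.get(i), Double::sum);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Made-up top words and weights for a single topic (illustrative only).
        List<String> words = List.of("services", "service", "workers", "worker", "class");
        List<Double> weights = List.of(40.0, 35.0, 20.0, 10.0, 5.0);
        System.out.println(mergeByStem(words, weights));
    }
}
```

Because the model is trained on the original tokens, this kind of conflation only changes how topics are displayed. The stemmer can be swapped or removed without retraining.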