mimno / Mallet

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
https://mimno.github.io/Mallet/

--use-ngrams gone? #4

Open emeeks opened 10 years ago

emeeks commented 10 years ago

When I try the --use-ngrams option, I receive the error "Unrecognized option 6: --use-ngrams".

mimno commented 10 years ago

The TopicalNGrams code is still there, but it's not immediately accessible from the "train-topics" interface anymore. But you shouldn't use it anyway.

The old class that parsed "train-topics" commands switched between several models based on which command-line parameters were activated. As a result, not all combinations of parameters were meaningful, and many functions were supported by some models but not others. It was very confusing. Now train-topics only gives you either plain LDA or polylingual LDA, and everything should be supported.

TNG didn't make the cut, and I don't recommend it anyway. What I do strongly recommend is identifying multi-word terms in pre-processing. There's a pipe for doing this, but it's not an option in "import-file" yet.

briandastous commented 10 years ago

Pardon my ignorance, but how do you suggest identifying multi-word terms in pre-processing? When you say that there's a pipe for doing this, to which class are you referring?

behfar commented 8 years ago

@briandastous see here: http://www.mimno.org/articles/phrases/
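
For readers who want a feel for what "identifying multi-word terms in pre-processing" can look like, here is a minimal sketch in plain Java. It is not MALLET's pipe and does not use any MALLET API. The class name PhraseMerger, its merge method, and the example phrase list are all hypothetical, and it assumes you already have a list of phrases you want treated as single terms. It joins each known phrase into one underscore-separated token, preferring the longest match, so the topic model sees "new_york_city" as a single word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of MALLET: merges known multi-word
// phrases into single tokens before the text reaches a topic model.
class PhraseMerger {
    private final Set<String> phrases = new HashSet<>();
    private int maxLen = 1; // length in words of the longest known phrase

    PhraseMerger(List<String> phraseList) {
        for (String p : phraseList) {
            String[] words = p.trim().toLowerCase(Locale.ROOT).split("\\s+");
            phrases.add(String.join(" ", words));
            maxLen = Math.max(maxLen, words.length);
        }
    }

    // Scans left to right; at each position tries the longest candidate
    // phrase first and falls back to emitting the single token unchanged.
    List<String> merge(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 1;
            for (int n = Math.min(maxLen, tokens.size() - i); n >= 2; n--) {
                String candidate = String.join(" ", tokens.subList(i, i + n))
                        .toLowerCase(Locale.ROOT);
                if (phrases.contains(candidate)) {
                    matched = n;
                    break;
                }
            }
            out.add(String.join("_", tokens.subList(i, i + matched)));
            i += matched;
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed phrase list, for illustration only.
        PhraseMerger merger = new PhraseMerger(
                Arrays.asList("topic model", "new york city"));
        List<String> tokens = Arrays.asList(
                "a", "topic", "model", "of", "new", "york", "city", "news");
        System.out.println(merger.merge(tokens));
        // prints [a, topic_model, of, new_york_city, news]
    }
}
```

The merged tokens can then be written back out as text and imported as usual. Where the phrase list comes from (a hand-built vocabulary, or phrases found statistically as in the linked article) is a separate question that this sketch leaves open.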