emeeks opened this issue 10 years ago (status: Open)
The TopicalNGrams code is still there, but it's not immediately accessible from the "train-topics" interface anymore. But you shouldn't use it anyway.
The old class that parsed "train-topics" commands switched between several models based on which command-line parameters were activated. As a result, not all combinations of parameters were meaningful, and many functions were supported by some models but not others. It was very confusing. Now "train-topics" only gives you either plain LDA or polylingual LDA, and everything should be supported.
TNG didn't make the cut. I don't recommend it anyway. What I do strongly recommend is identifying multi-word terms in pre-processing. There's a pipe for doing this, but it's not an option in "import-file" yet.
Pardon my ignorance, but how do you suggest identifying multi-word terms in pre-processing? When you say that there's a pipe for doing this, to which class are you referring?
@briandastous see here: http://www.mimno.org/articles/phrases/
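For anyone who wants a feel for what "identifying multi-word terms in pre-processing" can look like before reading the article, here is a minimal sketch in Python. It is not the MALLET pipe and not the method from the linked article: it assumes a plain frequency threshold on adjacent word pairs, and the names `find_phrases`, `merge_phrases`, and `min_count` are made up for illustration.

```python
from collections import Counter


def find_phrases(docs, min_count=2):
    """Return the set of adjacent word pairs seen at least min_count times.

    Assumption: raw frequency is a good enough signal for this sketch.
    A real phrase detector would normally use a statistical test instead.
    """
    counts = Counter()
    for tokens in docs:
        counts.update(zip(tokens, tokens[1:]))
    return {pair for pair, n in counts.items() if n >= min_count}


def merge_phrases(tokens, phrases, sep="_"):
    """Greedily join known pairs, left to right, into single tokens."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            merged.append(tokens[i] + sep + tokens[i + 1])
            i += 2  # skip both words of the merged pair
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Toy corpus: already lowercased and split on whitespace.
docs = [
    "topic models find themes in text".split(),
    "we trained topic models on news text".split(),
    "the models were small".split(),
]

phrases = find_phrases(docs)
merged_docs = [merge_phrases(d, phrases) for d in docs]
```

The merged documents can then be written back out as text and imported as usual, so the topic model sees each multi-word term as one token. Check that whatever tokenizer you use at import time keeps the underscore-joined tokens intact.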
When I try the --use-ngrams option, I receive the error "Unrecognized option 6: --use-ngrams".