MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
Running LDA model in Python fails with error message "returned non-zero exit status 127" #189

I am getting an error message when running topic modelling with LDA and Gensim. I brought in Mallet to find the optimum number of clusters, but the call below fails with an error. Can anyone help?

mallet_path = '/home/jupyter/TopicModelling/machine-learning/TopicModeling/Medium/mallet-2.0.8/bin/mallet.bet'

ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=10, id2word=id2word)

/opt/conda/lib/python3.7/site-packages/ipykernel/ipkernel.py:287: DeprecationWarning: should_run_async will not call transform_cell automatically in the future. Please pass the result to transformed_cell argument and any exception that happen during thetransform in preprocessing_exc_tuple in IPython 7.17 and above. and should_run_async(code)

CalledProcessError                        Traceback (most recent call last)
in
----> 1 ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=10, id2word=id2word)

/opt/conda/lib/python3.7/site-packages/gensim/models/wrappers/ldamallet.py in __init__(self, mallet_path, corpus, num_topics, alpha, id2word, workers, prefix, optimize_interval, iterations, topic_threshold, random_seed)
    129         self.random_seed = random_seed
    130         if corpus is not None:
--> 131             self.train(corpus)
    132
    133     def finferencer(self):

/opt/conda/lib/python3.7/site-packages/gensim/models/wrappers/ldamallet.py in train(self, corpus)
    270
    271         """
--> 272         self.convert_input(corpus, infer=False)
    273         cmd = self.mallet_path + ' train-topics --input %s --num-topics %s --alpha %s --optimize-interval %s '\
    274             '--num-threads %s --output-state %s --output-doc-topics %s --output-topic-keys %s '\

/opt/conda/lib/python3.7/site-packages/gensim/models/wrappers/ldamallet.py in convert_input(self, corpus, infer, serialize_corpus)
    259         cmd = cmd % (self.fcorpustxt(), self.fcorpusmallet())
    260         logger.info("converting temporary corpus to MALLET format with %s", cmd)
--> 261         check_output(args=cmd, shell=True)
    262
    263     def train(self, corpus):

/opt/conda/lib/python3.7/site-packages/gensim/utils.py in check_output(stdout, *popenargs, **kwargs)
   1930             error = subprocess.CalledProcessError(retcode, cmd)
   1931             error.output = output
-> 1932             raise error
   1933         return output
   1934     except KeyboardInterrupt:

CalledProcessError: Command '/home/jupyter/TopicModelling/machine-learning/TopicModeling/Medium/mallet-2.0.8/bin/mallet.bet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input /tmp/b31bc7_corpus.txt --output /tmp/b31bc7_corpus.mallet' returned non-zero exit status 127.

mimno commented 3 years ago

Unfortunately this looks like a problem with gensim, not with Mallet. You might try Little Mallet Wrapper as an alternative, or create an issue with gensim.
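
For anyone landing here with the same traceback: exit status 127 is what a POSIX shell returns when it cannot find the command it was asked to run, and gensim runs the Mallet command through the shell (check_output(args=cmd, shell=True) in the traceback above). So one thing worth ruling out is that mallet_path does not point at an existing, executable file. Note that the path in the post ends in mallet.bet; as far as I know the Mallet 2.0.8 distribution ships bin/mallet (a shell launcher) and bin/mallet.bat (for Windows), so that may simply be a typo. Below is a minimal sketch of such a check. The helper name check_mallet_path is made up for illustration and is not part of gensim or Mallet.

```python
import os
import shutil


def check_mallet_path(mallet_path):
    """Return a list of problems found with mallet_path (empty if none).

    Hypothetical helper for illustration; not part of gensim or Mallet.
    """
    problems = []
    if not os.path.isfile(mallet_path):
        # A missing file is the classic cause of shell exit status 127.
        problems.append("no such file: %s" % mallet_path)
    elif not os.access(mallet_path, os.X_OK):
        # The file exists but the current user cannot execute it.
        problems.append("file is not executable: %s" % mallet_path)
    return problems


if __name__ == "__main__":
    # Path copied from the post above; adjust to your own install.
    path = "/home/jupyter/TopicModelling/machine-learning/TopicModeling/Medium/mallet-2.0.8/bin/mallet.bet"
    for problem in check_mallet_path(path):
        print(problem)
    # Assumption: the Mallet launcher script starts Java, so a missing
    # java binary can also surface as a "command not found" failure.
    if shutil.which("java") is None:
        print("java not found on PATH")
```

If the check reports nothing and Java is on the PATH, the cause is probably elsewhere, and running the exact command from the CalledProcessError message in a terminal should show the underlying shell error.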