mimno / Mallet

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
https://mimno.github.io/Mallet/

Running LDA model in python and got error message "returned non-zero exit status 127" #189

Closed: arun05010 closed this issue 3 years ago

arun05010 commented 3 years ago

I am getting an error message when running topic modelling with LDA and Gensim. I then imported Mallet to find the optimum number of clusters, but I am getting the error below. Can anyone help?

mallet_path = '/home/jupyter/TopicModelling/machine-learning/TopicModeling/Medium/mallet-2.0.8/bin/mallet.bet'

ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=10, id2word=id2word)

/opt/conda/lib/python3.7/site-packages/ipykernel/ipkernel.py:287: DeprecationWarning: should_run_async will not call transform_cell automatically in the future. Please pass the result to transformed_cell argument and any exception that happen during thetransform in preprocessing_exc_tuple in IPython 7.17 and above.
  and should_run_async(code)

CalledProcessError Traceback (most recent call last)

in
----> 1 ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=10, id2word=id2word)

/opt/conda/lib/python3.7/site-packages/gensim/models/wrappers/ldamallet.py in __init__(self, mallet_path, corpus, num_topics, alpha, id2word, workers, prefix, optimize_interval, iterations, topic_threshold, random_seed)
    129         self.random_seed = random_seed
    130         if corpus is not None:
--> 131             self.train(corpus)
    132
    133     def finferencer(self):

/opt/conda/lib/python3.7/site-packages/gensim/models/wrappers/ldamallet.py in train(self, corpus)
    270
    271         """
--> 272         self.convert_input(corpus, infer=False)
    273         cmd = self.mallet_path + ' train-topics --input %s --num-topics %s --alpha %s --optimize-interval %s '\
    274             '--num-threads %s --output-state %s --output-doc-topics %s --output-topic-keys %s '\

/opt/conda/lib/python3.7/site-packages/gensim/models/wrappers/ldamallet.py in convert_input(self, corpus, infer, serialize_corpus)
    259         cmd = cmd % (self.fcorpustxt(), self.fcorpusmallet())
    260         logger.info("converting temporary corpus to MALLET format with %s", cmd)
--> 261         check_output(args=cmd, shell=True)
    262
    263     def train(self, corpus):

/opt/conda/lib/python3.7/site-packages/gensim/utils.py in check_output(stdout, *popenargs, **kwargs)
   1930             error = subprocess.CalledProcessError(retcode, cmd)
   1931             error.output = output
-> 1932             raise error
   1933         return output
   1934     except KeyboardInterrupt:

CalledProcessError: Command '/home/jupyter/TopicModelling/machine-learning/TopicModeling/Medium/mallet-2.0.8/bin/mallet.bet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input /tmp/b31bc7_corpus.txt --output /tmp/b31bc7_corpus.mallet' returned non-zero exit status 127.

mimno commented 3 years ago

Unfortunately, this looks like a problem with gensim, not with Mallet. You might try Little Mallet Wrapper as an alternative, or open an issue with gensim.
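
For anyone landing here with the same traceback: exit status 127 is what a POSIX shell returns when it cannot find the command it was asked to run, and the wrapper runs `mallet_path` through a shell (`check_output(args=cmd, shell=True)` in the traceback above). The path in the report ends in `mallet.bet`, while MALLET distributions normally ship `bin/mallet` and `bin/mallet.bat`, so it may be worth confirming that the file exists before constructing `LdaMallet`. Below is a minimal sketch of such a check; `check_mallet_path` and `shell_exit_status` are hypothetical helper names made up for illustration, not part of gensim or Mallet, and this is not a confirmed diagnosis of the original report.

```python
import os
import subprocess


def check_mallet_path(mallet_path):
    """Return a list of problems found with mallet_path (empty if none found).

    Hypothetical helper for illustration; not part of gensim or Mallet.
    """
    problems = []
    if not os.path.isfile(mallet_path):
        problems.append("no such file: %s" % mallet_path)
    elif not os.access(mallet_path, os.X_OK):
        problems.append("file exists but is not executable: %s" % mallet_path)
    return problems


def shell_exit_status(command):
    """Run command through the shell, discard its output, return the exit status."""
    completed = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


if __name__ == "__main__":
    # Path copied from the report above; replace it with your own install location.
    mallet_path = (
        "/home/jupyter/TopicModelling/machine-learning/TopicModeling/"
        "Medium/mallet-2.0.8/bin/mallet.bet"
    )
    for problem in check_mallet_path(mallet_path):
        print(problem)
    # On a POSIX shell, a command that cannot be found yields status 127.
    print(shell_exit_status(mallet_path + " import-file"))
```

If the check reports a missing file, listing the contents of the `bin` directory of your Mallet install should show the actual launcher name to use for `mallet_path`.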