Open thisray opened 4 years ago
Hi,
I use the gensim wrapper, LdaMallet() [link], to run MALLET.
The Gensim library provides a `workers` parameter that sets the `--num-threads` argument in MALLET. (Ref: Gensim Code - line274)
But I found that `workers` does not seem to work. Here are the run times for different settings:
- `workers=1` -> run time: 7.32 sec # <--
- `workers=2` -> run time: 2min 25s
- `workers=4` -> run time: 2min 38s
- `workers=16` -> run time: 3min 13s # <--
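Timings like these are easier to compare when every setting goes through the same harness. The following is a minimal sketch, not the code used for the numbers above: `time_workers` and the `train` callable are hypothetical names, and `train(workers)` is assumed to wrap whatever training call is being measured (for example the `LdaMallet(...)` call with a given `workers` value).

```python
import time


def time_workers(train, worker_counts):
    """Call train(workers) once per setting and return {workers: seconds}."""
    timings = {}
    for workers in worker_counts:
        start = time.perf_counter()
        train(workers)  # hypothetical callable wrapping the real training call
        timings[workers] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without MALLET installed.
    def fake_train(workers):
        sum(range(100_000))

    for workers, seconds in time_workers(fake_train, [1, 2, 4, 16]).items():
        print(f"workers={workers} -> run time: {seconds:.3f} sec")
```

Running each setting more than once and keeping the corpus and iteration count fixed would help separate thread overhead from ordinary run-to-run noise.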
Whether I run this on my computer:
    openjdk version "1.8.0_162"
    OpenJDK Runtime Environment (build 1.8.0_162-8u162-b12-0ubuntu0.16.04.2-b12)
    OpenJDK 64-Bit Server VM (build 25.162-b12, mixed mode)
or on the Colab:
    openjdk version "11.0.4" 2019-07-16
    OpenJDK Runtime Environment (build 11.0.4+11-post-Ubuntu-1ubuntu218.04.3)
    OpenJDK 64-Bit Server VM (build 11.0.4+11-post-Ubuntu-1ubuntu218.04.3, mixed mode, sharing)
the results are similar: more workers take more time. (I have also tried both `mallet-2.0.8` and `mallet-2.0.7`.)
Does this mean I am not running MALLET LDA in parallel the proper way?
Thanks!
reference code:
    # code in gensim (python)
    # (I tried different values of `workers`)
    workers = 16
    gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus,
                                     num_topics=num_topics, id2word=id2word,
                                     optimize_interval=1, iterations=6000,
                                     workers=workers)

    # the equivalent command in mallet (typed in a shell, ignoring the I/O settings):
    $ bin/mallet train-topics --num-threads 16
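To rule out the wrapper itself, it may help to build the MALLET command by hand and check which `--num-threads` value it carries. This is a rough sketch under assumptions: `mallet_train_cmd` is a hypothetical helper (not part of gensim), `corpus.mallet` is a placeholder input path, and the output options the wrapper also passes are left out.

```python
import shlex


def mallet_train_cmd(mallet_path, input_path, num_topics, workers,
                     optimize_interval=1, iterations=6000):
    """Return the argument list for a `mallet train-topics` run."""
    return [
        mallet_path, "train-topics",
        "--input", input_path,            # placeholder path, an assumption
        "--num-topics", str(num_topics),
        "--optimize-interval", str(optimize_interval),
        "--num-iterations", str(iterations),
        "--num-threads", str(workers),    # the value gensim calls `workers`
    ]


if __name__ == "__main__":
    cmd = mallet_train_cmd("bin/mallet", "corpus.mallet", 20, workers=16)
    # Print the command so it can be pasted into a shell and timed directly.
    print(shlex.join(cmd))
```

Timing the printed command in a shell for several thread counts would show whether the slowdown comes from MALLET or from the Python side.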
Can you please provide console output? In particular, this log statement looks relevant.