Open thisray opened 4 years ago
I have the same problem: for 12,077 files (~5 GB) it takes 4 hours, and it doesn't seem to be utilizing all the cores.
Unless this can be replicated in the java-only version there's not much to do here -- I'd check with gensim.
@thisray This thread has been dormant for a while, but have you checked how many cores/threads your computer has? It could be that it has fewer than 16 cores/threads, in which case asking for 16 slows you down.
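If it helps, the check can be done from Python by comparing the requested thread count with what the machine reports. This is only a minimal sketch using the standard library: `effective_workers` is a hypothetical helper name, and `os.cpu_count()` reports logical cores, which may be more than the process is actually allowed to use (for example in a container or on Colab).

```python
import os


def effective_workers(requested: int) -> int:
    """Clamp a requested thread count to the logical cores the OS reports."""
    available = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, min(requested, available))


if __name__ == "__main__":
    print("logical cores:", os.cpu_count())
    print("workers to use for a request of 16:", effective_workers(16))
```

If the second number printed is lower than 16, the extra threads are competing for the same cores rather than adding parallelism.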
Hi,
I use the `gensim` wrapper, `LdaMallet()` [link], to run `MALLET`.
The Gensim library provides a `workers` parameter that sets the `--num-threads` argument in `MALLET`. (Ref: Gensim Code - line 274)
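To make that mapping concrete, here is a rough sketch of the kind of command line that ends up being run. It is not Gensim's actual code: `build_train_topics_cmd` and the paths are hypothetical, and only the `train-topics` subcommand and the `--input`, `--num-topics`, and `--num-threads` flags come from MALLET itself.

```python
from typing import List


def build_train_topics_cmd(mallet_path: str, input_path: str,
                           num_topics: int, workers: int) -> List[str]:
    """Assemble a `mallet train-topics` call with an explicit thread count."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return [
        mallet_path, "train-topics",
        "--input", input_path,
        "--num-topics", str(num_topics),
        "--num-threads", str(workers),  # the flag that `workers` is passed to
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(build_train_topics_cmd("bin/mallet", "corpus.mallet", 20, 4)))
```

Printing the command like this makes it possible to run MALLET by hand, outside the wrapper, with the same thread count.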
But I found that `workers` does not seem to work. Here are the different settings and running times:
Whether I run this on my computer:
or on Colab:
the results are similar: more workers take more time. (I have also tried `mallet-2.0.8` and `mallet-2.0.7`.)
Does this mean I am not running MALLET LDA in parallel the proper way?
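For anyone who wants to repeat the comparison, a small timing harness along these lines may help. It is only a sketch: `time_by_workers` is a hypothetical helper, and `fake_train` is a stand-in that should be replaced by a function that trains the model with the given number of workers.

```python
import time
from typing import Callable, Dict, Iterable


def time_by_workers(train: Callable[[int], object],
                    worker_counts: Iterable[int]) -> Dict[int, float]:
    """Run `train(workers)` once per setting and return wall-clock seconds."""
    timings = {}
    for workers in worker_counts:
        start = time.perf_counter()
        train(workers)
        timings[workers] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Stand-in workload; replace with a function that trains the model.
    def fake_train(workers: int) -> None:
        time.sleep(0.01)

    for workers, seconds in time_by_workers(fake_train, [1, 2, 4]).items():
        print(f"workers={workers}: {seconds:.3f}s")
```

Each setting is run only once here, so timings on a shared machine such as Colab can be noisy.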
Thanks!
reference code: