microsoft / LightLDA

Scalable, fast, and lightweight system for large-scale topic modeling
http://www.dmtk.io
MIT License
842 stars, 235 forks

How can I train the LDA model on multiple machines? #54

Closed qinghua2016 closed 7 years ago

qinghua2016 commented 7 years ago

As your paper says, the LDA model can be trained on multiple machines, but I can't find instructions for doing so. As far as I know, the training command is "bin/lightlda -num_vocabs 70626 -num_topics 10 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 4 -num_blocks 256 -max_num_document 997733 -input_dir example/data -out_of_core -data_capacity 800". How should I change it to train on multiple machines? @feiga

qinghua2016 commented 7 years ago

As example/readme.md says about distributed runs: "Running with MPI, you just need to run mpiexec --machinefile machine_file lightlda -lightlda_arguments...". So I changed the training command to:

mpiexec --machinefile machine_file bin/lightlda -num_vocabs 174481 -num_topics 10 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 4 -num_blocks 1 -max_num_document 850873 -input_dir example/chatdata -out_of_core -data_capacity 800

My machine_file is as follows:

192.168.11.105
192.168.11.118

My machine is 105, and I want to train the model on both my machine (105) and another machine (118). When I run the command, it asks for a password. I enter the correct password for machine 118, but I get the following error:

qinghua@192.168.11.105's password: Permission denied, please try again.
qinghua@192.168.11.105's password: Permission denied, please try again.
qinghua@192.168.11.105's password: Permission denied (publickey,password).

ORTE was unable to reliably start one or more daemons. This usually is caused by:
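
Note that the prompts in the log above are for qinghua@192.168.11.105, not for 118, so one possible reading is that the launcher is also opening an SSH connection to the first host in the machine file. MPI launchers that start daemons over SSH generally need non-interactive (key-based) login to every listed host. The following is a minimal sketch, not part of LightLDA, that reports which hosts in a machine file accept SSH without a password prompt. The function name check_hosts and the SSH_BIN override are hypothetical helpers introduced only for this illustration.

```shell
#!/bin/sh
# Sketch: report which hosts in an MPI machine file accept non-interactive SSH.
# SSH_BIN is a hypothetical override so the loop can be dry-run without a network.
SSH_BIN="${SSH_BIN:-ssh}"

check_hosts() {
    machine_file="$1"
    failed=0
    # "|| [ -n "$host" ]" also handles a last line without a trailing newline.
    while read -r host _rest || [ -n "$host" ]; do
        # Skip blank lines and comment lines.
        case "$host" in ''|'#'*) continue ;; esac
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        # stdin is redirected so ssh cannot swallow the rest of the machine file.
        if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
                </dev/null >/dev/null 2>&1; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done < "$machine_file"
    return "$failed"
}
```

Running check_hosts machine_file prints one line per host. For any host marked FAIL, the usual remedy is to create a key with ssh-keygen and install it on that host with ssh-copy-id, then rerun the check before retrying mpiexec.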

chivee commented 7 years ago

Please set the MPI library path in your SSH environment.
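
One way to read this suggestion: a command started over SSH runs in a non-interactive shell, and many default ~/.bashrc files return early in that case, so exports placed at the end of the file are never reached. The sketch below prepends the MPI paths to the top of the startup file instead. It assumes bash is the remote login shell and reads ~/.bashrc for SSH commands. MPI_PREFIX is an assumed install location, and add_mpi_env and RC_FILE are hypothetical names used only here; adjust them to the actual setup.

```shell
#!/bin/sh
# Sketch: put MPI paths at the TOP of the shell startup file so they are set
# even when the file returns early for non-interactive SSH sessions.
MPI_PREFIX="${MPI_PREFIX:-/usr/local/openmpi}"   # assumption: MPI install prefix
RC_FILE="${RC_FILE:-$HOME/.bashrc}"              # assumption: file read by the remote shell

add_mpi_env() {
    marker="# >>> MPI paths for non-interactive ssh >>>"
    # Idempotent: do nothing if the block was already added.
    if grep -qF "$marker" "$RC_FILE" 2>/dev/null; then
        return 0
    fi
    tmp=$(mktemp)
    {
        printf '%s\n' "$marker"
        printf 'export PATH="%s/bin:$PATH"\n' "$MPI_PREFIX"
        printf 'export LD_LIBRARY_PATH="%s/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"\n' "$MPI_PREFIX"
        # Keep the existing contents after the new block.
        if [ -f "$RC_FILE" ]; then cat "$RC_FILE"; fi
    } > "$tmp"
    # Overwrite in place so the original file permissions are preserved.
    cat "$tmp" > "$RC_FILE"
    rm -f "$tmp"
}
```

After calling add_mpi_env on each machine, a command such as ssh 192.168.11.118 'echo $PATH' shows what a non-interactive session actually sees. If the launcher is Open MPI (the ORTE message suggests so), its mpiexec also accepts a --prefix option pointing at the install directory, which may avoid editing startup files.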

Abigale001 commented 6 years ago

Hi, have you solved this? I have the same problem.

I have already set PATH and LD_LIBRARY_PATH, and the two servers can also ssh to each other.