microsoft / LightLDA

Scalable, fast, and lightweight system for large-scale topic modeling
http://www.dmtk.io
MIT License

How to install it on multiple nodes for distributed training? #60

Open adrianhust opened 6 years ago

adrianhust commented 6 years ago

I have tried several times to install it on multiple nodes but failed, although it succeeded on a single machine. Can anyone give a detailed guide for this? Neither mpich nor zeromq works for me.

Any hints would be appreciated, thank you!

1234clam commented 6 years ago

I think you should just install on all the nodes and run with a machine list file.

adrianhust commented 6 years ago

Thank you for your reply, but that does not work for me. lightlda depends on mpich or zeromq; I installed it on two nodes, but they cannot communicate using the machine list file.

1234clam commented 6 years ago

Can you log in over SSH between the nodes without a password?

xiaomiao91 commented 6 years ago

Are there any examples showing how to configure distributed training? I searched and could not find any.

1234clam commented 6 years ago

@xiaomiao91 I have not found any examples showing how to configure distributed training. I only trained on the nytimes data set from the example provided by the project.

chivee commented 6 years ago

@1234clam, @adrianhust, does mpirun -n 2 work on two nodes? If so, please make sure mpirun is available in the environment of the SSH sessions.

https://www.open-mpi.org/faq/?category=running may help you.
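A rough sketch of such a check, in case it helps: before involving lightlda at all, launch a trivial command (hostname) through mpiexec on every host in the machine list. The helper names here are made up for illustration, and the host values are whatever you pass in. The -machinefile flag is the one used elsewhere in this thread; -n and hostname are assumptions about your MPI install and OS.

```python
import shutil
import subprocess
from pathlib import Path


def write_machinefile(hosts, path="machine_list"):
    """Write one host per line, the same layout as the machinefile in this thread."""
    Path(path).write_text("".join(f"{h}\n" for h in hosts), encoding="utf-8")
    return str(path)


def smoke_test_command(machinefile, num_procs):
    """Build an mpiexec call that only runs `hostname` on the listed hosts."""
    return ["mpiexec", "-machinefile", machinefile, "-n", str(num_procs), "hostname"]


def run_smoke_test(hosts, machinefile="machine_list"):
    """Run the check if mpiexec is on PATH; otherwise only report the command."""
    cmd = smoke_test_command(write_machinefile(hosts, machinefile), len(hosts))
    if shutil.which("mpiexec") is None:
        return "mpiexec is not on PATH; would have run: " + " ".join(cmd)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout or result.stderr
```

Calling run_smoke_test with your host list should return one hostname per process if the nodes can reach each other. If it asks for a password or fails here, the problem is in the SSH or MPI setup rather than in lightlda.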

xiaomiao91 commented 6 years ago

Hi, I executed this command on server 10.210.228.70:

mpiexec -machinefile machine_list -num_vocabs 111400 -num_topics 1000 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 300000 -input_dir /data3/ad_dm/zemin/LightLDA/example/data/nytimes -data_capacity 800

It ran the nytimes example on two nodes successfully. My machinefile looks like this:

10.210.228.70
10.210.228.64

Now I want to run my real data on more nodes. Should I split my large data set across the nodes of the cluster and then just execute the command above? Or is there anything else I should do or take care of? Thanks.

1234clam commented 6 years ago

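@xiaomiao91 You just need to split the data and copy the pieces to the other machines. I do find this design a bit cumbersome, though: after converting to libsvm, you simply split the libsvm-format documents. (Chinese really is easier for me to use.)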
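A minimal sketch of that split, assuming the libsvm file has one document per line. The function name and the shard_N.libsvm file names are made up for illustration. Each shard would presumably still need the same preprocessing as the single-machine example before being copied to the input directory on its node; that step is not shown here.

```python
from pathlib import Path


def split_libsvm(src, out_dir, num_shards):
    """Distribute the documents (lines) of a libsvm file round-robin over shards."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one output file per node.
    paths = [out_dir / f"shard_{i}.libsvm" for i in range(num_shards)]
    handles = [p.open("w", encoding="utf-8") for p in paths]
    try:
        with open(src, encoding="utf-8") as f:
            kept = 0
            for line in f:
                if not line.strip():
                    continue  # skip blank lines so shards stay balanced
                text = line if line.endswith("\n") else line + "\n"
                handles[kept % num_shards].write(text)
                kept += 1
    finally:
        for h in handles:
            h.close()
    return paths
```

Round-robin keeps the shards close to equal in document count. It does nothing to balance document length, so very uneven corpora may still load the nodes unevenly.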

xiaomiao91 commented 6 years ago

Thanks, I'll give it a try 😊

Abigale001 commented 6 years ago

I ran $ mpiexec and got this output:

The program 'mpiexec' can be found in the following packages:

  • lam-runtime
  • mpich
  • openmpi-bin

Try: sudo apt install <selected package>

This is strange, because I ran make install and the other steps just as build.sh shows.

Could anyone help?
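One possible explanation, which is an assumption and not something confirmed in this thread: the build may have installed MPI under a local prefix inside the source tree, so an mpiexec binary exists but its directory is not on PATH. A small sketch to check for that is below. The function names are made up, and the root directory to search is whatever you pass in.

```python
import os


def find_executables(root, name="mpiexec"):
    """Return every executable file called `name` found under `root`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            candidate = os.path.join(dirpath, name)
            if os.access(candidate, os.X_OK):
                matches.append(candidate)
    return sorted(matches)


def on_path(directory):
    """Report whether `directory` is one of the entries in PATH."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return os.path.abspath(directory) in [os.path.abspath(e) for e in entries if e]
```

If find_executables on the checkout directory returns a path whose parent directory fails on_path, adding that directory to PATH (on every node) may be enough to make mpiexec resolve.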