Open adrianhust opened 6 years ago
I think you should just install it on all the nodes and run with a machine list file.
Thank you for your reply, but that does not work for me. LightLDA depends on mpich or zeromq; I installed it on two nodes, but they cannot communicate using the machine list file.
Can you log in over SSH without a password?
Are there any examples of how to configure distributed training? I searched but could not find any.
@xiaomiao91 I haven't found any examples of how to configure distributed training either. I have only trained on the nytimes data set from the example provided by the project.
@1234clam, @adrianhust, does mpirun -n 2 work on two nodes? If so, please ensure that mpirun is available in the SSH environment.
https://www.open-mpi.org/faq/?category=running may help.
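The two checks suggested above (passwordless SSH, and mpirun being visible in the SSH environment) can be sketched in a few lines of Python. This is only a rough sketch under assumptions: the helper names `read_hosts`, `ssh_check_cmd`, and `remote_has` are made up for illustration, the machine file is assumed to hold one host per line (optionally followed by `:N` or `slots=N`), and `BatchMode=yes` is used so that ssh fails instead of prompting for a password.

```python
import subprocess


def read_hosts(machinefile):
    """Return host names from a machine file, ignoring comments and slot counts."""
    hosts = []
    with open(machinefile, encoding="utf-8") as f:
        for line in f:
            line = line.split("#", 1)[0].strip()
            if not line:
                continue
            # Keep only the host part of "host:4" or "host slots=4".
            hosts.append(line.split()[0].split(":")[0])
    return hosts


def ssh_check_cmd(host, program="mpirun"):
    """Build an ssh command that succeeds only if `program` is on the remote PATH."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never prompt for a password
        "-o", "ConnectTimeout=5",
        host,
        f"command -v {program}",
    ]


def remote_has(host, program="mpirun"):
    """True if passwordless SSH works and `program` is found in the non-interactive shell."""
    result = subprocess.run(ssh_check_cmd(host, program), capture_output=True, text=True)
    return result.returncode == 0
```

For example, `[h for h in read_hosts("machine_list") if not remote_has(h)]` would list the hosts where either the login or the PATH lookup fails. If that list is non-empty, mpirun will likely not be able to start remote processes there.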
Hi,
I executed this command on server 10.210.228.70:
mpiexec -machinefile machine_list -num_vocabs 111400 -num_topics 1000 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 300000 -input_dir /data3/ad_dm/zemin/LightLDA/example/data/nytimes -data_capacity 800
It ran the nytimes example on two nodes successfully. My machinefile looks like this:
10.210.228.70
10.210.228.64
Now I want to run my real data on more nodes. Should I split my data across every node of the cluster and then just execute the command above? Or is there anything else I should do or take care of? Thanks.
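@xiaomiao91 You definitely just need to split the data and copy the parts to the other machines. I do find this design a bit cumbersome, but after converting to libsvm you only need to split the libsvm-format documents. (It is easier to explain in Chinese.)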
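A minimal sketch of that split, assuming one document per line in the libsvm file. The function name `split_libsvm` and the `shard_N.libsvm` file names are made up for illustration and are not part of LightLDA. Each shard would presumably still need to be copied to its node and converted to the input format LightLDA expects, the same way the single-machine example does.

```python
from pathlib import Path


def split_libsvm(src, out_dir, num_shards):
    """Split a libsvm-format file into num_shards files, round-robin by document."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"shard_{i}.libsvm" for i in range(num_shards)]
    handles = [p.open("w", encoding="utf-8") for p in paths]
    try:
        kept = 0  # number of non-empty documents written so far
        with open(src, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                handles[kept % num_shards].write(line)
                kept += 1
    finally:
        for h in handles:
            h.close()
    return paths
```

Round-robin assignment keeps the shards close to the same size, which matters if every node is expected to finish an iteration at roughly the same time.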
Thanks, I'll give it a try 😊
$ mpiexec
then output:
The program 'mpiexec' can be found in the following packages:
- lam-runtime
- mpich
- openmpi-bin
Try: sudo apt install <selected package>
It is weird because I have run make install and so on, just as build.sh shows.
Could anyone help?
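One possible explanation is that the build installed MPI under a directory that is not on the shell's PATH, so the shell cannot find mpiexec even though the binary exists. The following is a small sketch for checking that; the helper name `locate` is hypothetical, and the directories passed as `extra_dirs` are whatever install prefix your build actually used, which this thread does not state.

```python
import os
import shutil


def locate(name, extra_dirs=()):
    """Find an executable on PATH first, then in the given extra directories."""
    found = shutil.which(name)
    if found:
        return found
    if not extra_dirs:
        return None
    # shutil.which accepts a custom search path as an os.pathsep-joined string.
    search_path = os.pathsep.join(str(d) for d in extra_dirs)
    return shutil.which(name, path=search_path)
```

If `locate("mpiexec")` returns None but `locate("mpiexec", ["/path/to/install/prefix/bin"])` (a placeholder path) returns a file, adding that directory to PATH on every node should make the command available.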
I have tried several times to install it on multiple nodes but failed; it only succeeded on a single machine. Can anyone give a detailed guide for this? Neither mpich nor zeromq works for me.
Any hints? Thank you!