qinghua2016 closed this issue 7 years ago
ORTE was unable to reliably start one or more daemons. This usually is caused by:
* not finding the required libraries and/or binaries on one or more nodes. Please check your PATH and LD_LIBRARY_PATH settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes. Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base). Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required (e.g., on Cray). Please check your configure cmd line and consider using one of the contrib/platform definitions for your system type.
Did I run the wrong command, or is there some other error? @feiga
Please set the MPI library path in your SSH environment.
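One way to check this is to look at what a non-interactive SSH command actually sees on the remote node, since such sessions often skip the profile files that an interactive login reads. The following is only a rough sketch: the function name `check_remote_env`, the `SSH_CMD` override, and the host name `worker1` are made up for illustration and are not part of LightLDA or Open MPI.

```shell
# Print what a non-interactive SSH command sees on a remote node.
# check_remote_env and SSH_CMD are hypothetical names used only in this sketch.
check_remote_env() {
  host="$1"
  # The quoted string is executed by the remote shell, so the variables
  # are expanded on the remote node, not locally.
  ${SSH_CMD:-ssh} "$host" 'command -v orted || echo "orted not found on PATH"; echo "PATH=$PATH"; echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"'
}
```

Calling `check_remote_env worker1` (with `worker1` replaced by your own node name) should print the location of `orted` and the two variables. If `orted` is not found or the MPI library directory is missing from `LD_LIBRARY_PATH`, the variables are probably set in a file that is only read by interactive logins.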
Hi, have you solved this? I have the same problem.
I have added PATH and LD_LIBRARY_PATH, and the two servers can also ssh to each other.
As stated in your paper, the LDA model can be trained on multiple machines, but I can't find instructions for doing so. As far as I know, the training command is " bin/lightlda -num_vocabs 70626 -num_topics 10 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 4 -num_blocks 256 -max_num_document 997733 -input_dir example/data -out_of_core -data_capacity 800". How should I change it to train on multiple machines? @feiga