moshicaixi closed 3 years ago
Could you share the output of `mpirun --version`? Thanks!
I forgot to update my issue. At first, I didn't know there was no MPI environment on my server, which caused this error. After I installed MPICH, I got a different error. When I switched to Open MPI, everything worked. Maybe there is some incompatibility between MPICH and Open MPI? I don't know. Anyway, thanks for your reply!
mpich error: `[mpiexec@guest-server] match_arg (../../../../mpich-3.4.2/src/pm/hydra/utils/args/args.c:160): unrecognized argument allow-run-as-root
[mpiexec@guest-server] HYDU_parse_array (../../../../mpich-3.4.2/src/pm/hydra/utils/args/args.c:175): argument matching returned error
[mpiexec@guest-server] parse_args (../../../../mpich-3.4.2/src/pm/hydra/ui/mpich/utils.c:1603): error parsing input array
[mpiexec@guest-server] HYD_uii_mpx_get_parameters (../../../../mpich-3.4.2/src/pm/hydra/ui/mpich/utils.c:1655): unable to parse user arguments
[mpiexec@guest-server] main (../../../../mpich-3.4.2/src/pm/hydra/ui/mpich/mpiexec.c:128): error parsing parameters`
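For anyone hitting the same trace: `--allow-run-as-root` is an Open MPI option, and the log above shows MPICH's Hydra launcher rejecting it as an unrecognized argument, which is consistent with the switch to Open MPI fixing things. Below is a minimal sketch for checking which launcher is actually on the PATH. It assumes the usual banner text printed by `mpirun --version` ("Open MPI" for Open MPI, "HYDRA" for MPICH); the function names are hypothetical and not part of torchpack.

```python
import shutil
import subprocess


def detect_mpi_flavor(version_text: str) -> str:
    """Heuristically classify the text printed by `mpirun --version`."""
    text = version_text.lower()
    if "open mpi" in text or "open-mpi" in text:
        return "openmpi"
    # MPICH's launcher is Hydra, so its banner mentions HYDRA (assumption).
    if "hydra" in text or "mpich" in text:
        return "mpich"
    return "unknown"


def installed_mpi_flavor() -> str:
    """Run `mpirun --version` if a launcher is on PATH and classify it."""
    if shutil.which("mpirun") is None:
        return "missing"  # no MPI launcher installed at all
    result = subprocess.run(
        ["mpirun", "--version"], capture_output=True, text=True, check=False
    )
    # Some launchers print the banner to stderr, so look at both streams.
    return detect_mpi_flavor(result.stdout + result.stderr)
```

Calling `installed_mpi_flavor()` before launching a distributed job separates the "no MPI installed" case from the "wrong MPI installed" case, the two situations described earlier in this thread.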
Sounds good. Thanks for your update.
When I run the command `torchpack dist-run -np 3 python train.py configs/semantic_kitti/spvcnn/cr0p5.yaml`, I get the following error. Could you please tell me how to resolve this problem? Thanks very much!
ssh: Could not resolve hostname localhost:3: Name or service not known
ORTE was unable to reliably start one or more daemons. This usually is caused by:
* not finding the required libraries and/or binaries on one or more nodes. Please check your PATH and LD_LIBRARY_PATH settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes. Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base). Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required (e.g., on Cray). Please check your configure cmd line and consider using one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a lack of common network interfaces and/or no route found between them. Please check network connectivity (including firewalls and network routing requirements).
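One possible reading of the first line of this log, not confirmed in this thread: ssh was handed the whole string `localhost:3` as a hostname, and the `:3` suffix looks like a slot count matching `-np 3` rather than part of the name. The sketch below only illustrates how such a `host:slots` spec would be split. The helper name is hypothetical, and whether the installed `mpirun` accepts this syntax is an assumption to verify against its own documentation.

```python
from typing import Optional, Tuple


def split_host_spec(spec: str) -> Tuple[str, Optional[int]]:
    """Split 'host:slots' into (host, slots); slots is None if absent."""
    host, sep, slots = spec.rpartition(":")
    if sep and slots.isdigit():
        return host, int(slots)
    # No numeric ':N' suffix, so treat the whole string as the hostname.
    return spec, None


print(split_host_spec("localhost:3"))
print(split_host_spec("localhost"))
```

If the launcher does not do this split itself, the suffix ends up in the hostname it passes to ssh, which would match the "Could not resolve hostname" message.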