A fork of Caffe with OpenMPI-based multi-GPU (mainly data-parallel) support for action recognition and more. For more documentation, please see the original README.
Dear @yjxiong,
I compiled your Caffe with -DUSE_MPI=ON and everything works like a charm, except the Python interface.
I made a simple Python script, test.py, containing a single line: import caffe
Running "mpirun -n 2 python test.py" raises the following error:
[003761c78f69:00470] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /usr/local/mpi/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
[003761c78f69:00469] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /usr/local/mpi/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
[003761c78f69:00470] mca_base_component_repository_open: unable to open mca_shmem_sysv: /usr/local/mpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
[003761c78f69:00469] mca_base_component_repository_open: unable to open mca_shmem_sysv: /usr/local/mpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
[003761c78f69:00469] mca_base_component_repository_open: unable to open mca_shmem_mmap: /usr/local/mpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[003761c78f69:00470] mca_base_component_repository_open: unable to open mca_shmem_mmap: /usr/local/mpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[003761c78f69:00469] mca_base_component_repository_open: unable to open mca_shmem_posix: /usr/local/mpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
[003761c78f69:00470] mca_base_component_repository_open: unable to open mca_shmem_posix: /usr/local/mpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
Do you have any idea what might cause this? Thank you :D
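A workaround often suggested for this class of Open MPI failure (MCA components reporting `undefined symbol: opal_show_help` and similar when MPI is initialized from inside a dynamically loaded module) is to make the MPI libraries' symbols globally visible before the extension that links against them is loaded. The sketch below assumes that is the cause here: Python loads Caffe's extension module with `RTLD_LOCAL` by default, which can hide `libopen-pal` symbols from the `mca_*` plugins Open MPI opens at runtime. The helper name `import_with_global_symbols` is hypothetical, and this has not been verified against this particular fork.

```python
import ctypes
import importlib
import sys


def import_with_global_symbols(module_name):
    """Import a module with RTLD_GLOBAL added to the dlopen flags.

    Hypothetical helper: shared libraries pulled in while importing
    module_name (e.g. libmpi / libopen-pal via Caffe's extension) are
    loaded into the global symbol scope, so plugins dlopen'ed later can
    resolve their symbols. Unix-only (sys.setdlopenflags).
    """
    old_flags = sys.getdlopenflags()
    # Keep the existing flags (typically RTLD_NOW) and add RTLD_GLOBAL.
    sys.setdlopenflags(old_flags | ctypes.RTLD_GLOBAL)
    try:
        return importlib.import_module(module_name)
    finally:
        # Restore the interpreter's default so later imports are unaffected.
        sys.setdlopenflags(old_flags)
```

In test.py this would replace the bare import with `caffe = import_with_global_symbols("caffe")`. If that does not help, rebuilding Open MPI with `--disable-dlopen`, so the components are built into the libraries instead of being loaded as separate plugins, avoids this plugin symbol lookup entirely.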