pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

Build error: mpi/mpi_gpu_test.cc.o: undefined reference to symbol '_ZN3MPI8Datatype4FreeEv' #8028

Open KamranAlipour opened 6 years ago

KamranAlipour commented 6 years ago

I get the following error while trying to build the package on Ubuntu 16.04:

cc1plus: warning: unrecognized command line option '-Wno-unknown-warning-option'
cc1plus: warning: unrecognized command line option '-Wno-invalid-partial-specialization'
[ 91%] Linking CXX executable ../bin/mpi_gpu_test
/usr/bin/ld: CMakeFiles/mpi_gpu_test.dir/mpi/mpi_gpu_test.cc.o: undefined reference to symbol '_ZN3MPI8Datatype4FreeEv'
//usr/lib/libmpi_cxx.so.1: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
caffe2/CMakeFiles/mpi_gpu_test.dir/build.make:110: recipe for target 'bin/mpi_gpu_test' failed
make[2]: [bin/mpi_gpu_test] Error 1
CMakeFiles/Makefile2:3504: recipe for target 'caffe2/CMakeFiles/mpi_gpu_test.dir/all' failed
make[1]: [caffe2/CMakeFiles/mpi_gpu_test.dir/all] Error 2
Makefile:138: recipe for target 'all' failed
make: *** [all] Error 2
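
The mangled symbol in the linker error can be decoded to see which library it belongs to. This is a minimal sketch, assuming the c++filt tool from binutils is installed (the thread does not say whether it is). On a typical Linux toolchain it should print MPI::Datatype::Free(), a function from the MPI C++ bindings, which matches the libmpi_cxx.so.1 named in the "DSO missing from command line" message.

```shell
# Symbol copied verbatim from the linker error above.
SYMBOL="_ZN3MPI8Datatype4FreeEv"

# c++filt is assumed to be available; fall back to a note if it is not.
if command -v c++filt >/dev/null 2>&1; then
    DEMANGLED="$(c++filt "$SYMBOL")"
else
    DEMANGLED="c++filt not found"
fi

echo "$DEMANGLED"
```
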

alexge233 commented 6 years ago

Same here on the master branch. Disabling tests with cmake .. -DBUILD_TEST=OFF is a workaround.

pjh5 commented 6 years ago

Can you post your cmake output? Do you have OpenMPI installed? What does "mpirun --version" print?

wschin commented 6 years ago

Is it possible to set up an environment to disable this MPI feature?

pjh5 commented 6 years ago

Passing -DUSE_MPI=OFF to cmake should prevent Caffe2 from trying to build with OpenMPI
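
The two workarounds mentioned so far can be sketched as a single configure command. This is only an illustration: combining both flags, and running cmake from a build directory one level below the source tree, are assumptions not confirmed in the thread. The block prints the command instead of running it so it can be checked first.

```shell
# Flags taken from the comments above; using both together is an assumption.
USE_MPI_FLAG="-DUSE_MPI=OFF"        # keep Caffe2 from building with OpenMPI
BUILD_TEST_FLAG="-DBUILD_TEST=OFF"  # skip test binaries such as mpi_gpu_test

CONFIGURE_CMD="cmake .. $USE_MPI_FLAG $BUILD_TEST_FLAG"

# Dry run: show the command. Run it by hand from the build directory.
echo "$CONFIGURE_CMD"
```

Either flag alone avoids linking mpi_gpu_test; -DUSE_MPI=OFF also drops MPI support from the rest of the build.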

wschin commented 6 years ago

Thanks. It works, but I got another error:

CRITICAL:root:Cannot load caffe2.python. Error: /home/travis/miniconda/conda-bld/caffe2-gcc4.8_1528332799972/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/lib/python3.6/site-packages/caffe2/python/caffe2_pybind11_state.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _Py_ZeroStruct

Do you have any idea? I just want to install Caffe2 on Travis CI to test my ONNX converters, but currently CNTK is the only backend I can use.

pjh5 commented 6 years ago

@wschin I haven't seen that before. Can you open another issue with more details about your OS, python environment, and cmake output?

pjh5 commented 6 years ago

This looks like it might be relevant: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=686926

chandanjc commented 5 years ago

Thanks pjh5. Passing the flag below to g++, as mentioned in the bugs.debian.org link, fixed my issue with Open MPI 3.1: -DOMPI_SKIP_MPICXX=1
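
One way to hand that define to g++ in a CMake build is through CMAKE_CXX_FLAGS. This is a hedged sketch: the comment above does not say how the flag was passed, so the use of CMAKE_CXX_FLAGS and the build directory layout are assumptions. As far as the Open MPI headers are concerned, the macro makes mpi.h skip the C++ bindings, so the object file should no longer reference symbols from libmpi_cxx.

```shell
# Define quoted in the comment above.
OMPI_DEFINE="-DOMPI_SKIP_MPICXX=1"

# Assumption: the define is forwarded to the compiler via CMAKE_CXX_FLAGS.
CONFIGURE_CMD="cmake .. -DCMAKE_CXX_FLAGS=$OMPI_DEFINE"

# Dry run: print the command instead of executing it.
echo "$CONFIGURE_CMD"
```
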

william-dawson commented 4 years ago

I wonder if this issue was fixed by #11416. I found a similar bug in my own software, and adding MPI_CXX_LIBRARIES to target_link_libraries solved it for me. #11416 is newer than the comments here.
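
The target_link_libraries approach can be sketched with a scratch CMakeLists.txt. Everything named here (the project mpi_link_sketch, the target mpi_demo, the file main.cc) is hypothetical and is not taken from PyTorch or from #11416. The sketch relies on CMake's FindMPI module, which sets MPI_CXX_LIBRARIES and MPI_CXX_INCLUDE_PATH when an MPI installation is found. The shell block only writes the file to a temporary directory.

```shell
# Write a scratch CMakeLists.txt; all project and target names are hypothetical.
SCRATCH_DIR="$(mktemp -d)"

cat > "$SCRATCH_DIR/CMakeLists.txt" <<'EOF'
cmake_minimum_required(VERSION 3.5)
project(mpi_link_sketch CXX)

# FindMPI sets MPI_CXX_LIBRARIES and MPI_CXX_INCLUDE_PATH when MPI is found.
find_package(MPI REQUIRED)

add_executable(mpi_demo main.cc)
target_include_directories(mpi_demo PRIVATE ${MPI_CXX_INCLUDE_PATH})

# Listing the MPI C++ libraries explicitly puts them on the link line,
# which is what the "DSO missing from command line" error asks for.
target_link_libraries(mpi_demo ${MPI_CXX_LIBRARIES})
EOF

echo "wrote $SCRATCH_DIR/CMakeLists.txt"
```
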