mpi-advance / locality_aware

Temporary Repo for neighbor_alltoallv
BSD 3-Clause "New" or "Revised" License

`MPI_LIBRARIES` is unset in GPU build when linking `gpu_alltoall` #17

Open cwpearson opened 1 year ago

cwpearson commented 1 year ago

At the line linked below, `MPI_LIBRARIES` is unset, which causes link errors for the `gpu_alltoall` executable:

https://github.com/mpi-advance/locality_aware/blob/b18df47c27fc4c019e6ee28d1802ccc3fc0ca8c0/benchmarks/CMakeLists.txt#L10

Reproducer on Sandia cee-a100-007:

MPI_ADVANCE_SRC=$HOME/repos/locality_aware
MPI_ADVANCE_BUILD=$MPI_ADVANCE_SRC/build-cee-a100-007
MPI_ADVANCE_INSTALL=$MPI_ADVANCE_SRC/install-cee-a100-007

source /projects/sems/modulefiles/utils/sems-v2-modules-init.sh
module load sems-gcc/10.1.0
module load sems-cuda/11.4.2
module load sems-openmpi/4.0.5-cuda-11.4.2
module load sems-cmake/3.23.1

git clone git@github.com:mpi-advance/locality_aware.git $MPI_ADVANCE_SRC || true
cd $MPI_ADVANCE_SRC && git checkout gpu
cd -

cmake \
-D CMAKE_CXX_COMPILER=`which mpicxx` \
-D CMAKE_C_COMPILER=`which mpicc` \
-S $MPI_ADVANCE_SRC \
-B $MPI_ADVANCE_BUILD \
-D CMAKE_INSTALL_PREFIX=$MPI_ADVANCE_INSTALL \
-D USE_CUDA=ON \
-D CMAKE_BUILD_TYPE=RelWithDebInfo

make -C $MPI_ADVANCE_BUILD install VERBOSE=1

This yields:

[ 92%] Linking CUDA executable gpu_alltoall
cd /ascldap/users/cwpears/repos/locality_aware/build-cee-a100-007/benchmarks && /projects/sems/install/rhel7-x86_64/sems/v2/utility/cmake/3.23.1/gcc/8.3.0/base/frlm6uo/bin/cmake -E cmake_link_script CMakeFiles/gpu_alltoall.dir/link.txt --verbose=1
/projects/sems/install/rhel7-x86_64/sems-compilers/tpl/gcc/10.1.0/gcc/4.8.5/base/4wwfxhh/bin/g++ CMakeFiles/gpu_alltoall.dir/gpu_alltoall.cpp.o CMakeFiles/gpu_alltoall.dir/cmake_device_link.o -o gpu_alltoall  ../lib/libmpi_advance.a /projects/sems/install/rhel7-x86_64/sems/v2/tpl/cuda/11.4.2/gcc/10.1.0/base/ex2t3fn/lib64/libcudart.so /projects/sems/install/rhel7-x86_64/sems-compilers/tpl/gcc/10.1.0/gcc/4.8.5/base/4wwfxhh/lib64/libgomp.so -lcudadevrt -lcudart_static -lrt -lpthread -ldl  -L"/projects/sems/install/rhel7-x86_64/sems/v2/tpl/cuda/11.4.2/gcc/10.1.0/base/ex2t3fn/targets/x86_64-linux/lib/stubs" -L"/projects/sems/install/rhel7-x86_64/sems/v2/tpl/cuda/11.4.2/gcc/10.1.0/base/ex2t3fn/targets/x86_64-linux/lib"
CMakeFiles/gpu_alltoall.dir/gpu_alltoall.cpp.o: In function `main':
tmpxft_0002ba0e_00000000-6_gpu_alltoall.cudafe1.cpp:(.text.startup+0x64): undefined reference to `MPI_Init'
tmpxft_0002ba0e_00000000-6_gpu_alltoall.cudafe1.cpp:(.text.startup+0x72): undefined reference to `ompi_mpi_comm_world'
tmpxft_0002ba0e_00000000-6_gpu_alltoall.cudafe1.cpp:(.text.startup+0x77): undefined reference to `MPI_Comm_rank'
tmpxft_0002ba0e_00000000-6_gpu_alltoall.cudafe1.cpp:(.text.startup+0x85): undefined reference to `ompi_mpi_comm_world'
tmpxft_0002ba0e_00000000-6_gpu_alltoall.cudafe1.cpp:(.text.startup+0x8a): undefined reference to `MPI_Comm_size'
tmpxft_0002ba0e_00000000-6_gpu_alltoall.cudafe1.cpp:(.text.startup+0x184): undefined reference to `ompi_mpi_comm_world'
tmpxft_0002ba0e_00000000-6_gpu_alltoall.cudafe1.cpp:(.text.startup+0x1d4): undefined reference to `ompi_mpi_comm_world'
...

cwpearson commented 1 year ago

Other test binaries link with `mpicxx` instead of `g++`, for reasons I don't understand, so they don't have this problem. I believe the choice of linker is the actual problem, and `MPI_LIBRARIES` may be a red herring.

cwpearson commented 1 year ago

I wonder if the problem is that `heterogeneous_SOURCES` is marked as CUDA language when `USE_CUDA` is on:

https://github.com/mpi-advance/locality_aware/blob/b18df47c27fc4c019e6ee28d1802ccc3fc0ca8c0/src/CMakeLists.txt#L23

cwpearson commented 1 year ago

I'm stumped for now. It may be that the MPI installation on cee-a100-007 does not set any of the `MPI_LIBRARIES` variables because it expects `mpicxx` to do the actual linking, which doesn't play nicely with our CMakeLists.txt.
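
One way to test that hypothesis is sketched below. It is an untested suggestion, not a confirmed fix. It assumes the project locates MPI through CMake's `find_package(MPI)`, and that the module environment and `MPI_ADVANCE_*` variables from the reproducer are still set. Instead of making the MPI wrappers CMake's own compilers, it hands them to FindMPI through the `MPI_CXX_COMPILER` and `MPI_C_COMPILER` hints, so CMake can query the wrappers for explicit include paths and libraries that a plain `g++` link line can use.

```shell
# Untested sketch. Start from an empty build directory: CMake caches the
# compiler choice, so a stale cache would hide the effect of this change.
rm -rf "$MPI_ADVANCE_BUILD"

# Use the plain GCC compilers for CMake itself, and pass the MPI wrappers
# only as FindMPI hints (MPI_<lang>_COMPILER).
cmake \
  -S "$MPI_ADVANCE_SRC" \
  -B "$MPI_ADVANCE_BUILD" \
  -D CMAKE_CXX_COMPILER="$(which g++)" \
  -D CMAKE_C_COMPILER="$(which gcc)" \
  -D MPI_CXX_COMPILER="$(which mpicxx)" \
  -D MPI_C_COMPILER="$(which mpicc)" \
  -D CMAKE_INSTALL_PREFIX="$MPI_ADVANCE_INSTALL" \
  -D USE_CUDA=ON \
  -D CMAKE_BUILD_TYPE=RelWithDebInfo

# Inspect what FindMPI recorded before rebuilding; if the library entries
# are still empty, the hypothesis above does not explain the failure.
grep -i '^MPI_' "$MPI_ADVANCE_BUILD/CMakeCache.txt"

make -C "$MPI_ADVANCE_BUILD" install VERBOSE=1
```

If the cache then shows non-empty MPI library entries and `gpu_alltoall` links, that would support the idea that the wrapper-as-compiler configuration is what leaves `MPI_LIBRARIES` empty. A second possible experiment is to keep the original configuration and add `-D CMAKE_CUDA_HOST_COMPILER="$(which mpicxx)"`, so the CUDA link step may go through the wrapper. Whether either variant works with this project's CMakeLists.txt is an assumption.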