xiaoyeli / superlu_dist

Distributed memory, MPI based SuperLU
https://portal.nersc.gov/project/sparse/superlu/

superlu_dist 9.0 fails to build with CombBLAS enabled #165

Status: Open. Opened by francesco-ballarin 3 months ago.

francesco-ballarin commented 3 months ago

Hi, we are trying to update superlu_dist to version 9.0 on Debian, with CombBLAS enabled.

The following cmake configuration is used when building the package locally:

    cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=None \
      -DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_INSTALL_LOCALSTATEDIR=/var \
      -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_FIND_USE_PACKAGE_REGISTRY=OFF \
      -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DFETCHCONTENT_FULLY_DISCONNECTED=ON \
      -DCMAKE_INSTALL_RUNSTATEDIR=/run -DCMAKE_SKIP_INSTALL_ALL_DEPENDENCY=ON \
      "-GUnix Makefiles" -DCMAKE_VERBOSE_MAKEFILE=ON \
      -DCMAKE_INSTALL_LIBDIR=lib/x86_64-linux-gnu -DBUILD_SHARED_LIBS=ON -DCMAKE_SKIP_RPATH=ON \
      -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpic\+\+ \
      -DXSDK_ENABLE_Fortran=ON -DCMAKE_FORTRAN_COMPILER=mpifort \
      -DMPIEXEC_PREFLAGS=--allow-run-as-root \
      -DCMAKE_INSTALL_INCLUDEDIR=include/superlu-dist \
      -Denable_complex16=ON \
      -DTPL_ENABLE_INTERNAL_BLASLIB=OFF \
      -DTPL_BLAS_LIBRARIES=/usr/lib/x86_64-linux-gnu/libblas.so \
      -DTPL_ENABLE_LAPACKLIB=ON \
      -DTPL_LAPACK_LIBRARIES=/usr/lib/x86_64-linux-gnu/liblapack.so \
      -DTPL_ENABLE_PARMETISLIB=ON \
      "-DTPL_PARMETIS_LIBRARIES=-lparmetis -lmetis" \
      -DTPL_PARMETIS_INCLUDE_DIRS=/usr/include/parmetis \
      -DTPL_ENABLE_COMBBLASLIB=ON \
      "-DTPL_COMBBLAS_LIBRARIES=-lCombBLAS -lGraphGenlib -lUsortlib" \
      -DTPL_COMBBLAS_INCLUDE_DIRS=/usr/include/CombBLAS/ \
      ..

We encounter two issues:

  1. The build fails with:
    cd /repositories/superlu-dist/obj-x86_64-linux-gnu/EXAMPLE && /usr/bin/cmake -E cmake_link_script CMakeFiles/pddrive3d.dir/link.txt --verbose=1
    /usr/bin/mpicc -I/usr/include/parmetis -DUSE_VENDOR_BLAS -fopenmp  -g -O2 -Werror=implicit-function-declaration -ffile-prefix-map=/repositories/superlu-dist=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro  -L/usr/lib/x86_64-linux-gnu/openmpi/lib/fortran/gfortran  -L/usr/lib/x86_64-linux-gnu/openmpi/lib/fortran/gfortran CMakeFiles/pddrive3d.dir/pddrive3d.c.o CMakeFiles/pddrive3d.dir/dcreate_matrix.c.o CMakeFiles/pddrive3d.dir/dcreate_matrix3d.c.o -o pddrive3d  ../SRC/libsuperlu_dist.so.9.0.0 /usr/lib/x86_64-linux-gnu/libblas.so -lm /usr/lib/x86_64-linux-gnu/liblapack.so -lparmetis -lmetis -lCombBLAS -lGraphGenlib -lUsortlib -lm /usr/lib/gcc/x86_64-linux-gnu/13/libgomp.so /usr/lib/x86_64-linux-gnu/libpthread.a /usr/lib/x86_64-linux-gnu/libmpi_usempif08.so /usr/lib/x86_64-linux-gnu/libmpi_usempi_ignore_tkr.so /usr/lib/x86_64-linux-gnu/libmpi_mpifh.so /usr/lib/x86_64-linux-gnu/libmpi.so /usr/lib/x86_64-linux-gnu/libopen-rte.so /usr/lib/x86_64-linux-gnu/libopen-pal.so /usr/lib/x86_64-linux-gnu/libhwloc.so /usr/lib/x86_64-linux-gnu/libevent_core.so /usr/lib/x86_64-linux-gnu/libevent_pthreads.so -lm /usr/lib/x86_64-linux-gnu/libz.so
    /usr/bin/ld: ../SRC/libsuperlu_dist.so.9.0.0: undefined reference to `s_c2cpp_GetHWPM'

Could you double-check whether the logic at https://github.com/xiaoyeli/superlu_dist/blob/master/SRC/CMakeLists.txt#L274 is correct? In our case both enable_double and enable_single are enabled, and the linker error suggests that both double/d_c2cpp_GetHWPM.cpp and single/s_c2cpp_GetHWPM.cpp should have been appended to the list.

  2. Even after manually patching SRC/CMakeLists.txt to add single/s_c2cpp_GetHWPM.cpp and double/dHWPM_CombBLAS.hpp to the list of sources, I get a compilation error in single/s_c2cpp_GetHWPM.cpp saying that sHWPM_CombBLAS.hpp is missing. Indeed, I can't find that file in the single folder, although there is a similar dHWPM_CombBLAS.hpp in the double folder.

Can you check whether sHWPM_CombBLAS.hpp should be added to the repository?

cc @drew-parsons, since this issue is probably a follow-up to https://github.com/xiaoyeli/superlu_dist/issues/110

Thanks, Francesco

drew-parsons commented 1 month ago

The file is still in the source tree; it was moved to SRC/single by https://github.com/xiaoyeli/superlu_dist/commit/03c7adeb6c7035dd0b26c07cdd2eff7684cc6d63

The Debian build compiles d_c2cpp_GetHWPM.cpp.o and z_c2cpp_GetHWPM.cpp.o, but never attempts to build s_c2cpp_GetHWPM.cpp.o. Could this be a bug in the Debian build configuration rather than in superlu-dist?

drew-parsons commented 1 month ago

I suspect the culprit is https://github.com/xiaoyeli/superlu_dist/blob/ea4d47b206387a1592eea46493519d57cf3984d6/SRC/CMakeLists.txt#L271, which adds s_c2cpp_GetHWPM.cpp only if enable_double is not set (l.272). However, enable_double is ON by default at https://github.com/xiaoyeli/superlu_dist/blob/ea4d47b206387a1592eea46493519d57cf3984d6/CMakeLists.txt#L23

Shouldn't the enable_single block drop this enable_double check at l.272 in its HAVE_COMBBLAS handling and simply add s_c2cpp_GetHWPM.cpp, as is done in the enable_double block at https://github.com/xiaoyeli/superlu_dist/blob/ea4d47b206387a1592eea46493519d57cf3984d6/SRC/CMakeLists.txt#L194 and in the enable_complex16 block at https://github.com/xiaoyeli/superlu_dist/blob/ea4d47b206387a1592eea46493519d57cf3984d6/SRC/CMakeLists.txt#L348?

Or is there some other issue complicating single and double precision CombBLAS support?
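For concreteness, here is a minimal sketch of what the enable_single block's CombBLAS handling could look like with the enable_double guard removed, as suggested in the comment above. It is not a tested patch: it assumes, without reproducing the exact lines of SRC/CMakeLists.txt, that the block appends to a list variable named `sources` inside an `if(HAVE_COMBBLAS)` check, mirroring the enable_double and enable_complex16 blocks. The rest of the single-precision source list is elided.

```cmake
# Sketch only: assumes the enable_single block appends to a list named
# "sources" inside if(HAVE_COMBBLAS), like the double and complex16 blocks.
if(enable_single)
  # ... existing single-precision sources (elided) ...
  if(HAVE_COMBBLAS)
    # No "if(NOT enable_double)" guard here: the library references
    # s_c2cpp_GetHWPM whenever single precision is built, so its
    # definition must be compiled regardless of enable_double.
    list(APPEND sources single/s_c2cpp_GetHWPM.cpp)
  endif()
endif()
```

With both enable_single and enable_double ON, as in the reporter's configuration, this would build s_c2cpp_GetHWPM.cpp alongside d_c2cpp_GetHWPM.cpp and z_c2cpp_GetHWPM.cpp, which is the object the undefined reference to `s_c2cpp_GetHWPM` points to as missing.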