xiaoyeli / superlu

Supernodal sparse direct solver. https://portal.nersc.gov/project/sparse/superlu/

superlu 7 tests fail to find libmatgen.so #153

Closed drew-parsons closed 2 months ago

drew-parsons commented 2 months ago

I'm trying to run superlu's test suite to validate the build of debian packages. The tests in TESTING link to libmatgen.so, built from TESTING/MATGEN.

With superlu 6.0.1, tests built and passed fine, e.g. https://buildd.debian.org/status/fetch.php?pkg=superlu&arch=amd64&ver=6.0.1%2Bdfsg1-1%2Bb1&stamp=1707532797&raw=0 or https://tests.reproducible-builds.org/debian/rbuild/unstable/amd64/superlu_6.0.1+dfsg1-1.rbuild.log.gz

There is a regression with superlu 7.0.0. Tests build and link to MATGEN/libmatgen.so, e.g. for d_test

[ 98%] Linking C executable d_test
cd /projects/mathlibs/build/superlu/obj-x86_64-linux-gnu/TESTING && /usr/bin/cmake -E cmake_link_script CMakeFiles/d_test.dir/link.txt --verbose=1
/usr/bin/cc -DUSE_VENDOR_BLAS -g -O2 -Werror=implicit-function-declaration -ffile-prefix-map=/projects/mathlibs/build/superlu=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro CMakeFiles/s_test.dir/sdrive.c.o CMakeFiles/s_test.dir/sgst01.c.o CMakeFiles/s_test.dir/sgst02.c.o CMakeFiles/s_test.dir/sgst04.c.o CMakeFiles/s_test.dir/sgst07.c.o CMakeFiles/s_test.dir/sp_ienv.c.o CMakeFiles/s_test.dir/sp_sconvert.c.o -o s_test  MATGEN/libmatgen.so ../SRC/libsuperlu.so.7.0.0 -lblas -lm
/usr/bin/cc -DUSE_VENDOR_BLAS -g -O2 -Werror=implicit-function-declaration -ffile-prefix-map=/projects/mathlibs/build/superlu=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro CMakeFiles/d_test.dir/ddrive.c.o CMakeFiles/d_test.dir/dgst01.c.o CMakeFiles/d_test.dir/dgst02.c.o CMakeFiles/d_test.dir/dgst04.c.o CMakeFiles/d_test.dir/dgst07.c.o CMakeFiles/d_test.dir/sp_ienv.c.o CMakeFiles/d_test.dir/sp_dconvert.c.o -o d_test  MATGEN/libmatgen.so ../SRC/libsuperlu.so.7.0.0 -lblas -lm

but then fail to run

8: Test command: /projects/mathlibs/build/superlu/obj-x86_64-linux-gnu/TESTING/d_test "-t" "LA" "-n" "19" "-s" "2" "-l" "0"
8: Working Directory: /projects/mathlibs/build/superlu/obj-x86_64-linux-gnu/TESTING
8: Test timeout computed to be: 1500
 8/24 Test  #8: d_test_19_2_0_LA .................***Failed    0.01 sec
/projects/mathlibs/build/superlu/obj-x86_64-linux-gnu/TESTING/d_test: error while loading shared libraries: libmatgen.so: cannot open shared object file: No such file or directory

libmatgen.so is located at /projects/mathlibs/build/superlu/obj-x86_64-linux-gnu/TESTING/MATGEN/libmatgen.so. It looks like ctest is failing to add TESTING/MATGEN to the LD_LIBRARY_PATH used in the test environment.
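One way to check this kind of failure is to look at whether the test binary carries an RPATH/RUNPATH entry at all. The following is a minimal sketch, not part of the superlu build: `BUILD_DIR` and the helper name `show_runpath` are assumptions for illustration, and it relies only on `readelf` from binutils.

```shell
# BUILD_DIR is an assumption; point it at your own CMake build tree.
BUILD_DIR="${BUILD_DIR:-obj-x86_64-linux-gnu}"

# Hypothetical helper: print the RPATH/RUNPATH entries recorded in an ELF
# binary, or a short note if there are none (or the file cannot be read).
show_runpath() {
    readelf -d "$1" 2>/dev/null | grep -E 'RPATH|RUNPATH' \
        || echo "no RPATH/RUNPATH in $1"
}

show_runpath "$BUILD_DIR/TESTING/d_test"

# For a one-off run, the library directory can be supplied explicitly:
# LD_LIBRARY_PATH="$BUILD_DIR/TESTING/MATGEN" "$BUILD_DIR/TESTING/d_test" -t LA -n 19 -s 2 -l 0
```

If the helper reports no entry, the dynamic loader has no way to find TESTING/MATGEN unless LD_LIBRARY_PATH is set by hand, which would match the error above.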

I haven't yet identified what changed between superlu 6.0.1 and 7.0.0 for superlu to lose the library path to TESTING/MATGEN. There were some test changes in MR #114, but it's not obvious if this is what's causing the problem.

drew-parsons commented 2 months ago

The general behaviour comes from the cmake RPATH configuration. By default superlu sets CMAKE_SKIP_BUILD_RPATH=FALSE, so a RUNPATH is included in the library (and tests). The debian build aims to not use RUNPATH in the packaged libraries, so it sets CMAKE_SKIP_RPATH=ON. In superlu 7 this removes the RUNPATH from the test executables, causing the issue reported here.

So the debian build can be fixed by applying CMAKE_SKIP_INSTALL_RPATH=ON instead of CMAKE_SKIP_RPATH. This lets cmake add the RUNPATH into tests, so tests can run, while removing RUNPATH from the final libraries installed by the debian package.
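As a rough sketch of that configuration (not the actual debian packaging rules), a build could be driven like this. The helper name `build_and_test` and its two path arguments are hypothetical; the `-S`/`-B` options assume cmake 3.13 or newer.

```shell
# Hypothetical helper: configure, build and test a CMake project while keeping
# RUNPATH in the build tree and dropping it from installed files.
# $1 = source directory, $2 = build directory (both placeholders).
build_and_test() {
    src="$1"
    build="$2"
    # CMAKE_SKIP_INSTALL_RPATH=ON leaves the build-tree RUNPATH alone, so the
    # tests can still locate TESTING/MATGEN/libmatgen.so, but no RUNPATH is
    # written when the targets are installed.
    cmake -S "$src" -B "$build" -DCMAKE_SKIP_INSTALL_RPATH=ON &&
    cmake --build "$build" &&
    (cd "$build" && ctest)
}
```

It would be called as `build_and_test . obj-x86_64-linux-gnu` from the top of a superlu checkout. Passing `-DCMAKE_SKIP_RPATH=ON` instead strips the RUNPATH from the build tree as well, which is the setting that led to the failures reported here.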

That still leaves the question of why the behaviour changed between superlu 6.0.1 and 7.0.0.

gruenich commented 2 months ago

I could not find any changes regarding RPATH handling between 6.0.1 and 7.0.0.

Maybe I can reproduce your issue and bisect the commits to figure out what caused this behavioral change.

xiaoyeli commented 2 months ago

I don't understand the RPATH stuff. The RPATH handling part in CMakeLists.txt was added by someone else. It was added a long time ago. As @gruenich pointed out, there is no change between 6.0.1 and 7.0.0.

drew-parsons commented 2 months ago

I did a bit of bisection to confirm that tests still pass before PR #114 and fail after it, i.e. passing (with CMAKE_SKIP_RPATH=ON) at https://github.com/xiaoyeli/superlu/tree/40ebe21cb5a1844f372bac8334243565c1da8d27 and failing (with CMAKE_SKIP_RPATH=ON) at https://github.com/xiaoyeli/superlu/tree/92b94b92cc789e3053f5dcfdc5db7c6c4bbad946
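For anyone wanting to narrow this down further, the search between those two commits could probably be automated with `git bisect run`. This is only a sketch: the helper `check_commit`, the `bisect-build` directory and the `bisect.sh` file name are hypothetical, and it assumes a superlu checkout that configures with default options apart from CMAKE_SKIP_RPATH.

```shell
# Commit hashes taken from the two tree URLs above.
GOOD=40ebe21cb5a1844f372bac8334243565c1da8d27   # tests pass
BAD=92b94b92cc789e3053f5dcfdc5db7c6c4bbad946    # tests fail

# Hypothetical helper: build the current checkout with CMAKE_SKIP_RPATH=ON
# and run the tests. Returns 0 if ctest passes, non-zero if it fails, and
# 125 (git bisect's "skip" code) if the commit cannot be configured or built.
check_commit() {
    rm -rf bisect-build
    cmake -S . -B bisect-build -DCMAKE_SKIP_RPATH=ON >/dev/null 2>&1 || return 125
    cmake --build bisect-build >/dev/null 2>&1 || return 125
    (cd bisect-build && ctest) >/dev/null 2>&1
}

# From the top of a superlu checkout, with this file saved as bisect.sh:
#   git bisect start "$BAD" "$GOOD"
#   git bisect run sh -c '. ./bisect.sh; check_commit'
#   git bisect reset
```

`git bisect run` treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad, which is why build failures are mapped to 125 rather than counted as test failures.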

I guess the difference must be due to the changes made to the add_superlu_test() function.

Perhaps we don't need to worry so much about the details of the change, since we've got the practical solution of simply using CMAKE_SKIP_INSTALL_RPATH (not CMAKE_SKIP_RPATH) to control RUNPATH in the final library. Can close the issue in that case.

drew-parsons commented 2 months ago

I'm wondering if something changed in cmake itself with respect to how it handles CMAKE_SKIP_RPATH. There was a similar issue building ADIOS2. My cmake version is 3.30.3.

In any case, using CMAKE_SKIP_INSTALL_RPATH resolves the problem. I think we can close this issue.