natefoo / slurm-drmaa

DRMAA for Slurm: Implementation of the DRMAA C bindings for Slurm
GNU General Public License v3.0

LD_LIBRARY_PATH not exported. #44

Closed reid-wagner closed 2 years ago

reid-wagner commented 3 years ago

Hi,

I am experiencing an issue similar to #19. It appears the Slurm shared libraries specified by --with-slurm-lib cannot be found at runtime when the configure script runs conftest.

I believe the issue is that while LD_LIBRARY_PATH is set in ax_slurm.m4, it is never exported. You can see how this was done in cURL: https://github.com/curl/curl/commit/302d537423c0bf2429bd6691a775319c0dec0c10.
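The difference between assigning and exporting a shell variable can be seen in a minimal sketch. DEMO_LIB_PATH is a hypothetical stand-in for LD_LIBRARY_PATH, used so the demo does not disturb the real loader path; it assumes the variable is not already present in the environment, which is the case that triggers the problem described above.

```shell
#!/bin/sh
# DEMO_LIB_PATH is a hypothetical stand-in for LD_LIBRARY_PATH.
# Start from a clean state so the variable is not already exported.
unset DEMO_LIB_PATH

# Assigned but not exported: the value stays local to this shell,
# so a child process (like ./conftest) does not see it.
DEMO_LIB_PATH=/path/to/lib
without_export=$(sh -c 'echo "${DEMO_LIB_PATH:-unset}"')

# Exported: child processes now inherit the value.
export DEMO_LIB_PATH
with_export=$(sh -c 'echo "${DEMO_LIB_PATH:-unset}"')

echo "without export: $without_export"
echo "with export:    $with_export"
```

If the variable is already exported in the calling environment, a plain assignment updates the exported value, which may be why the problem does not show up on every machine.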

I've tried this and it does fix the issue. Alternatively, while looking into this I found it suggested that using rpath is better practice, since it is more constrained. I was also able to run ./configure successfully by setting the rpath as shown here: https://github.com/reid-wagner/slurm-drmaa/commit/67c7f6e2edd751b287332fa6fef07adb60635ff6.
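As a rough illustration of the rpath approach (not the contents of the linked commit), a configure-style shell fragment could add the library directory to the linker flags only for the duration of the link-and-run test. The names slurm_lib_dir, saved_LDFLAGS and test_LDFLAGS are hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for the directory passed via --with-slurm-lib.
slurm_lib_dir=/path/to/lib

# Remember the caller's LDFLAGS (empty if it was unset).
saved_LDFLAGS=${LDFLAGS-}

# Add the search path for link time (-L) and embed it for run time (-rpath).
LDFLAGS="$saved_LDFLAGS -L$slurm_lib_dir -Wl,-rpath,$slurm_lib_dir"
test_LDFLAGS=$LDFLAGS
echo "LDFLAGS for the test program: $test_LDFLAGS"

# ... the link-and-run test of conftest would go here ...

# Put back the saved value so later checks are unaffected.
LDFLAGS=$saved_LDFLAGS
```

Unlike an exported LD_LIBRARY_PATH, the embedded path affects only the binary being linked, which is the sense in which rpath is more constrained.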

If you want to go that path I'd be glad to open a PR. I haven't been able to test compilation yet for a few reasons, one being that I'm encountering an unrelated compilation issue on master.

The above issue happens with slurm-drmaa 1.1.1 and gcc 4.8.5 on CentOS 7.8.2003.

Additionally, it's worth mentioning that 1.1.1 configured and compiled out of the box on my Ubuntu machine with gcc 9.3.0. I grabbed the conftest.c source from config.log and compiled it on both machines. On the Ubuntu machine the dependency on libslurm appears to have been stripped from the ELF, I guess because it was optimized out. On the CentOS machine the dependency is there. So on the Ubuntu machine the check wasn't actually testing that the libraries could be found at runtime.
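One way to check whether a binary still records the dependency is to look for a NEEDED entry in the output of objdump -p. The sketch below uses a hypothetical helper, needs_lib, and feeds it sample text in the same format as objdump's dynamic section so it runs without a Slurm installation.

```shell
#!/bin/sh
# needs_lib NAME: read `objdump -p` output on stdin and succeed if some
# NEEDED entry starts with NAME. (Hypothetical helper for illustration.)
needs_lib() {
    awk -v lib="$1" '
        $1 == "NEEDED" && index($2, lib) == 1 { found = 1 }
        END { exit !found }
    '
}

# Sample dynamic-section lines in objdump -p format.
sample='  NEEDED               libslurm.so.36
  NEEDED               libpthread.so.0
  NEEDED               libc.so.6'

if printf '%s\n' "$sample" | needs_lib libslurm.so; then
    echo "libslurm is recorded as a dependency"
else
    echo "libslurm is not recorded as a dependency"
fi
```

Against a real binary the call would be objdump -p ./conftest | needs_lib libslurm.so. If the entry is missing, running the binary proves nothing about whether libslurm can be located at runtime.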

Thanks for taking a look.

Below is the error from config.log. I modified the paths:


configure:14098: checking for usable SLURM libraries/headers
configure:14119: gcc -std=gnu99 -o conftest -pedantic -std=c99 -g -O2 -pthread -D_REENTRANT -D_THREAD_SAFE -DNDEBUG  -D_GNU_SOURCE -I/path/to/include/  -L/path/to/lib/ conftest.c -lslurm   -lslurm  >&5
configure:14119: $? = 0
configure:14119: ./conftest
./conftest: error while loading shared libraries: libslurm.so.35: cannot open shared object file: No such file or directory
configure:14119: $? = 127
configure: program exited with status 127
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "DRMAA for Slurm"
| #define PACKAGE_TARNAME "slurm-drmaa"
| #define PACKAGE_VERSION "1.1.1"
| #define PACKAGE_STRING "DRMAA for Slurm 1.1.1"
| #define PACKAGE_BUGREPORT "nate@bx.psu.edu"
| #define PACKAGE_URL ""
| #define PACKAGE "slurm-drmaa"
| #define VERSION "1.1.1"
| #define STDC_HEADERS 1
| #define HAVE_SYS_TYPES_H 1
| #define HAVE_SYS_STAT_H 1
| #define HAVE_STDLIB_H 1
| #define HAVE_STRING_H 1
| #define HAVE_MEMORY_H 1
| #define HAVE_STRINGS_H 1
| #define HAVE_INTTYPES_H 1
| #define HAVE_STDINT_H 1
| #define HAVE_UNISTD_H 1
| #define HAVE_DLFCN_H 1
| #define LT_OBJDIR ".libs/"
| #define HAVE_PTHREAD_PRIO_INHERIT 1
| #define HAVE_LIBSLURM 1
| /* end confdefs.h.  */
|  #include "slurm/slurm.h"
| int
| main ()
| {
|  job_desc_msg_t job_req; /*at least check for declared structs */
|                  return 0;
| 
|   ;
|   return 0;
| }
configure:14134: result: no
configure:14140: error: 
Slurm libraries/headers not found;
add --with-slurm-inc and --with-slurm-lib with appropriate locations.

natefoo commented 3 years ago

I set this locally when running ./configure, with LDFLAGS=-Wl,-rpath=/path/to/dir/containing/libslurm.so ./configure ..., but if it has become standard practice to set a library's rpath in the autoconf script I can look into that. Thanks for providing the examples.

natefoo commented 2 years ago

I'm trying to get a better understanding of what conditions this occurs under. For example, using a hand-built Slurm in my home directory on Ubuntu 21.04, with the autotools scripts generated on that platform, libtool correctly sets the rpath when compiling:

nate@pdp-11% ./configure --prefix=/tmp/slurm-drmaa --with-slurm-inc=/home/nate/work/slurm/20.11.8/include --with-slurm-lib=/home/nate/work/slurm/20.11.8/lib
...
nate@pdp-11% make
...
/bin/bash ../libtool  --tag=CC   --mode=link gcc  -Wall -W -Wno-unused-parameter -Wno-format-zero-length -pedantic -std=c99 -g -O2 -pthread -L/home/nate/work/slurm/20.11.8/lib -version-info 1:8:0  -o libdrmaa.la -rpath /home/nate/work/slurm/20.11.8/lib libdrmaa_la-drmaa.lo libdrmaa_la-job.lo libdrmaa_la-session.lo libdrmaa_la-util.lo ../drmaa_utils/drmaa_utils/libdrmaa_utils.la -lslurm
libtool: link: gcc -shared  -fPIC -DPIC  .libs/libdrmaa_la-drmaa.o .libs/libdrmaa_la-job.o .libs/libdrmaa_la-session.o .libs/libdrmaa_la-util.o  -Wl,--whole-archive ../drmaa_utils/drmaa_utils/.libs/libdrmaa_utils.a -Wl,--no-whole-archive  -Wl,-rpath -Wl,/home/nate/work/slurm/20.11.8/lib -Wl,-rpath -Wl,/home/nate/work/slurm/20.11.8/lib -L/home/nate/work/slurm/20.11.8/lib /home/nate/work/slurm/20.11.8/lib/libslurm.so  -g -O2 -pthread   -pthread -Wl,-soname -Wl,libdrmaa.so.1 -o .libs/libdrmaa.so.1.0.8

And the resulting libdrmaa.so indeed has a proper rpath:

nate@pdp-11% objdump -p /tmp/slurm-drmaa/lib/libdrmaa.so.1.0.8
Dynamic Section:
  NEEDED               libslurm.so.36
  NEEDED               libpthread.so.0
  NEEDED               libc.so.6
  SONAME               libdrmaa.so.1
  RUNPATH              /home/nate/work/slurm/20.11.8/lib
  INIT                 0x0000000000006000
  FINI                 0x000000000001dd34
  INIT_ARRAY           0x00000000000268d0
  INIT_ARRAYSZ         0x0000000000000008
  FINI_ARRAY           0x00000000000268d8
  FINI_ARRAYSZ         0x0000000000000008
  GNU_HASH             0x00000000000002f0
  STRTAB               0x0000000000002498
  SYMTAB               0x0000000000000860
  STRSZ                0x0000000000001382
  SYMENT               0x0000000000000018
  PLTGOT               0x0000000000027000
  PLTRELSZ             0x00000000000014b8
  PLTREL               0x0000000000000007
  JMPREL               0x0000000000004408
  RELA                 0x0000000000003b38
  RELASZ               0x00000000000008d0
  RELAENT              0x0000000000000018
  VERNEED              0x0000000000003a78
  VERNEEDNUM           0x0000000000000002
  VERSYM               0x000000000000381a
  RELACOUNT            0x0000000000000035

nate@pdp-11% ldd /tmp/slurm-drmaa/lib/libdrmaa.so.1.0.8
    linux-vdso.so.1 (0x00007ffd415eb000)
    libslurm.so.36 => /home/nate/work/slurm/20.11.8/lib/libslurm.so.36 (0x00007f25ae090000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f25ae05d000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f25ade71000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f25ade6a000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f25add1c000)
    libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f25adcff000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f25ae27f000)

However, using the configure script generated by the same Ubuntu 21.04 autotools but called on CentOS 7 or 8, you can't even get past the configure stage because the test program isn't linked with an rpath:

configure:14117: gcc -o conftest -pedantic -std=c99 -g -O2 -pthread -D_REENTRANT -D_THREAD_SAFE -DNDEBUG  -D_GNU_SOURCE -I/home/build/slurm/include  -L/home/build/slurm/lib conftest.c -lslurm   -lslurm  >&5
configure:14117: $? = 0
configure:14117: ./conftest
./conftest: error while loading shared libraries: libslurm.so.36: cannot open shared object file: No such file or directory

And yet there is nothing setting the runtime linker path in the call, even when running the configure script on Ubuntu:

configure:14117: gcc -o conftest -pedantic -std=c99 -g -O2 -pthread -D_REENTRANT -D_THREAD_SAFE -DNDEBUG  -D_GNU_SOURCE -I/home/nate/work/slurm/20.11.8/include  -L/home/nate/work/slurm/20.11.8/lib conftest.c -lslurm   -lslurm  >&5

But Ubuntu's conftest somehow has the rpath compiled in. So I am trying to figure out where this difference comes from before we commit to a solution.
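To compare the two conftest binaries directly, the RPATH or RUNPATH entry can be pulled out of objdump -p output. This is a small sketch with a hypothetical helper, runpath_of, run here against sample lines in objdump's format rather than a real binary.

```shell
#!/bin/sh
# runpath_of: read `objdump -p` output on stdin and print the value of any
# RPATH or RUNPATH entry. (Hypothetical helper for illustration.)
runpath_of() {
    awk '$1 == "RPATH" || $1 == "RUNPATH" { print $2 }'
}

# Sample dynamic-section lines in objdump -p format.
sample='  SONAME               libdrmaa.so.1
  RUNPATH              /home/nate/work/slurm/20.11.8/lib'

found=$(printf '%s\n' "$sample" | runpath_of)
echo "${found:-no RPATH/RUNPATH entry}"
```

Running objdump -p ./conftest | runpath_of on each platform, together with a check of the NEEDED entries, would show whether the binary carries a runtime path at all or simply has no libslurm dependency left to resolve.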

natefoo commented 2 years ago

OK, I think I've figured out what's up: the order of things in #34 is a bit off, and I think it should be fixed in #62.

Your sleuthing that uncovered the conftest check being optimized out was very helpful, thanks!