natefoo / slurm-drmaa

DRMAA for Slurm: Implementation of the DRMAA C bindings for Slurm
GNU General Public License v3.0

Compatibility with slurm 19.05.01 #28

Closed: dpryan79 closed this issue 4 years ago

dpryan79 commented 5 years ago

At least when I built it just now, Slurm 19.05.01 doesn't ship a libslurmdb.so, so the link test that configure runs fails. Removing the link against it at https://github.com/natefoo/slurm-drmaa/blob/master/m4/ax_slurm.m4#L75 solves the issue, but it would be good if someone else confirmed this before the change is made.

EricR86 commented 5 years ago

I can confirm that libslurmdb has been merged into libslurm. There is a note from SchedMD:

NOTE: libslurmdb has been merged into libslurm.  If functionality is needed
      from libslurmdb please just link to libslurm.

It is in the "28 May 2019" section of the RELEASE_NOTES, which can also be found on the website.
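A minimal shell sketch of the compatibility logic this implies: pass -lslurmdb only when a separate libslurmdb actually exists in the Slurm library directory, and otherwise link with -lslurm alone. The function name slurm_libs_for_dir is hypothetical; this is not the code in m4/ax_slurm.m4 or in the fix that was eventually merged.

```shell
# Hypothetical helper (not part of slurm-drmaa): choose linker flags for a
# given Slurm library directory. Since Slurm 19.05, libslurmdb is merged into
# libslurm, so -lslurmdb is only added when a separate library is present.
slurm_libs_for_dir() {
    libdir=$1
    libs="-lslurm"
    # -lslurmdb can be satisfied by either a shared or a static library.
    for candidate in "$libdir"/libslurmdb.so "$libdir"/libslurmdb.a; do
        if [ -e "$candidate" ]; then
            libs="$libs -lslurmdb"
            break
        fi
    done
    printf '%s\n' "$libs"
}
```

For example, `slurm_libs_for_dir /opt/software/slurm/19.05.1-2/lib` should print just `-lslurm` on an installation where neither libslurmdb.so nor libslurmdb.a was installed.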

ocfmatt commented 4 years ago

I have a compilation issue: configure fails with the following error:

checking for usable SLURM libraries/headers... *** The SLURM test program failed to link or run. See the file config.log
*** for the exact error that occured.
no
configure: error:
Slurm libraries/headers not found;
add --with-slurm-inc and --with-slurm-lib with appropriate locations.

My configure command is: ./configure --prefix=/opt/software/galaxy/plugins/drmaa --with-slurm-inc=/opt/software/slurm/19.05.1-2/include --with-slurm-lib=/opt/software/slurm/19.05.1-2/lib

The libraries and headers are present where the help text says they should be:

[root@maxlogin1 slurm-drmaa]# ls -la /opt/software/slurm/19.05.1-2/include/slurm/slurm.h /opt/software/slurm/19.05.1-2/lib/libslurm.a
-rw-r--r--. 1 root root   217039 Aug  5 11:24 /opt/software/slurm/19.05.1-2/include/slurm/slurm.h
-rw-r--r--. 1 root root 55722378 Aug  5 11:22 /opt/software/slurm/19.05.1-2/lib/libslurm.a

My Slurm installation is compiled from source with the Intel compilers. I get the same error against Slurm 18.08.8, which makes me think I may be missing a compilation flag in either Slurm or drmaa.

@dpryan79 did you install Slurm 19.05.1 from packages or compile from source?

EricR86 commented 4 years ago

@ocfmatt if you look in your config.log, I'm fairly certain the error will show up as /usr/bin/ld: cannot find -lslurmdb after configure tries to compile a test program. The linker is complaining specifically that it can't find libslurmdb.so, because that file no longer exists in Slurm 19.05.
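One quick way to pull this out of config.log is to grep for the linker's "cannot find" messages. The helper name missing_libs is hypothetical, and the pattern assumes the GNU ld message format quoted above; grep -o is available in GNU and BSD grep but is not strictly POSIX.

```shell
# Hypothetical helper: list the libraries the linker reported as missing in
# autoconf's config.log (lines like "/usr/bin/ld: cannot find -lslurmdb").
missing_libs() {
    grep -o 'cannot find -l[A-Za-z0-9_.+-]*' "${1:-config.log}" \
        | sed 's/^cannot find //' \
        | sort -u
}
```

Run it from the build directory as `missing_libs` (or `missing_libs path/to/config.log`); each missing -l flag is printed once.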

You can safely remove -lslurmdb either in the fix linked above (m4/ax_slurm.m4) or from the SLURM_LIBS variable inside configure itself (it was around line 14034 for me).
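If you edit configure directly, a sed one-liner along these lines removes only the -lslurmdb flag and keeps -lslurm. It matches on the SLURM_LIBS= assignment rather than on a line number, since the position varies between checkouts. This is a sketch: the function name strip_slurmdb is hypothetical, and it assumes the flags sit on the same line as the assignment.

```shell
# Hypothetical helper: strip -lslurmdb (and the whitespace before it) from
# SLURM_LIBS= lines in the generated configure script, keeping -lslurm.
# "-i.bak" edits in place and keeps a backup; it works with GNU and BSD sed.
strip_slurmdb() {
    sed -i.bak '/SLURM_LIBS=/s/[[:space:]]*-lslurmdb//g' "${1:-configure}"
}
```

After running `strip_slurmdb` in the source directory, a line reading SLURM_LIBS="-lslurm -lslurmdb" becomes SLURM_LIBS="-lslurm", and the original is kept as configure.bak.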

ocfmatt commented 4 years ago

@EricR86 Thanks for your quick reply.

I removed "-lslurmdb" and was able to configure and make after switching to the release source code; my earlier checkout had an uninitialised drmaa_utils path that was causing other errors.

research-computing-facility commented 4 years ago

Hi all, thanks to these tips I have been able to build slurm-drmaa against Slurm 19.05.4, but the library will not load:

galaxy.jobs.runners.drmaa INFO 2019-11-21 17:13:08,365 Overriding DRMAA_LIBRARY_PATH due to runner plugin parameter: /galaxy/slurm-drmaa/compiled/lib/libdrmaa.so
Traceback (most recent call last):
  File "/galaxy/production/lib/galaxy/webapps/galaxy/buildapp.py", line 58, in paste_app_factory
    app = galaxy.app.UniverseApplication(global_conf=global_conf, **kwargs)
  File "/galaxy/production/lib/galaxy/app.py", line 189, in __init__
    self.job_manager = manager.JobManager(self)
  File "/galaxy/production/lib/galaxy/jobs/manager.py", line 24, in __init__
    self.job_handler = handler.JobHandler(app)
  File "/galaxy/production/lib/galaxy/jobs/handler.py", line 34, in __init__
    self.dispatcher = DefaultJobDispatcher(app)
  File "/galaxy/production/lib/galaxy/jobs/handler.py", line 779, in __init__
    self.job_runners = self.app.job_config.get_job_runner_plugins(self.app.config.server_name)
  File "/galaxy/production/lib/galaxy/jobs/__init__.py", line 649, in get_job_runner_plugins
    rval[id] = runner_class(self.app, runner['workers'], **runner.get('kwds', {}))
  File "/galaxy/production/lib/galaxy/jobs/runners/drmaa.py", line 63, in __init__
    drmaa = __import__("drmaa")
  File "/galaxy/production/.venv/lib/python2.7/site-packages/drmaa/__init__.py", line 65, in <module>
    from .session import JobInfo, JobTemplate, Session
  File "/galaxy/production/.venv/lib/python2.7/site-packages/drmaa/session.py", line 39, in <module>
    from drmaa.helpers import (adapt_rusage, Attribute, attribute_names_iterator,
  File "/galaxy/production/.venv/lib/python2.7/site-packages/drmaa/helpers.py", line 36, in <module>
    from drmaa.wrappers import (drmaa_attr_names_t, drmaa_attr_values_t,
  File "/galaxy/production/.venv/lib/python2.7/site-packages/drmaa/wrappers.py", line 56, in <module>
    _lib = CDLL(libpath, mode=RTLD_GLOBAL)
  File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /galaxy/slurm-drmaa/compiled/lib/libdrmaa.so: undefined symbol: slurm_kill_job2

What is odd is that the symbol seems to be declared in the headers:

==> grep -r slurm_kill_job2 /usr/include/slurm
/usr/include/slurm/slurm.h: * slurm_kill_job2()
/usr/include/slurm/slurm.h:extern int slurm_kill_job2(const char *job_id, uint16_t signal, uint16_t flags);
==> ldd /galaxy/slurm-drmaa/compiled/lib/libdrmaa.so
    linux-vdso.so.1 =>  (0x00007ffef55c8000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb11604c000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fb115c7e000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fb11648c000)
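The ldd output is the telling part. A declaration in slurm.h only lets the code compile; at load time the dynamic linker still has to find slurm_kill_job2 in a loaded library, which normally means libslurm must be recorded as a dependency of libdrmaa.so, and it is not listed above. A small check for this, under the hypothetical name has_libslurm_dep, reads ldd output on standard input:

```shell
# Hypothetical helper: succeed only if `ldd` output (on stdin) lists a
# libslurm shared library, e.g. "libslurm.so.34 => /usr/lib64/libslurm.so.34".
# The escaped dot avoids matching similarly named libraries such as libslurmfull.
has_libslurm_dep() {
    grep -q 'libslurm\.so'
}
```

For example, `ldd /galaxy/slurm-drmaa/compiled/lib/libdrmaa.so | has_libslurm_dep || echo 'libdrmaa.so does not depend on libslurm'` would print the warning for the ldd output shown above.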

Command used to configure: ./configure --prefix /galaxy/slurm-drmaa/compiled

Would anyone have any suggestions? Many thanks in advance.

dpryan79 commented 4 years ago

libdrmaa should be linked against libslurm.so, but according to your ldd output it isn't, which I think is the cause of the problem.

research-computing-facility commented 4 years ago

Hi @dpryan79, you were right. I made a silly mistake: I removed the whole line in configure rather than leaving SLURM_LIBS="-lslurm ". Thanks for your insight!
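Since deleting the whole line is an easy slip, a quick sanity check after hand-editing configure may help. This sketch, under the hypothetical name check_slurm_libs, succeeds only if some SLURM_LIBS= line still passes -lslurm and no SLURM_LIBS= line still passes -lslurmdb; it assumes the assignment sits on a single line, as in the snippets quoted in this thread.

```shell
# Hypothetical helper: sanity-check a hand-edited configure script.
check_slurm_libs() {
    file=${1:-configure}
    # At least one SLURM_LIBS= line must still link -lslurm ...
    grep 'SLURM_LIBS=' "$file" | grep -q -e '-lslurm' || return 1
    # ... and none may still reference the merged-away libslurmdb.
    if grep 'SLURM_LIBS=' "$file" | grep -q -e '-lslurmdb'; then
        return 1
    fi
    return 0
}
```

Running `check_slurm_libs && ./configure --prefix /galaxy/slurm-drmaa/compiled` would then stop before configuring if the SLURM_LIBS line was removed entirely or still contains -lslurmdb.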

natefoo commented 4 years ago

Fixed by @EricR86 in #34 and released in version 1.1.1. Thanks!

X-WJ commented 3 years ago

Hi, I also ran into this problem as described in this issue, and I revised this line in configure as follows:

line 14022: SLURM_LIBS="-lslurm"