Closed richc-admin-gcai closed 2 years ago
I believe this issue is in:
drmaa_utils/drmaa_utils/drmaa_run_bulk.c: while (argc >= 0 && argv[0][0] == '-')
drmaa_utils/drmaa_utils/drmaa_run.c: while (argc >= 0 && argv[0][0] == '-')
Shouldn't this be:
while (argc > 0 && argv[0][0] == '-')
Because if argc is 0, dereferencing argv[0] to check for '-' will cause a segfault.
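To illustrate the point, here is a minimal sketch of the corrected loop condition. The function name count_leading_options and its return value are hypothetical, made up for this illustration; this is not the actual parse_args code from drmaa_utils. It assumes, as the backtrace suggests, that the caller has already skipped the program name, so with no arguments the function sees argc == 0 and argv[0] == NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the option loop in parse_args(); NOT the
 * real slurm-drmaa source. Counts the leading arguments that start
 * with '-'. The caller is assumed to have skipped the program name. */
int count_leading_options(int argc, char **argv)
{
    int consumed = 0;

    assert(argv != NULL);

    /* argc > 0 must be checked first: when argc == 0, argv[0] is the
     * NULL terminator of the argument vector, so argv[0][0] would
     * dereference a null pointer. With ">= 0" the guard still passes
     * at argc == 0 and the dereference happens anyway. */
    while (argc > 0 && argv[0][0] == '-') {
        argc--;
        argv++;
        consumed++;
    }
    return consumed;
}
```

Called with argc == 0 and an argv whose first element is NULL, this returns 0 without touching argv[0][0], which is the case that crashes with the original condition.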
If I make that change then the binaries throw the expected error:
[root@f8ddc11bc51e slurm-drmaa-1.1.2]# ./drmaa-run-bulk
F #9472 [ 0.00] * syntax error
F #9472 [ 0.00] | drmaa-run-bulk {start} {end} {step} {command}
[root@f8ddc11bc51e slurm-drmaa-1.1.2]# ./drmaa-run
F #9473 [ 0.00] * Failed to submit a job: drmaa_remote_command not set for job template
Your analysis looks correct to me. I'll commit a fix and include it in the next release of slurm-drmaa. Thanks!
I'm testing slurm-drmaa in a container, but even outside of a container, whether I build from source or install via the Galaxy RPM, the binary segfaults every time I run it.
Am I missing something?
Error is:
[root@f8ddc11bc51e /]# DRMAA_LIBRARY_PATH=/usr/lib64/libdrmaa.so /usr/bin/drmaa-run
Segmentation fault (core dumped)
Backtrace shows:
[root@f8ddc11bc51e /]# export DRMAA_LIBRARY_PATH=/usr/lib64/libdrmaa.so
[root@f8ddc11bc51e /]# gdb /usr/bin/drmaa-run
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /usr/bin/drmaa-run...Reading symbols from /usr/lib/debug/usr/bin/drmaa-run.debug...done.
done.
(gdb) run
Starting program: /usr/bin/drmaa-run
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00000000004129b6 in parse_args (argc=0, argv=0x7fffffffe7a0) at drmaa_run.c:254
254 while (argc >= 0 && argv[0][0] == '-')
(gdb) backtrace
#0 0x00000000004129b6 in parse_args (argc=0, argv=0x7fffffffe7a0) at drmaa_run.c:254
#1 0x00000000004120df in main (argc=1, argv=0x7fffffffe798) at drmaa_run.c:122
(gdb)
My test setup is as follows:
Dockerfile:
$ cat Dockerfile
FROM centos:7
RUN (cd /lib/systemd/system/sysinit.target.wants/; for i in *; do [ $i == systemd-tmpfiles-setup.service ] || rm -f $i; done); \
rm -f /lib/systemd/system/multi-user.target.wants/*;\
rm -f /etc/systemd/system/*.wants/*;\
rm -f /lib/systemd/system/local-fs.target.wants/*; \
rm -f /lib/systemd/system/sockets.target.wants/*udev*; \
rm -f /lib/systemd/system/sockets.target.wants/*initctl*; \
rm -f /lib/systemd/system/basic.target.wants/*;\
rm -f /lib/systemd/system/anaconda.target.wants/*;
VOLUME [ "/sys/fs/cgroup"]
RUN yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
RUN yum-config-manager --add-repo https://depot.galaxyproject.org/yum/galaxy.repo
RUN yum -y install which strace gdb
RUN debuginfo-install -y libgcc-4.8.5-44.el7.x86_64
RUN debuginfo-install -y glibc-2.17-324.el7_9.x86_64
RUN yum -y install slurm-slurmd-20.11.8 slurm-devel-20.11.8glibc-2.17-324.el7_9.x86_64
RUN yum clean all && yum -y update
RUN yum -y install slurm-drmaa slurm-drmaa-debuginfo
RUN yum clean all && \
rm -rf /var/cache/yum
VOLUME [ "/sys/fs/cgroup"]
ENTRYPOINT ['/usr/sbin/init']
This results in a working container. When I log in to the container, I'm running:
[root@f8ddc11bc51e /]# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
[root@f8ddc11bc51e7 /]# rpm -qa slurm*
slurm-slurmd-20.11.8-1.el7.x86_64
slurm-drmaa-debuginfo-1.1.2-1.el7.x86_64
slurm-20.11.8-1.el7.x86_64
slurm-devel-20.11.8-1.el7.x86_64
slurm-drmaa-1.1.2-1.el7.x86_64
[root@f8ddc11bc51e /]# yum info slurm-drmaa-1.1.2-1.el7.x86_64
Loaded plugins: fastestmirror, ovl
Loading mirror speeds from cached hostfile