natefoo / slurm-drmaa

DRMAA for Slurm: Implementation of the DRMAA C bindings for Slurm
GNU General Public License v3.0

segfault when submitting bulk jobs #5

Closed unode closed 6 years ago

unode commented 6 years ago

With DRMAA_LIBRARY_PATH pointing at a development build:

export DRMAA_LIBRARY_PATH=~/test_drmaa/slurm-drmaa-1.2.0-dev.83fc288/slurm_drmaa/.libs/libdrmaa.so

the following script, which uses libdrmaa via Python, segfaults:

#!/usr/bin/env python
from __future__ import print_function
import os
import drmaa

LOGS = "logs/"
if not os.path.isdir(LOGS):
    os.mkdir(LOGS)

s = drmaa.Session()
s.initialize()
print("Supported contact strings:", s.contact)
print("Supported DRM systems:", s.drmsInfo)
print("Supported DRMAA implementations:", s.drmaaImplementation)
print("Version", s.version)

jt = s.createJobTemplate()
jt.remoteCommand = "/usr/bin/echo"
jt.args = ["Hello", "world"]
jt.jobName = "testdrmaa"
jt.jobEnvironment = os.environ.copy()
jt.workingDirectory = os.getcwd()

jt.outputPath = ":" + os.path.join(LOGS, "job-%A_%a.out")
jt.errorPath = ":" + os.path.join(LOGS, "job-%A_%a.err")
jt.nativeSpecification = "--cpus-per-task=2 --nodes=1 --mem-per-cpu=50 --partition=htc --tmp=100"

print("Submitting", jt.remoteCommand, "with", jt.args, "and logs to", jt.outputPath)
ids = s.runBulkJobs(jt, beginIndex=1, endIndex=2, step=1)
print("Job submitted with ids", ids)

s.deleteJobTemplate(jt)

The script crashes inside the runBulkJobs call.

gdb backtrace (strlcpy is called with src=0x0 from drmaa_get_next_job_id):

Program received signal SIGSEGV, Segmentation fault.
strlcpy (dest=dest@entry=0x7a9640 "9829091", src=0x0, size=size@entry=1024) at compat.c:50
50              while( *src  &&  --size > 0 )
(gdb) bt
#0  strlcpy (dest=dest@entry=0x7a9640 "9829091", src=0x0, size=size@entry=1024) at compat.c:50
#1  0x00007fffed772fac in drmaa_get_next_job_id (values=0x7ac5c0, value=0x7a9640 "9829091", value_len=1024) at drmaa_base.c:297
#2  0x00007fffeffed550 in ffi_call_unix64 () at /home/ilan/minonda/conda-bld/python_1494526091235/work/Python-3.6.1/Modules/_ctypes/libffi/src/x86/unix64.S:76
#3  0x00007fffeffeccf5 in ffi_call (cif=<optimized out>, fn=0x7fffed772e90 <drmaa_get_next_job_id>, rvalue=<optimized out>, avalue=0x7fffffffc6c0) at /home/ilan/minonda/conda-bld/python_1494526091235/work/Python-3.6.1/Modules/_ctypes/libffi/src/x86/ffi64.c:525
#4  0x00007fffeffe483c in _call_function_pointer (argcount=3, resmem=0x7fffffffc6f0, restype=<optimized out>, atypes=<optimized out>, avalues=0x7fffffffc6c0, pProc=0x7fffed772e90 <drmaa_get_next_job_id>, flags=4353) at /home/ilan/minonda/conda-bld/python_1494526091235/work/Python-3.6.1/Modules/_ctypes/callproc.c:809
#5  _ctypes_callproc (pProc=0x7fffed772e90 <drmaa_get_next_job_id>, argtuple=0x7fffffffc7e0, flags=4353, argtypes=<optimized out>, restype=0x7ffff0212f28, checker=0x0) at /home/ilan/minonda/conda-bld/python_1494526091235/work/Python-3.6.1/Modules/_ctypes/callproc.c:1147
#6  0x00007fffeffdcda3 in PyCFuncPtr_call (self=<optimized out>, inargs=<optimized out>, kwds=0x0) at /home/ilan/minonda/conda-bld/python_1494526091235/work/Python-3.6.1/Modules/_ctypes/_ctypes.c:3870
#7  0x00007ffff793fade in _PyObject_FastCallDict (func=0x7fffeea655c0, args=<optimized out>, nargs=<optimized out>, kwargs=0x0) at Objects/abstract.c:2316
#8  0x00007ffff7a1c2bb in call_function (pp_stack=0x7fffffffcb18, oparg=<optimized out>, kwnames=0x0) at Python/ceval.c:4822
#9  0x00007ffff7a1f15d in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3284
#10 0x00007ffff7969e33 in gen_send_ex (gen=0x7fffefd90200, arg=<optimized out>, exc=<optimized out>, closing=<optimized out>) at Objects/genobject.c:189
#11 0x00007ffff7978f3e in listextend (self=0x7fffeea79d48, b=<optimized out>) at Objects/listobject.c:857
#12 0x00007ffff7979398 in list_init (self=0x7fffeea79d48, args=<optimized out>, kw=<optimized out>) at Objects/listobject.c:2316
#13 0x00007ffff79add4c in type_call (type=<optimized out>, args=0x7ffff7e8d470, kwds=0x0) at Objects/typeobject.c:915
#14 0x00007ffff793fade in _PyObject_FastCallDict (func=0x7ffff7d5bb40 <PyList_Type>, args=<optimized out>, nargs=<optimized out>, kwargs=0x0) at Objects/abstract.c:2316
#15 0x00007ffff7a1c2bb in call_function (pp_stack=0x7fffffffce58, oparg=<optimized out>, kwnames=0x0) at Python/ceval.c:4822
#16 0x00007ffff7a1f15d in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3284
#17 0x00007ffff7a1aa60 in _PyEval_EvalCodeWithName (_co=0x7ffff01fc420, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=1, kwnames=0x7ffff7e9dba0, kwargs=0x7ffff7f8fba8, kwcount=3, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff7ea3c30, qualname=0x7fffefd8d2b8) at Python/ceval.c:4128
#18 0x00007ffff7a1c48a in fast_function (kwnames=<optimized out>, nargs=1, stack=<optimized out>, func=0x7fffeea8c2f0) at Python/ceval.c:4939
#19 call_function (pp_stack=0x7fffffffd0f8, oparg=<optimized out>, kwnames=<optimized out>) at Python/ceval.c:4819
#20 0x00007ffff7a1e8dd in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3300
#21 0x00007ffff7a1aa60 in _PyEval_EvalCodeWithName (_co=0x7ffff7f1b930, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=0, kwnames=0x0, kwargs=0x8, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4128
#22 0x00007ffff7a1aee3 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:4149
#23 0x00007ffff7a1af2b in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at Python/ceval.c:695
#24 0x00007ffff7a4d6c0 in run_mod (arena=0x7ffff7f79180, flags=0x7fffffffd450, locals=0x7ffff7f5cf30, globals=0x7ffff7f5cf30, filename=0x7ffff7ea3830, mod=0x683f58) at Python/pythonrun.c:980
#25 PyRun_FileExFlags (fp=0x64cc30, filename_str=<optimized out>, start=<optimized out>, globals=0x7ffff7f5cf30, locals=0x7ffff7f5cf30, closeit=<optimized out>, flags=0x7fffffffd450) at Python/pythonrun.c:933
#26 0x00007ffff7a4ec83 in PyRun_SimpleFileExFlags (fp=0x64cc30, filename=<optimized out>, closeit=1, flags=0x7fffffffd450) at Python/pythonrun.c:396
#27 0x00007ffff7a6a0b5 in run_file (p_cf=0x7fffffffd450, filename=0x603310 L<error reading variable>, fp=0x64cc30) at Modules/main.c:338
#28 Py_Main (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:810
#29 0x0000000000400c1d in main (argc=2, argv=<optimized out>) at ./Programs/python.c:69

The above code runs fine with a libdrmaa built from https://github.com/ljyanesm/slurm-drmaa
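
When comparing libdrmaa builds like this, a segfault inside the library takes the whole Python interpreter down with it, so it can help to run the reproducer in a child process and inspect the exit status. The following is only a minimal sketch, assuming a POSIX system; `run_isolated` is a hypothetical helper name and is not part of drmaa-python or slurm-drmaa.

```python
import signal
import subprocess
import sys


def run_isolated(interpreter_args, timeout=120):
    """Run the current Python interpreter with the given arguments in a child process.

    Returns (returncode, segfaulted). On POSIX, subprocess reports a child
    that was killed by a signal as returncode == -signum.
    """
    proc = subprocess.run(
        [sys.executable] + list(interpreter_args),
        timeout=timeout,
    )
    segfaulted = proc.returncode == -signal.SIGSEGV
    return proc.returncode, segfaulted
```

Calling `run_isolated(["repro.py"])` once per DRMAA_LIBRARY_PATH value, where `repro.py` is a placeholder name for the script above, would show which build crashes without killing the calling process.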

unode commented 6 years ago

Confirming that the fix works. Thanks!