natefoo / slurm-drmaa

DRMAA for Slurm: Implementation of the DRMAA C bindings for Slurm
GNU General Public License v3.0

drmaa.errors.Conflicting in job submission #77

Open IBEXCluster opened 1 year ago

IBEXCluster commented 1 year ago

Dear team,

We have the following system environment:

$ cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
$ sbatch --version
slurm 23.02.2
$ which sbatch
/opt/slurm/cluster/ibex/install-v2/RedHat-7/bin/sbatch

We compiled the latest version of slurm-drmaa as follows:

$ git clone --recursive https://github.com/natefoo/slurm-drmaa.git
$ cd slurm-drmaa/
$ autoconf
$ ./autogen.sh --with-slurm-inc=/opt/slurm/cluster/ibex/install-v2/RedHat-7/include --with-slurm-lib=/opt/slurm/cluster/ibex/install-v2/RedHat-7/lib --prefix=/ibex/sw/csi/slurm-drmaa/1.2.0-dev/el7.9_gnu6.4.0/install
$ ./configure --prefix=/ibex/sw/csi/slurm-drmaa/1.2.0-dev/el7.9_gnu6.4.0/install --with-slurm-inc=/opt/slurm/cluster/ibex/install-v2/RedHat-7/include --with-slurm-lib=/opt/slurm/cluster/ibex/install-v2/RedHat-7/lib
$ make
$ make install
$ pip install drmaa

We defined the following environment variables:

export DRMAA_LIBRARY_PATH=/ibex/sw/csi/slurm-drmaa/1.2.0-dev/el7.9_gnu6.4.0/install/lib/libdrmaa.so.1
export PATH=/ibex/sw/csi/slurm-drmaa/1.2.0-dev/el7.9_gnu6.4.0/install/bin:$PATH
export LD_LIBRARY_PATH=/ibex/sw/csi/slurm-drmaa/1.2.0-dev/el7.9_gnu6.4.0/install/lib:$LD_LIBRARY_PATH
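The drmaa Python package reads DRMAA_LIBRARY_PATH when it is imported (falling back to ctypes.util.find_library('drmaa') if the variable is unset), so a quick check that the exported path really names a file can rule out one class of setup mistakes before submitting anything. This is a minimal sketch; the helper drmaa_env_problems is hypothetical and not part of drmaa-python, and it cannot tell which Slurm version a given libdrmaa.so.1 was built against.

```python
import os


def drmaa_env_problems(env):
    """Return a list of problems with DRMAA_LIBRARY_PATH in the mapping `env`.

    Hypothetical helper: it only checks that the variable is set and names an
    existing file; it does not load the library or check its Slurm version.
    """
    lib = env.get("DRMAA_LIBRARY_PATH")
    if not lib:
        return ["DRMAA_LIBRARY_PATH is not set"]
    if not os.path.isfile(lib):
        return ["DRMAA_LIBRARY_PATH does not name an existing file: " + lib]
    return []


if __name__ == "__main__":
    problems = drmaa_env_problems(os.environ)
    for problem in problems:
        print("warning:", problem)
    if not problems:
        # 'import drmaa' loads the library from this path at import time.
        print("DRMAA_LIBRARY_PATH points to an existing file")
```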

When I try to submit a job with the DRMAA Python binding, it fails with the following error:

Creating job template
Traceback (most recent call last):
  File "runme.py", line 23, in <module>
    main()
  File "runme.py", line 16, in main
    jobid = s.runJob(jt)
  File "/ibex/sw/csi/slurm-drmaa/1.2.0-dev/el7.9_gnu6.4.0/miniconda2/lib/python2.7/site-packages/drmaa/session.py", line 314, in runJob
    c(drmaa_run_job, jid, sizeof(jid), jobTemplate)
  File "/ibex/sw/csi/slurm-drmaa/1.2.0-dev/el7.9_gnu6.4.0/miniconda2/lib/python2.7/site-packages/drmaa/helpers.py", line 302, in c
    return f(*(args + (error_buffer, sizeof(error_buffer))))
  File "/ibex/sw/csi/slurm-drmaa/1.2.0-dev/el7.9_gnu6.4.0/miniconda2/lib/python2.7/site-packages/drmaa/errors.py", line 151, in error_check
    raise _ERRORS[code - 1](error_string)
drmaa.errors.ConflictingAttributeValuesException: code 15: drmaa_join_files is set and output file is not given
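The last frames of the traceback show drmaa-python turning the C library's non-zero return code into a Python exception. In the DRMAA 1.0 C binding, code 15 is DRMAA_ERRNO_CONFLICTING_ATTRIBUTE_VALUES, which is why the message surfaces as ConflictingAttributeValuesException. The sketch below only mimics that lookup: the classes are local stand-ins (not imported from drmaa-python), the dictionary covers codes 13 to 15 only, and drmaa-python itself indexes a list with code - 1 rather than using a dict.

```python
# Local stand-ins named after the drmaa-python exceptions; not the real classes.
class DrmaaException(Exception):
    pass


class InvalidAttributeFormatException(DrmaaException):
    pass


class InvalidAttributeValueException(DrmaaException):
    pass


class ConflictingAttributeValuesException(DrmaaException):
    pass


# Subset of DRMAA 1.0 C-binding error numbers.
_ERRORS_BY_CODE = {
    13: InvalidAttributeFormatException,      # DRMAA_ERRNO_INVALID_ATTRIBUTE_FORMAT
    14: InvalidAttributeValueException,       # DRMAA_ERRNO_INVALID_ATTRIBUTE_VALUE
    15: ConflictingAttributeValuesException,  # DRMAA_ERRNO_CONFLICTING_ATTRIBUTE_VALUES
}


def exception_for(code, message):
    """Build (without raising) the exception for a non-zero DRMAA return code."""
    cls = _ERRORS_BY_CODE.get(code, DrmaaException)  # unknown codes -> base class
    return cls("code {0}: {1}".format(code, message))


def error_check(code, message):
    """Do nothing on success (0); otherwise raise the mapped exception."""
    if code != 0:
        raise exception_for(code, message)
```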

Here is the Python code:

$ cat runme.py
#!/usr/bin/env python

import drmaa
import os

def main():
    """ Submit a job.
    Note, need file called sleeper.sh in current directory. """
    with drmaa.Session() as s:
        print('Creating job template')
        jt = s.createJobTemplate()
        jt.remoteCommand = os.path.join(os.getcwd(), 'sleeper.sh')
        jt.args = ['42', 'Simon says:']
        jt.joinFiles=True

        jobid = s.runJob(jt)
        print('Your job has been submitted with ID %s' % jobid)

        print('Cleaning up')
        s.deleteJobTemplate(jt)

if __name__=='__main__':
    main()

Please advise on a solution. Are we missing something? Any suggestion is much appreciated!

Thanks and Regards, Naga

IBEXCluster commented 1 year ago

Dear all, we have found a workaround:

export DRMAA_LIBRARY_PATH=lib.slurm-22.05.6/libdrmaa.so.1
export LD_LIBRARY_PATH=lib.slurm-22.05.6:$LD_LIBRARY_PATH

This workaround allowed us to launch jobs using Slurm-DRMAA!

IBEXCluster commented 1 year ago

Thanks to @wickhagj for fixing the bug in slurm-drmaa (https://github.com/natefoo/slurm-drmaa/commit/1f5db98cd788677e7a94b93cb56bc00266539fe2).

The fix is now included in the master branch of the repository.

natefoo commented 10 months ago

If you define a job output file (jt.outputPath = "/foo"), does this avoid the error?
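The error message names the conflict directly: joinFiles is set but no outputPath is. Following the suggestion above, the sketch below gives the runme.py template an explicit output file before submission. The helpers join_files_conflict and build_attrs and the log file name sleeper.out are hypothetical; the ':' prefix follows the DRMAA 1.0 "[hostname]:path" form that many DRMAA clients use, and whether slurm-drmaa also accepts a bare path such as "/foo" is not confirmed here.

```python
import os


def join_files_conflict(attrs):
    """True if joinFiles is set but no outputPath is given.

    Hypothetical pre-submission check mirroring the slurm-drmaa message
    "drmaa_join_files is set and output file is not given".
    """
    return bool(attrs.get("joinFiles")) and not attrs.get("outputPath")


def build_attrs(workdir):
    """Template attributes for sleeper.sh with stdout and stderr joined.

    'sleeper.out' is a hypothetical file name; the leading ':' means
    "no hostname" in the DRMAA "[hostname]:path" convention.
    """
    return {
        "remoteCommand": os.path.join(workdir, "sleeper.sh"),
        "args": ["42", "Simon says:"],
        "joinFiles": True,
        "outputPath": ":" + os.path.join(workdir, "sleeper.out"),
    }


if __name__ == "__main__":
    attrs = build_attrs(os.getcwd())
    print(join_files_conflict(attrs))
    # On a host with a working Slurm/DRMAA setup, the attributes could be applied as:
    #   import drmaa
    #   with drmaa.Session() as s:
    #       jt = s.createJobTemplate()
    #       for name, value in attrs.items():
    #           setattr(jt, name, value)
    #       print(s.runJob(jt))
    #       s.deleteJobTemplate(jt)
```

Running the script prints False, because the template now names an output file.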