natefoo / slurm-drmaa

DRMAA for Slurm: Implementation of the DRMAA C bindings for Slurm
GNU General Public License v3.0

Submitting multi-threaded jobs using slurm-drmaa #2

Closed BrunoGrandePhD closed 6 years ago

BrunoGrandePhD commented 7 years ago

I realize it's strange to ask a question like this on a repository, but I've spent the past hour trying to figure it out on my own to no avail. I thought that you might be able to answer it in 30 seconds. I would greatly appreciate any help!

In essence, how do you submit multi-threaded jobs using slurm-drmaa? To be clear, I want the job to run on one node (i.e. --ntasks=1). I use the --cpus-per-task option with srun or sbatch, but this option isn't available in the native specification for slurm-drmaa.

I've tried different combinations of --mincpus, --nodes, --ntasks-per-node and --ntasks, but they either allow jobs to be split across multiple nodes or they fail. I've looked through the code for galaxyproject/galaxy and galaxyproject/pulsar, but I couldn't find any hints.

natefoo commented 7 years ago

I use --nodes=1 --ntasks=N, does this work for you?

BrunoGrandePhD commented 7 years ago

Unfortunately, those parameters don't prevent jobs from being split across multiple nodes, at least on our SLURM cluster. This is despite explicitly specifying --nodes=1. I'm not sure what to make of this. Do you have any ideas?

JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
32070    all run_test  bgrande  RUNNING       0:47 50-00:00:00      6 n[106,317-321]
32071    all run_test  bgrande  RUNNING       0:47 50-00:00:00      8 n[311-315,330-332]
32072    all run_test  bgrande  RUNNING       0:17 50-00:00:00     12 n[109,123,141,143,145,209,211,223,227,243-244,324]
32073    all run_test  bgrande  PENDING       0:00 50-00:00:00      1 (Priority)
32074    all run_test  bgrande  PENDING       0:00 50-00:00:00      1 (Priority)
32075    all run_test  bgrande  PENDING       0:00 50-00:00:00      1 (Priority)
32076    all run_test  bgrande  PENDING       0:00 50-00:00:00      1 (Priority)
32077    all run_test  bgrande  PENDING       0:00 50-00:00:00      1 (Priority)
32078    all run_test  bgrande  PENDING       0:00 50-00:00:00      1 (Priority)
32079    all run_test  bgrande  PENDING       0:00 50-00:00:00      1 (Priority)

kevins-repo commented 7 years ago

Hi, I ran into the same issue and then tried this option, which slurm-drmaa does support: --mincpus=<n> (minimum number of logical processors (threads) per node), as noted here: http://apps.man.poznan.pl/trac/slurm-drmaa/wiki/WikiStart#native_specification

With the native specification -N 1 -n 1 --mincpus=4 --mem=16000, the job is allocated as follows:

NumNodes=1 NumCPUs=4 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:* TRES=cpu=4,mem=16000M,node=1
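
One way to check an allocation like this from a script is to split the key=value fields and look at NumNodes and NumCPUs. This is only a rough sketch: `parse_fields` is a hypothetical helper name, and the sample string is a shortened copy of the line above rather than live output from the cluster.

```python
def parse_fields(line):
    """Split whitespace-separated key=value tokens into a dict.

    Only the first '=' in each token separates key from value, so a
    value that itself contains '=' (such as the TRES field) stays intact.
    """
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that carry no '=' at all
            fields[key] = value
    return fields


# Shortened sample modeled on the allocation line quoted in this comment.
sample = "NumNodes=1 NumCPUs=4 NumTasks=1 CPUs/Task=1"
fields = parse_fields(sample)

# The job stayed on one node if NumNodes is exactly "1".
on_one_node = fields["NumNodes"] == "1"
print(fields["NumCPUs"], on_one_node)
```

Note that CPUs/Task is still 1 here: --mincpus reserves the CPUs on the node, but the job is not recorded as a single 4-CPU task.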

Are there any updates on, or plans for, adding support for --cpus-per-task? Thanks.

natefoo commented 7 years ago

I believe --cpus-per-task was added in 8acc159de4c5a73c5ebcac78078fb91e6c510a03, have you tested it? You can grab a development "release" tarball that includes it on the releases tab.
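
For anyone submitting from Python, a minimal sketch of what this could look like with the drmaa-python package, assuming a slurm-drmaa build that includes the commit above. `multithreaded_spec` and `submit` are hypothetical helper names, not part of slurm-drmaa, and I have not run this against a real cluster.

```python
def multithreaded_spec(cpus, mem_mb=None):
    """Build a native specification for one task that uses `cpus` CPUs."""
    if cpus < 1:
        raise ValueError("cpus must be >= 1")
    parts = ["--ntasks=1", f"--cpus-per-task={cpus}"]
    if mem_mb is not None:
        parts.append(f"--mem={mem_mb}")
    return " ".join(parts)


def submit(command, args, native_spec):
    """Submit one job through DRMAA and return its job ID.

    Assumes drmaa-python is installed and that it can locate the
    libdrmaa library built from slurm-drmaa at runtime.
    """
    import drmaa  # imported lazily so the helper above works without it

    with drmaa.Session() as session:
        jt = session.createJobTemplate()
        jt.remoteCommand = command
        jt.args = list(args)
        jt.nativeSpecification = native_spec
        try:
            return session.runJob(jt)
        finally:
            session.deleteJobTemplate(jt)


print(multithreaded_spec(4, mem_mb=16000))
```

A call such as `submit("/path/to/script.sh", [], multithreaded_spec(4))` would then request a single task with four CPUs; the script path is a placeholder.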

natefoo commented 6 years ago

@brunogrande I realized I never followed up on your question. I believe it works for me because I have MaxNodes=1 set on my partition.

natefoo commented 6 years ago

It should be possible to use --nodes=1-1 for this, but slurm-drmaa doesn't currently support it. I've created issue #4 to implement it.

natefoo commented 6 years ago

--nodes=1-1 works now (it was actually already implemented, but as --nodes=<minnodes[=maxnodes]>). I created a new release with the corrected delimiter.
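
As a rough illustration of the corrected min-max form, here is a small sketch that formats the option for a native specification. `nodes_option` is a hypothetical helper, and it assumes a slurm-drmaa release that includes the delimiter fix.

```python
def nodes_option(min_nodes, max_nodes=None):
    """Format --nodes as <minnodes> or <minnodes>-<maxnodes>."""
    if min_nodes < 1:
        raise ValueError("min_nodes must be >= 1")
    if max_nodes is None:
        return f"--nodes={min_nodes}"
    if max_nodes < min_nodes:
        raise ValueError("max_nodes must be >= min_nodes")
    return f"--nodes={min_nodes}-{max_nodes}"


# Pin the job to exactly one node, with N tasks on it (N = 4 here).
native_spec = f"{nodes_option(1, 1)} --ntasks=4"
print(native_spec)
```

Setting both the minimum and the maximum to 1 should keep the job on a single node without relying on MaxNodes=1 being set on the partition.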