mllg / batchtools

Tools for computation on batch systems
https://mllg.github.io/batchtools/
GNU Lesser General Public License v3.0

Is it possible to use job arrays and chunks at the same time? #210

Open tdhock opened 5 years ago

tdhock commented 5 years ago

If I understand correctly, the default in batchtools is to not use job arrays. Why is that? It seems strange to me, since on SLURM it is much faster to call sbatch once with a job array (of, say, 100 elements) than to call sbatch 100 times.

So say I have 10 jobs. I would like to put the first five jobs in one task and the last five jobs in a second task, by running sbatch ONCE as a job array (with --array=1-2) that starts two tasks, each running its five jobs sequentially. Is that possible?

It seems to me, after reading the docs of submitJobs, that the only way to use job arrays is by passing a data.frame with a chunk column, and specifying chunks.as.arrayjobs=TRUE in resources. Is that right? Or is there another way to tell batchtools to use job arrays?
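If that reading is right, the documented route can be sketched roughly like this (a sketch, assuming a registry whose cluster functions already point at Slurm; not taken verbatim from the batchtools docs):

```r
# Sketch: one chunk holding all jobs, submitted as a single job array.
# With chunks.as.arrayjobs=TRUE each chunk becomes one sbatch call whose
# --array has one element per job in the chunk.
library(batchtools)
ids <- findJobs()    # data.table with a job.id column
ids$chunk <- 1L      # put all 10 jobs into a single chunk
submitJobs(ids, resources = list(chunks.as.arrayjobs = TRUE))
# -> one sbatch call, but with one array task per job,
#    not one array task per group of jobs
```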

When I do submitJobs(data.table(job=1:10, chunk=c(1,2))), it runs sbatch TWICE (the chunk column is recycled, so the jobs alternate between two chunks of five), each time with --array=1-5, generating 10 tasks total, which is not what I want.

When I do submitJobs(data.table(job=1:10, chunk=1:5)), it runs sbatch FIVE times, each time with --array=1-2, generating 10 tasks total, which is also not what I want.

mllg commented 5 years ago

> If I understand correctly, the default in batchtools is to not use job arrays. Why is that? It seems strange to me, since on SLURM it is much faster to call sbatch once with a job array (of, say, 100 elements) than to call sbatch 100 times.

Not all schedulers / scheduler installations support job arrays.

> So say I have 10 jobs. I would like to put the first five jobs in one task and the last five jobs in a second task, by running sbatch ONCE as a job array (with --array=1-2) that starts two tasks, each running its five jobs sequentially. Is that possible?

This is currently not possible. You can either manually chunk or use job arrays. I get your point that this is not efficient. This was probably a bad design decision which dates back to BatchJobs.

If I find some spare time, I will look into supporting chunks and arrays simultaneously. However, as I have limited access to systems with job arrays, this is unfortunately not likely to happen soon without outside support.

tdhock commented 5 years ago

Too bad that I have to chunk myself, but I guess that is possible.

It would be great if you could support that in the future.
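For what it's worth, the manual-chunking fallback (no arrays) might look roughly like this; chunk() is the batchtools helper, and the rest is a sketch rather than a tested recipe:

```r
# Sketch: plain chunking without job arrays. Each chunk turns into one
# sbatch call, and the jobs inside a chunk run sequentially in one task.
library(batchtools)
ids <- findNotSubmitted()                     # data.table with job.id
ids$chunk <- chunk(ids$job.id, n.chunks = 2)  # 10 jobs -> 2 chunks of 5
submitJobs(ids)                               # 2 sbatch calls, no --array
```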

For testing and development I am using Ubuntu, where I installed SLURM with apt-get install slurm-llnl. To run all jobs (including job arrays) on my laptop or a Travis VM (no multi-machine cluster), I use the config file https://github.com/tdhock/PeakSegPipeline/blob/batchtools/slurm.conf (which needs to be copied to /etc/slurm-llnl/slurm.conf)
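For reference, a single-node slurm.conf is typically along these lines (a rough sketch with placeholder hostnames and CPU counts, not the contents of the linked file; see the slurm.conf man page for the full key list):

```
# Minimal single-node Slurm setup (sketch; values are placeholders)
ClusterName=local
ControlMachine=localhost
NodeName=localhost CPUs=2 State=UNKNOWN
PartitionName=debug Nodes=localhost Default=YES MaxTime=INFINITE State=UP
```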

dankessler commented 4 years ago

I'm also interested in being able to use chunking (which is super helpful when each of my experiments runs quickly) together with job arrays (which allow much faster interaction with the scheduler I use, Slurm).

Before I take a stab at implementing this, I wanted to propose a design from the user's perspective.

It seems to me like a natural way to specify this is to provide another resource, say array.

Suppose we submit something like this (I've deliberately constructed the example to illustrate how the design would handle odd user input).

| jobid | chunk | array |
|-------+-------+-------|
|     1 |     1 |     1 |
|     2 |     1 |     1 |
|     3 |     1 |     1 |
|     4 |     2 |     1 |
|     5 |     2 |     1 |
|     6 |     3 |     2 |
|     7 |     3 |     2 |
|     8 |     3 |     2 |
|     9 |     1 |     3 |
|    10 |     2 |     3 |

and we do not specify chunks.as.arrayjobs=TRUE (in which case it defaults to FALSE).

batchtools would then construct three array jobs (i.e., it would interact with the scheduler three times).

The idea is that it first divides the jobs into arrays, and then into chunks. This way, chunks do not have to be unique across arrays (e.g., jobs 1 and 9 are both in chunk 1 but in arrays 1 and 3 respectively, so you can think of them as 1:1 and 3:1).

Job Array I This would be a length-2 job array which would process jobs 1-3 in its first array element and jobs 4-5 in its second.

Job Array II This would be a length-1 (trivial) job array which would process jobs 6-8 in its single element.

Job Array III This would be a length-2 job array which would process job 9 in its first element and job 10 in its second.
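The grouping above can be computed with plain data.table operations; this sketch only derives the submission plan (the jobid/chunk/array column names follow the proposal and are hypothetical):

```r
library(data.table)
ids <- data.table(
  jobid = 1:10,
  chunk = c(1, 1, 1, 2, 2, 3, 3, 3, 1, 2),
  array = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3)
)
# One scheduler interaction per distinct array; the array length is the
# number of distinct chunks inside it (arrays 1 and 3 have length 2,
# array 2 has length 1).
plan <- ids[, .(array.length = uniqueN(chunk)), by = array]
# Jobs sharing (array, chunk) run sequentially in one array element.
elements <- ids[, .(jobs = list(jobid)), by = .(array, chunk)]
```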

Does this seem like a sensible design? If users did not specify chunks at all, then I suppose it would treat each job as if it were an atomic chunk, i.e., if we submitted

| jobid | array |
|-------+-------|
|     1 |     1 |
|     2 |     1 |
|     3 |     2 |
|     4 |     2 |
|     5 |     3 |

this would still construct three array jobs.

Job Array I This would be a length-2 job array, where the elements of the array are jobs 1 and 2.

Job Array II A length-2 job array, where the elements of the array are jobs 3 and 4.

Job Array III A length-1 (trivial) job array, with element job 5.

Problems The most obvious wrinkle to me is what happens if (a) a user specifies chunks.as.arrayjobs=TRUE and also uses the proposed approach, or (b) a user specifies chunks.as.arrayjobs=TRUE at a per-job level, such that it varies even within a chunk (I'm not sure how batchtools presently handles this).

dankessler commented 4 years ago

Ah, it looks like batchtools yells at the user if they use per-job resources AND chunking (really, if the resources vary within a chunk), which I think rules out the problems I identified.

https://github.com/mllg/batchtools/blob/b54f00b0f7a413272da44f4217a6340aedc7cf8c/R/submitJobs.R#L259