mllg / batchtools

Tools for computation on batch systems
https://mllg.github.io/batchtools/
GNU Lesser General Public License v3.0

any plans to support dependencies between jobs? #204

Open tdhock opened 6 years ago

tdhock commented 6 years ago

Hi, I'm interested in using batchtools, but after looking at the documentation I'm not sure whether it supports dependencies between jobs, which is a key feature I would need. For SLURM, this feature is documented at https://slurm.schedmd.com/job_array.html

e.g.

# Wait for entire job array to complete successfully
sbatch --depend=afterok:123 my.job

If batchtools does support dependencies, where are the docs?

If not, how hard would it be to implement?

tdhock commented 6 years ago

hello @berndbischl @arfon @timflutre @mllg

berndbischl commented 6 years ago

well @mllg really should answer here..... but my 2cents:

a) no, this is not supported. maybe you can hack something in, but it isn't supported in a cool and general way
b) it was one of the first big general issues i opened for batchjobs quite some time ago. this is something that would really take bt to the next level IMHO

but stuff like that is usually not that simple to implement

berndbischl commented 6 years ago

if some of us are here, can we maybe at least, before we jump to a solution, specify what we want? what would a cool system for this look like?

mllg commented 6 years ago

As @berndbischl said, it is not yet supported. A simple version would not be too hard to implement. It all depends on the interface you need. What would be relatively easy to write is the following:

  1. You define jobs as usual with batchMap().
  2. Get the table of all jobs you want to submit, e.g. ids = findNotSubmitted().
  3. Add an integer column depends.on. This is either NA (no deps) or a valid job id. Send to submitJobs().
  4. submitJobs() needs to first submit all jobs where depends.on is NA, i.e. is.na(depends.on). Wait until all these jobs have been submitted to Slurm, as you need the Slurm job id as returned by sbatch in the database.
  5. Adjust the resources to add "depend=afterok:xx" and submit all jobs whose dependencies are already submitted. Repeat until all jobs submitted.
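
The steps above amount to submitting jobs in waves: first everything without a dependency, then everything whose parent has already been submitted, and so on. A minimal base-R sketch of that planning step follows. It assumes a plain data.frame with the columns job.id and depends.on described in the list, and plan_waves() is a hypothetical helper written for illustration, not part of batchtools.

```r
# Sketch only: plan_waves() is a hypothetical helper, not a batchtools function.
# jobs: data.frame with integer columns job.id and depends.on (NA = no dependency).
plan_waves <- function(jobs) {
  wave <- rep(NA_integer_, nrow(jobs))
  # Wave 1: jobs without a dependency (use is.na(), since `== NA` is never TRUE).
  wave[is.na(jobs$depends.on)] <- 1L
  repeat {
    # Wave of each job's parent; NA while the parent has not been planned yet.
    parent.wave <- wave[match(jobs$depends.on, jobs$job.id)]
    ready <- is.na(wave) & !is.na(parent.wave)
    if (!any(ready)) break
    wave[ready] <- parent.wave[ready] + 1L
  }
  if (anyNA(wave)) {
    stop("cyclic or unknown dependency for job(s): ",
         paste(jobs$job.id[is.na(wave)], collapse = ", "))
  }
  wave
}

jobs <- data.frame(job.id = 1:5, depends.on = c(NA, 1L, 1L, 3L, NA))
jobs$wave <- plan_waves(jobs)
print(jobs)
```

In this example jobs 1 and 5 land in wave 1, jobs 2 and 3 in wave 2, and job 4 in wave 3. In a real registry each wave would presumably be passed to submitJobs() in turn, with the batch id of the parent job filled into the resources before the next wave is submitted.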

berndbischl commented 6 years ago

would it make sense - at some point, as this would be more complicated i guess - to look at a combo with drake?

tdhock commented 5 years ago

hi @mllg thanks for the idea to use depend=afterok:xx in resources. in fact I could probably do this in the current version of batchtools, as long as I use one registry per step, right?

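library(batchtools)
library(data.table)

# Step1 and Step2 are the user's own functions; one registry per step.
reg1 = makeRegistry("~/registry/1")
reg2 = makeRegistry("~/registry/2")
batchMap(fun = Step1, 1:10, reg = reg1)
batchMap(fun = Step2, "FOO", reg = reg2)

# Step 1: put all jobs into one chunk and submit it as a single array job.
jobs <- getJobTable(reg = reg1)
chunks <- data.table(jobs, chunk = 1)
submitJobs(chunks, resources = list(
  walltime = 3600, memory = 1024, ncpus = 1, ntasks = 1,
  chunks.as.arrayjobs = TRUE),
  reg = reg1)

# Read the batch ids from reg1 explicitly (the default registry is now reg2).
jobs.done <- getJobTable(reg = reg1)
# Strip the "_<task>" suffix of the array task to keep the array job id.
job.id <- sub("_.*", "", jobs.done$batch.id)[[1]]

# Step 2: submit with the dependency; the template must handle `afterok`.
submitJobs(resources = list(
  walltime = 3600, memory = 1024, ncpus = 1, ntasks = 1,
  afterok = job.id
), reg = reg2)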

I added the following line to slurm-simple.tmpl:

<%= if (!is.null(resources$afterok)) paste0("#SBATCH --depend=afterok:", resources$afterok) %>

Do you think that is an OK approach?

For me it seems a bit cumbersome to have to create one registry per step...
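
To see what the template line would emit for a given resources list, the same logic can be tried in plain R outside of the template. This is only a sketch: afterok_directive() is a hypothetical helper, the batch ids are made up, and joining several ids with ":" relies on Slurm's afterok:job_id[:job_id...] syntax rather than on anything batchtools provides.

```r
# Sketch only: afterok_directive() is a hypothetical helper for illustration.
# It builds the same header line as the template snippet and joins several
# Slurm job ids with ":" (Slurm accepts afterok:id1:id2:...).
afterok_directive <- function(resources) {
  if (is.null(resources$afterok)) return(character(0))
  paste0("#SBATCH --depend=afterok:", paste(resources$afterok, collapse = ":"))
}

# Made-up batch ids in the "<array id>_<task>" form assumed by the snippet above.
batch.ids <- c("123_1", "123_2", "456_1")
slurm.ids <- unique(sub("_.*", "", batch.ids))

afterok_directive(list(walltime = 3600, afterok = slurm.ids))
# "#SBATCH --depend=afterok:123:456"
afterok_directive(list(walltime = 3600))
# character(0), i.e. no dependency line is written
```

Passing a vector of ids this way would let a later step wait on several earlier submissions, provided the template line is changed to collapse the ids in the same manner.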

mschubert commented 5 years ago

Wouldn't it make more sense to explicitly not try this in batchtools and use a workflow tool for job dependencies, like drake?

mllg commented 5 years ago

Wouldn't it make more sense to explicitly not try this in batchtools and use a workflow tool for job dependencies, like drake?

For more complex scenarios and to ensure portability between batch systems: yes. But as outlined above, it is not that difficult to implement. You just need a topo-sort, e.g. from https://github.com/mlr-org/mlr3misc/blob/master/R/topo_sort.R.
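
For readers who have not seen a topo-sort, here is a minimal base-R sketch of the idea. It is a simple layer-by-layer variant written for illustration, not the mlr3misc implementation linked above. topo_order() and the job names are hypothetical, and each job may list several parents.

```r
# Sketch only: topo_order() is a hypothetical base-R helper, not mlr3misc code.
# deps: named list; names are job ids, each element holds the ids it depends on.
topo_order <- function(deps) {
  ids <- names(deps)
  stopifnot(all(unlist(deps) %in% ids))
  remaining <- deps
  ordered <- character(0)
  while (length(remaining) > 0) {
    # Jobs whose dependencies have all been emitted already.
    is.free <- vapply(remaining, function(p) all(p %in% ordered), logical(1))
    free <- names(remaining)[is.free]
    if (length(free) == 0) stop("dependency cycle detected")
    ordered <- c(ordered, free)
    remaining <- remaining[setdiff(names(remaining), free)]
  }
  ordered
}

deps <- list(a = character(0), b = "a", c = c("a", "b"), d = "c", e = character(0))
topo_order(deps)
# "a" "e" "b" "c" "d"
```

Submitting jobs in this order guarantees that every job's parents already have a batch id by the time the job itself is submitted, which is what the afterok resource needs.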