mllg / batchtools

Tools for computation on batch systems
https://mllg.github.io/batchtools/
GNU Lesser General Public License v3.0

[Pre-pull request] Adding jobs to an existing registry #209

Open cfhammill opened 5 years ago

cfhammill commented 5 years ago

Frequently I run into the situation where I run a set of jobs in parallel with batchMap, only to realize later that I forgot to include an interesting case in the input list. Historically I've either made a new registry (ugh) or deleted and re-run everything (more ugh). Today I really didn't want to do either, so I figured out how to add jobs to an existing registry.

Is this something you'd consider adding if I put together a PR? I suspect the answer is probably "you should be using the experiment abstraction", but I suspect enough people run into this problem that it would be beneficial to add. I've included rough code at the bottom for doing it manually. If it's going to be part of the package, I'd write something like batchUpdateMap, which assembles the new param list for the user.

Example code for doing it manually is below, in case anyone needs it in the meantime:

# loadRegistry() takes the registry directory (file.dir) as its first argument
reg <- loadRegistry(reg, writeable = TRUE)

previous_max_id <- max(reg$status$job.id)
new_id <- previous_max_id + 1L # integer, so the id columns keep their type
new_params <- list(some = pars) # get skeleton from reg$defs$job.pars[[1]]

# Add row to job definitions
reg$defs <-
  rbind(reg$defs
      , data.table(def.id = new_id
                 , job.pars = list(list(new_params))))

setkey(reg$defs, "def.id") # reset data.table key (rbind drops it)

# Add row to status table
reg$status <-
  rbind(reg$status
      , data.table(job.id = new_id, def.id = new_id, submitted = NA_real_,
                   started = NA_real_, done = NA_real_, error = NA_character_,
                   mem.used = NA_real_, resource.id = NA_integer_, batch.id = NA_character_,
                   log.file = NA_character_, job.hash = NA_character_, job.name = NA_character_,
                   key = "job.id"))

setkey(reg$status, "job.id") # reset data.table key (rbind drops it)

saveRegistry(reg) # save our updates

Obviously this can be generalized for adding more than one job.
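As a rough sketch of that generalization, the helper below appends one row per new parameter set to copies of the two tables. `append_jobs` is a hypothetical name, not a batchtools function, and it assumes the `defs`/`status` layout used in the manual example above (internal tables that may differ between batchtools versions). Each element of `new_pars` becomes one cell of `job.pars`, so the nesting should be checked against `reg$defs$job.pars[[1]]` before using it on a real registry.

```r
library(data.table)

# Hypothetical helper (not part of batchtools): returns extended copies of
# the `defs` and `status` tables with one new row per element of `new_pars`.
append_jobs <- function(defs, status, new_pars) {
  n <- length(new_pars)
  # Continue numbering after the largest existing job id, staying integer
  first_id <- if (nrow(status) > 0L) max(status$job.id) + 1L else 1L
  new_ids <- first_id + seq_len(n) - 1L

  # One definition per new parameter set; job.pars is a list column
  new_defs <- data.table(def.id = new_ids, job.pars = new_pars)
  # fill = TRUE leaves every other status column as NA of its existing type
  new_status <- data.table(job.id = new_ids, def.id = new_ids)

  out_defs <- rbind(defs, new_defs, fill = TRUE)
  out_status <- rbind(status, new_status, fill = TRUE)

  # rbind drops the keys, so set them again
  setkeyv(out_defs, "def.id")
  setkeyv(out_status, "job.id")

  list(defs = out_defs, status = out_status)
}
```

With a registry loaded as writeable, the result would presumably be assigned back to `reg$defs` and `reg$status` and then stored with `saveRegistry(reg)`, as in the manual version.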

tdhock commented 5 years ago

that's funny I was doing pretty much the same hack yesterday

+1 for adding jobs to an existing registry

mllg commented 5 years ago

I can include something like this. What do you want the interface to look like? Re-running batchMap() or something like addJobs(params = list())?

cfhammill commented 5 years ago

I'd be interested in something along the lines of re-running batchMap, but with a different name e.g. batchMapAddition.

Originally I was thinking that the function should be required to use the same function as the original batchMap, but maybe that constraint isn't particularly useful.

cfhammill commented 5 years ago

Also, as mentioned in the title, I'm happy to write it, but if you'd like more control over the implementation and want to write it yourself, just let me know.

tdhock commented 5 years ago

my use case for adding jobs to an existing registry involves dependencies #204 between jobs that each have different functions, so it would be useful if each job could have its own function

mllg commented 5 years ago

> my use case for adding jobs to an existing registry involves dependencies #204 between jobs that each have different functions, so it would be useful if each job could have its own function

Lifting all restrictions is probably better than only allowing more jobs to be added for the same function. However, this requires extensive refactoring and is not easy to implement in a backward-compatible fashion. I can give it a shot, but I'm currently quite busy with other projects, so this will probably not get done before January. 😞

If one of you wants to start a PR, here are the most important steps to consider: