1) We agree here. We have talked about this a couple of times. You want to be able to a) export R objects (data and variables) to the nodes, and b) read some library code from R source scripts, right?
2) We have this:
a) Exporting: Please consider that you can always use the more.args argument of batchMap. Yes, I know in some cases (many global constant values one needs in jobs) this can be tedious.
Michel worked on a better exporting option, but I guess this is not finished yet? Currently I only see this: http://tudo-r.github.io/BatchJobs/man/loadExports.html
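For the more.args route, a minimal sketch (the registry id and mapped function are illustrative):

```r
library(BatchJobs)

reg <- makeRegistry(id = "moreargs_demo")

f <- function(x, const) x * const

# 'const' is passed unchanged to every job via more.args, so it
# does not have to be exported to the nodes by other means.
batchMap(reg, f, 1:10, more.args = list(const = 5))
submitJobs(reg)
```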
b) Sourcing: Did you see the src.dirs option in makeRegistry? http://tudo-r.github.io/BatchJobs/man/makeRegistry.html
> src.dirs [character]
> Directories relative to your work.dir containing R scripts to be sourced on registry load (both on slave and master). Files not matching the pattern “\.[Rr]$” are ignored. Useful if you have many helper functions that are needed during the execution of your jobs. These files should only contain function definitions and no executable code. Default is character(0).
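A minimal sketch of that route (the lib directory name is an assumption):

```r
library(BatchJobs)

# All *.R files in ./lib (relative to work.dir) are sourced when the
# registry is loaded, both on the master and on the slaves.
reg <- makeRegistry(id = "srcdirs_demo", src.dirs = "lib")
```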
Also see discussion here:
I'm mainly interested in (b) because the way our cluster is set up the nodes will always have access to the master file system.
Thanks for pointing me to the src.dirs option, I missed that so far.
However, I'm not quite happy with it sourcing directories, because I could never have my master script (executable) and my node script (that contains 3 functions) in one directory. For now, I'll add a function setupAndRun() in my node script that sources everything I need.
> I'm mainly interested in (b) because the way our cluster is set up the nodes will always have access to the master file system.
Well, if you do not have a shared filesystem between master and nodes, you would be out of luck w.r.t. BatchJobs currently. This is a very big and ugly TODO on our list.
> Thanks for pointing me to the src.dirs option, I missed that so far. However, I'm not quite happy with it sourcing directories, because I could never have my master script (executable) and my node script (that contains 3 functions) in one directory. For now, I'll add a function setupAndRun() in my node script that sources everything I need.
Argh, yeah, this is kind of a bad design on my part. I will probably change this to allow individual R source files. I wanted to make this easier for users (like myself) who put such helper sources into dedicated "lib" subfolders, which contain only function definitions. But I guess we all work differently.
So, if you could pass a vector of relative paths (to the work.dir) that contain .R files and these get sourced on the job you would be happy?
> So, if you could pass a vector of relative paths (to the work.dir) that contain .R files and these get sourced on the job you would be happy?
If I could pass a vector of relative file paths (also to the batchMapQuick function) that would be great!
> If I could pass a vector of relative file paths (also to the batchMapQuick function) that would be great!
Ok. I will create a new issue for this. But note that if you have such a complicated setup my gut feeling is that you should not use batchMapQuick anymore, but makeRegistry / batchMap instead.
See #17.
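As a sketch, the interface discussed above could look like this (the src.files argument and file names shown here are assumptions; see #17 for the actual change):

```r
library(BatchJobs)

# Hypothetical: individual R files, relative to work.dir, sourced on
# registry load on both master and slaves.
reg <- makeRegistry(id = "srcfiles_demo",
                    src.files = c("node_script.R", "helpers/setup.R"))
```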
It is possible to define packages to be loaded on the nodes, but not to source() scripts. This becomes a problem when the function I call resides in another file and makes use of helper functions defined there. Consider the following example (working interactively, not working on LSF):
caller.r
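A minimal sketch of what such a caller script could look like (the registry id and job arguments are assumptions):

```r
library(BatchJobs)
source("callee.r")  # defines primary.func on the master

reg <- makeRegistry(id = "caller_demo")
batchMap(reg, primary.func, 1:3)
submitJobs(reg)
# Works when run interactively; on LSF the jobs fail because the
# helper function used by primary.func is not defined on the node.
```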
callee.r
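And a matching sketch of callee.r, where primary.func relies on a helper defined in the same file (helper.func is an assumed name):

```r
# helper.func lives only in this file; it is not shipped to the node
# along with primary.func when the job is submitted.
helper.func <- function(x) x + 1

primary.func <- function(x) {
  # The workaround described below would re-source this file here:
  # source("callee.r")
  helper.func(x) * 2
}
```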
The workaround I'm currently using is to source("callee.r") in primary.func of callee.r. This is not only ugly but also dangerous because of a possible infinite recursion. I think the nicest way to handle this would be to add an option to source files on the node (analogous to loading packages).