Closed mglev1n closed 1 year ago
Thank you so much for offering to write an LSF launcher! I just implemented a SLURM launcher, and I condensed the common elements of SGE and SLURM to make it easier to write new cluster launchers. If you have any questions, please let me know.
I added a `script_directory` argument to provide the location of script paths. `tools::R_user_dir(package = "crew.cluster", which = "cache")` seems like a good place to use in your case.
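As a rough sketch of that suggestion: the helper below (`ensure_script_directory()` is a made-up name, not part of `crew.cluster`) creates the cache directory if needed and returns its path. The commented-out controller call assumes `crew_controller_sge()` accepts the `script_directory` argument described above. Check the documentation of your installed version, and note that the cache directory only helps if your home directory is shared across nodes.

```r
# Hypothetical helper (not part of crew.cluster): make sure the directory
# for job submission scripts exists, then return its path.
ensure_script_directory <- function(
  path = tools::R_user_dir(package = "crew.cluster", which = "cache")
) {
  # recursive = TRUE creates missing parent directories;
  # showWarnings = FALSE keeps repeated calls quiet when the directory exists.
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  stopifnot(dir.exists(path))
  path
}

# Assumed usage, based on the argument named in this thread:
# controller <- crew.cluster::crew_controller_sge(
#   script_directory = ensure_script_directory()
# )
```

Any other path on a shared filesystem can be passed as `path` instead of the default.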
I'm working to create a minimal example LSF plugin for `crew.cluster`, building off of the current SGE implementation. One thing I've noticed in the current implementation is that the job submission scripts are written to a temporary directory: https://github.com/wlandau/crew.cluster/blob/626647fd0a4d26017a7f94533cf0abd370ff3c8b/R/utils_names.R#L11

I'd like to propose allowing the user to specify this directory, rather than making it temporary by default. At least in my current HPC environment, temporary directories are node-specific and are not necessarily accessible across machines/jobs.

Following the example in https://github.com/wlandau/crew.cluster/blob/main/tests/sge/minimal.R, after
`controller$push(...)`, it seems like the following chain of events should occur: `launch_worker` submits a job, referencing the job submission script/parameters from above: https://github.com/wlandau/crew.cluster/blob/626647fd0a4d26017a7f94533cf0abd370ff3c8b/R/crew_launcher_sge.R#L319

In my environment, this workflow creates a situation where the job submission script could be written to a temporary directory on
`Machine_A`, but the `launch_worker` command is executed on `Machine_B`, where the submission script is not visible. This effectively means that no workers are able to start. This could be alleviated if the user were allowed to specify the directory, which could then point to a shared directory visible across all machines/nodes. The default could remain a temporary directory, maintaining the current functionality.

Alternatively, allowing the user to specify arguments to the submission command here: https://github.com/wlandau/crew.cluster/blob/626647fd0a4d26017a7f94533cf0abd370ff3c8b/R/crew_launcher_sge.R#LL338C4-L338C4 may work.
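To make the proposal concrete, here is a minimal sketch of the behavior I have in mind. `write_job_script()` and its arguments are hypothetical and are not how `crew.cluster` is actually implemented. The point is only that the directory defaults to `tempdir()` but can be overridden.

```r
# Hypothetical helper, not crew.cluster internals: write a job submission
# script into a caller-chosen directory, defaulting to the session tempdir.
write_job_script <- function(lines, name, script_directory = tempdir()) {
  if (!dir.exists(script_directory)) {
    dir.create(script_directory, recursive = TRUE)
  }
  path <- file.path(script_directory, paste0(name, ".sh"))
  writeLines(lines, path)
  path
}

# Default: the script lands in a (possibly node-local) temporary directory.
local_script <- write_job_script("#!/bin/sh", name = "worker-1")

# Override: point at a directory every node can see. A subdirectory of
# tempdir() stands in for a real shared path in this sketch.
shared_dir <- file.path(tempdir(), "shared-scripts")
shared_script <- write_job_script(
  "#!/bin/sh",
  name = "worker-1",
  script_directory = shared_dir
)
```

With the default left as `tempdir()`, existing workflows would behave as they do now, and only users on clusters with node-local temporary storage would need to set the argument.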