wlandau / crew.cluster

crew launcher plugins for traditional high-performance computing clusters
https://wlandau.github.io/crew.cluster

Specify temporary directory for job submission scripts #3

mglev1n closed this issue 1 year ago

mglev1n commented 1 year ago

I'm working to create a minimal example LSF plugin for crew.cluster, building off of the current SGE implementation. One thing I've noticed in the current implementation is that the job submission scripts are written to a temporary directory: https://github.com/wlandau/crew.cluster/blob/626647fd0a4d26017a7f94533cf0abd370ff3c8b/R/utils_names.R#L11

I'd like to propose allowing the user to specify this directory, rather than always writing to a temporary one. At least in my current HPC environment, temporary directories are node-specific and are not necessarily accessible across machines/jobs. Following the example in https://github.com/wlandau/crew.cluster/blob/main/tests/sge/minimal.R, after controller$push(...), it seems like the following chain of events should occur:

  1. A job submission script, which launches a worker when run, is written to a temporary directory
  2. launch_worker submits a job, referencing the job submission script/parameters from above: https://github.com/wlandau/crew.cluster/blob/626647fd0a4d26017a7f94533cf0abd370ff3c8b/R/crew_launcher_sge.R#L319
  3. Once the worker job is running, it will accept commands

In my environment, this workflow creates a situation where the job submission script could be written to a temporary directory on Machine_A, but the launch_worker command is executed on Machine_B where the submission script is not visible. This effectively means that no workers are able to start. This could be alleviated if the user is allowed to specify the temporary directory, which could point toward a shared directory visible across all machines/nodes. The default could remain saving to a temporary directory, maintaining the current functionality.
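To make the proposal concrete, here is a minimal sketch in plain R. The helper `script_path()` and its arguments are hypothetical and are not part of crew.cluster. It only illustrates the idea of defaulting to `tempdir()` while letting the caller point at another directory, such as one on a shared filesystem.

```r
# Hypothetical helper, not part of crew.cluster: build the path of a job
# submission script, defaulting to the session's temporary directory.
script_path <- function(name, script_directory = tempdir()) {
  # Create the directory if the caller supplied one that does not exist yet.
  if (!dir.exists(script_directory)) {
    dir.create(script_directory, recursive = TRUE)
  }
  file.path(script_directory, paste0(name, ".sh"))
}

# Stand-in for a directory visible to every node. On a real cluster this
# would be a path on a shared filesystem rather than a subdirectory of tempdir().
shared <- file.path(tempdir(), "shared_scripts")

path <- script_path("worker-1", script_directory = shared)
writeLines(c("#!/bin/sh", "echo launching worker"), path)
```

Calling `script_path("worker-1")` without a second argument keeps the current behavior of writing under `tempdir()`.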

Alternatively, allowing the user to specify arguments to the submission command here: https://github.com/wlandau/crew.cluster/blob/626647fd0a4d26017a7f94533cf0abd370ff3c8b/R/crew_launcher_sge.R#LL338C4-L338C4 may work.

wlandau commented 1 year ago

Thank you so much for offering to write an LSF launcher! I just implemented a SLURM launcher, and I condensed the common elements of SGE and SLURM to make it easier to write new cluster launchers. If you have any questions, please let me know.

I added a script_directory argument to set the directory where job scripts are written. tools::R_user_dir(package = "crew.cluster", which = "cache") seems like a good place to use in your case.
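A short sketch of that suggestion follows. It assumes R >= 4.0.0 (where `tools::R_user_dir()` is available) and that the user cache directory sits on a filesystem shared across nodes, which is common for home directories on clusters but not guaranteed. The controller call is left as a comment because the exact constructor and its other arguments are assumptions here.

```r
# Per-user cache directory for the package. The exact location depends on
# the platform and on environment variables such as R_USER_CACHE_DIR.
script_directory <- tools::R_user_dir(package = "crew.cluster", which = "cache")

# The directory is not created automatically, so create it before use.
dir.create(script_directory, recursive = TRUE, showWarnings = FALSE)

# Assumed usage, not run here: pass the directory to the launcher/controller.
# controller <- crew.cluster::crew_controller_sge(
#   script_directory = script_directory
# )
```

If the cache directory turns out to be node-local on a given cluster, any other path visible to all nodes can be substituted.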