wlandau / crew.cluster

crew launcher plugins for traditional high-performance computing clusters
https://wlandau.github.io/crew.cluster
Other
27 stars 9 forks source link

Support LSF #4

Closed wlandau closed 1 year ago

wlandau commented 1 year ago

@mglev1n, from your comment in #3: are you working on an LSF plugin just for yourself, or are you planning to contribute to crew.cluster? In the former case, I will await your PR. Otherwise, I will create one and share with you for testing.

mglev1n commented 1 year ago

I can take a stab at contributing. An initial issue that I've noticed is with the current generic cluster worker termination command:

https://github.com/wlandau/crew.cluster/blob/5bf0dbfbae983e8257dee2df7d8466053294ea76/R/crew_launcher_cluster.R#L243-L249

The LSF scheduler uses the bkill command to terminate jobs, but by default expects a numeric job ID (assigned by the scheduler) - eg: bkill 012345. The current crew.cluster worker termination command splices in the worker-instance-name and termination commands fail because of the invalid command (eg. bkill worker-instance-name refers to an invalid job). The LSF scheduler can deal with killing a job by name, but requires an additional argument - eg: bkill -J worker-instance-name.

Not sure if this affects other submission systems, but it might make sense to generically expose arguments for command_submit and command_delete (eg. command_submit_args and command_delete_args). This would also allow the user to specify other submit-time arguments (eg. queue).

wlandau commented 1 year ago

I can take a stab at contributing.

That would be amazing!

The LSF scheduler can deal with killing a job by name, but requires an additional argument - eg: bkill -J worker-instance-name.

In 223b0dcbba7d69db1fc6b5d15bdf31f5a7255ca6 I added a new args_terminate() method to the abstract cluster launcher class. The LSF launcher subclass can have its own args_terminate() method which still accepts a name argument but returns c("-J", shQuote(name)) instead of just shQuote(name).

wlandau commented 1 year ago

Fixed in #7. Thanks @mglev1n!