Closed wlandau closed 1 year ago
I can take a stab at contributing. An initial issue that I've noticed is with the current generic cluster worker termination command:
The LSF scheduler uses the bkill
command to terminate jobs, but by default expects a numeric job ID (assigned by the scheduler) - eg: bkill 012345
. The current crew.cluster
worker termination command splices in the worker-instance-name
and termination commands fail because of the invalid command (eg. bkill worker-instance-name
refers to an invalid job). The LSF scheduler can deal with killing a job by name, but requires an additional argument - eg: bkill -J worker-instance-name
.
Not sure if this affects other submission systems, but it might make sense to generically expose arguments for command_submit
and command_delete
(eg. command_submit_args
and command_delete_args
). This would also allow the user to specify other submit-time arguments (eg. queue).
I can take a stab at contributing.
That would be amazing!
The LSF scheduler can deal with killing a job by name, but requires an additional argument - eg: bkill -J worker-instance-name.
In 223b0dcbba7d69db1fc6b5d15bdf31f5a7255ca6 I added a new args_terminate()
method to the abstract cluster launcher class. The LSF launcher subclass can have its own args_terminate()
method which still accepts a name
argument but returns c("-J", shQuote(name))
instead of just shQuote(name)
.
Fixed in #7. Thanks @mglev1n!
@mglev1n, from your comment in #3: are you working on an LSF plugin just for yourself, or are you planning to contribute to
crew.cluster
? In the former case, I will await your PR. Otherwise, I will create one and share with you for testing.