nils-braun / b2luigi

Task scheduling and batch running for basf2 jobs made simple
GNU General Public License v3.0
17 stars 11 forks source link

Adding an option to set the lsf job_name parameter #76

Closed MarcelHoh closed 3 years ago

MarcelHoh commented 3 years ago

I find it can be useful to get a quick overview of the running jobs through their job_name parameters on kekcc. With b2luigi submitted jobs this is set to "{FULL_PATH}/executable_wrapper.sh" which is not easy to parse at a glance.

This PR adds the option to set a 'job_name' attribute for your tasks which is then submitted as the lsf job name (-J task.job_name).

meliache commented 3 years ago

Hi, thanks for the PR, I'm always happy if I see that people contribute. I'll give this a proper reviewer once I find some free time. I basically inherited this project from Nils, but haven't used the LSF batch much and actually never used it directly via the CLI. First of all thanks for including documentation right away.

You expect the job name to be provided as a parameter of the task class/object. I'm thinking (but yet undecided) whether it wouldn't be better to use the b2luigi settings interface that is used for all other settings, i.e. something similar to

job_name = b2luigi.get_setting("job_name", task=self.task,  default=None)
if job_name is not None:
    command += ...

This would still allow setting the job name via a task parameter, but would also allow using b2luigi.set_setting() and providing the setting via a settings.json file.

My question: Does the LSF job name have to be unique per task? E.g. is it possible to submit multiple tasks with the same job name? If that is not possible, then providing a global job name via set_setting() or settings.json might be unwanted and there is a valid reason to only allow setting it inside tasks. However, if it needs to be unique, then the code in your documentation example is not good, since it would use the same job name for all instances (with different luigi parameters) of a task class. So I'm not sure which is the case.

MarcelHoh commented 3 years ago

Hi, thanks for the quick response! For lsf the job_name does not need to be unique. There is a separate unique id assigned when the job is submitted.

I have no strong opinion as to how to best implement the parameter, for now I just followed the approach taken by the queue parameter. Currently if I want to adjust the job_name for a particular instance of the class I overwrite the attribute in the __init__ function but I think this should still be possible if it is switched to using the settings interface.

meliache commented 3 years ago

I just followed the approach taken by the queue parameter

Ah I just saw that, I now checked out your branch on my notebook to take a closer look. Then its consistent and I'm fine with it, though I would be interested in hearing from @nils-braun if there's a specific reason the queue is not implemented as a setting.

nils-braun commented 3 years ago

No reason, other than I probably introduced the queue parameter before I started with the settings (but maybe I am wrong here) :-)

Would be nice to also move the queue parameter - I would vote to have both be accessible via settings.

meliache commented 3 years ago

Fixed the docstring formatting and created #79 for using the job_name setting for htcondor too, now I think this is good to merge :)

MarcelHoh commented 3 years ago

sorry I was a little busy over the last few days. Thank you very much for fixing it up and merging it!