nils-braun / b2luigi

Task scheduling and batch running for basf2 jobs made simple
GNU General Public License v3.0

Feature Request/Enhancement: Ability to limit the number of tasks that can run concurrently in the 'local' batch_system (or per batch system) #170

Open MarcelHoh opened 2 years ago

MarcelHoh commented 2 years ago

Hi,

I often find myself running hundreds to thousands of event generation and ntuple production tasks on LSF at KEKCC, followed by a few brief tasks which must be run locally rather than on the batch system due to memory issues. For these tasks I specify batch_system='local'.

In order for the tasks that are processed on the batch system to be submitted, I set workers=1000. However, once it is time for the local jobs to run, b2luigi tries to start many tasks simultaneously and runs into many Resource Unavailable errors. I would therefore like to add a feature to specify a separate number of workers for the 'local' batch_system. If you agree this would be useful, I can start to work on this.

FelixMetzner commented 2 years ago

Hi Marcel, you can use the resources feature already provided by luigi for this purpose.

Cheers,

Felix

MarcelHoh commented 2 years ago

Hi Felix, thank you very much! I was not aware of this. Excuse my ignorance here, but do you know the correct way to specify the resource limit for just the 'local' batch system? As far as I can tell, the batch_system-specific settings are all handled by the b2luigi settings manager, which at least does not explicitly check this configuration file. Cheers, Marcel

FelixMetzner commented 2 years ago

I think a luigi.cfg will still be considered if the file is located in the directory from which you start the local process. On KEKcc, the environment from which the job is submitted should be sent along, and so should this config file. I am not 100% sure, though, and it will of course depend on your specific setup.

Independent of this, the tasks you were referring to are running locally, so the config file should be used correctly. Also keep in mind that a resource is only considered if the task defines it, and that you can change this at runtime based on luigi parameters or other information such as the host name.
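As a rough illustration of choosing resources at runtime, here is a minimal sketch of a helper that a task's `resources` property could delegate to. It is kept free of luigi imports so it runs on its own. The function name `choose_resources`, the `login` host-name prefix and the resource name `local_tasks` are assumptions made for this example, not part of luigi or b2luigi.

```python
import socket

# Hypothetical host-name prefixes of shared machines where local tasks
# should be throttled more strongly (assumption for this sketch).
SHARED_HOST_PREFIXES = ("login",)


def choose_resources(batch_system, hostname=None):
    """Return the resources dict a task would claim.

    Intended to be called from a task's ``resources`` property, e.g. with
    the task's batch_system setting as the first argument.
    """
    if hostname is None:
        hostname = socket.gethostname()

    if batch_system != "local":
        # Tasks sent to a batch system claim no resource here,
        # so no resource limit applies to them.
        return {}

    if hostname.startswith(SHARED_HOST_PREFIXES):
        # On a shared host, claim two units per task so that only half
        # as many tasks fit within the configured limit.
        return {"local_tasks": 2}

    return {"local_tasks": 1}
```

Returning an empty dict for non-local tasks reflects the point above: a task that does not define a resource is not limited by it.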

I would say, you just have to give it a try and see what works for you.

Cheers,

Felix

meliache commented 2 years ago

Thanks @FelixMetzner for explaining how to achieve this with luigi. I have also never used resources before, but googling a bit turns up some examples of how to use them, e.g. in the Luigi Patterns documentation. Taking inspiration from them, I think using a property function for the resources could make for a dynamic solution which changes the maximum number of jobs automatically based on the batch system of the task:

import b2luigi


class A(b2luigi.Task):
    ...
    @property
    def resources(self):
        # If the batch-system is local, use up one local_task resource,
        # otherwise use up one batch_task resource.
        if b2luigi.get_setting("batch_system", task=self) == "local":
            return {"local_tasks": 1}
        return {"batch_tasks": 1}

Then all tasks inheriting from A will also have this dynamic property. In the luigi.cfg, then just specify the available resources for local_tasks and batch_tasks as described in the documentation that Felix linked to.
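For the configuration side, luigi reads the available amount of each resource from a `[resources]` section in luigi.cfg. The sketch below holds such a section as a string and parses it with the standard library, only to show the expected shape. The limits 4 and 1000 and the helper name `read_resource_limits` are illustrative assumptions, not recommended values or luigi API.

```python
import configparser

# Example luigi.cfg content. The numbers are assumptions for illustration:
# at most 4 local tasks and 1000 batch tasks running at the same time.
LUIGI_CFG = """\
[resources]
local_tasks = 4
batch_tasks = 1000
"""


def read_resource_limits(cfg_text):
    """Parse the [resources] section into a dict of integer limits."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return {name: int(value) for name, value in parser["resources"].items()}


limits = read_resource_limits(LUIGI_CFG)
print(limits)
```

With limits like these, the scheduler should hold back local tasks once four are running, even if workers=1000 is set for the batch submission.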

I'm not sure whether the above `resources` property works; as I said, I have never tested it. If it does, it would be nice to document this somewhere, as it's a useful feature.

I'm myself guilty of once accidentally starting 800 local tasks, because I wanted to process on htcondor and increased the workers after a test run but forgot to change the batch system. I admit it would be convenient to have a setting somewhere which sets this for all batch systems by default without having to give resource properties to each task, possibly with a sensible default maximum number of local tasks. But I don't want to add much code and complexity for that when there is something users can do themselves.

MarcelHoh commented 2 years ago

Thank you both. I have now tested the resources feature and it works nicely for what I need. Perhaps this could be added to the documentation on setting the 'local' batch system. I also agree that it would be nice to have this as a global setting across all tasks, but it is simple to add the resources property to a task.