pepkit / looper

A job submitter for Portable Encapsulated Projects
http://looper.databio.org
BSD 2-Clause "Simplified" License

Can Looper configure different mem and cores for each command of a pipeline? If so, how? #380

Closed: zhangzhen closed this issue 9 months ago

zhangzhen commented 1 year ago

Looper submits a Pypiper pipeline that includes multiple commands. How can it configure different mem and cores for each command?

Cheers, Zhen Zhang

nsheff commented 1 year ago

It cannot. Looper submits jobs, and it lets you configure compute resources (mem/cores) per job.

The advantage of this is that it simplifies pipeline development, because you only specify the computing requirements for the entire pipeline. The disadvantage is that it limits you to a single resource specification for the entire pipeline.
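To make "a single resource specification for the entire pipeline" concrete, here is a minimal sketch. The `Step` class, the `job_request` helper, and the step names and numbers are all hypothetical and for illustration only. They are not part of looper's or pypiper's API. The point is that if one job runs every command, its request has to cover the most demanding command in each dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One command inside a pipeline (hypothetical model, not a looper API)."""
    name: str
    mem_gb: int
    cores: int


def job_request(steps: list[Step]) -> tuple[int, int]:
    """Return the (mem_gb, cores) a single job must request so every step fits.

    With one resource specification per job, the request has to cover the
    most demanding step in each dimension independently.
    """
    return max(s.mem_gb for s in steps), max(s.cores for s in steps)


# Hypothetical pipeline: the numbers are illustrative only.
pipeline = [
    Step("trim", mem_gb=4, cores=2),
    Step("align", mem_gb=32, cores=8),
    Step("call", mem_gb=16, cores=4),
]
print(job_request(pipeline))  # (32, 8)
```

The lighter steps (`trim`, `call`) still run inside a job that holds 32 GB and 8 cores. That is the disadvantage described above.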

If you have a long-running pipeline whose tasks have very different computing resource needs, then looper may not be a good fit. If you need to customize computing resources for individual tasks within a pipeline, you will have to use a more powerful framework that allocates resources task by task.

But if your jobs use close to their maximum resources for most of the pipeline's run time, then looper will still be efficient.

In bioinformatics use cases, I argue that most of the pipeline's run time is spent on a few resource-intensive steps, such as sequence alignment. Since these steps make up the majority of the pipeline, you lose little efficiency by defining the whole pipeline as one single task. This saves you the developer effort of specifying resource use for individual pipeline commands, at the small cost of some computational efficiency lost to over-requested resources. If this seems like a worthwhile tradeoff, then looper is a good fit for your problem.
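The tradeoff can be estimated with simple arithmetic. This is a rough sketch under stated assumptions. It only models CPU cores, and it assumes the single job reserves the pipeline's peak core count for its whole duration. The `core_hour_efficiency` helper and both step profiles are hypothetical and do not come from looper or any real pipeline. Efficiency here means used core-hours divided by reserved core-hours.

```python
def core_hour_efficiency(steps: list[tuple[float, int]]) -> float:
    """Fraction of reserved core-hours actually used when the whole pipeline
    runs as one job that holds its peak core count for the full duration.

    steps: list of (hours, cores_needed) tuples (a hypothetical profile).
    """
    total_hours = sum(h for h, _ in steps)
    peak_cores = max(c for _, c in steps)
    used = sum(h * c for h, c in steps)
    reserved = peak_cores * total_hours
    return used / reserved


# Dominated by one heavy step (e.g., alignment): little is wasted.
alignment_heavy = [(0.5, 2), (6.0, 8), (1.0, 4)]
# A long light step plus a short heavy one: most reserved cores sit idle.
mixed = [(6.0, 1), (1.0, 8)]

print(round(core_hour_efficiency(alignment_heavy), 2))  # 0.88
print(round(core_hour_efficiency(mixed), 2))  # 0.25
```

When one heavy step dominates the run time, a single per-job request wastes little (about 88% of reserved core-hours used in this made-up profile). When a long, light step sits next to a short, heavy one, most of the reservation goes idle. That is the case where per-task resource control pays off.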

donaldcampbelljr commented 9 months ago

Closing this for now. Please re-open if the discussion needs to continue.