pepkit / looper

A job submitter for Portable Encapsulated Projects
http://looper.databio.org
BSD 2-Clause "Simplified" License

How to use looper to run samples concurrently on a local machine #339

Closed: ShannonDaddy closed this issue 1 year ago

ShannonDaddy commented 2 years ago

Hi, when I used 'looper run' to run multiple samples on a local machine, I found that they ran sequentially, whereas I expected them to run in parallel. Is there a way to run samples in parallel on a local machine? Thanks a lot!

nsheff commented 1 year ago

Interesting. I thought we had an easier way to do this, but I can't find it, so maybe I'm dreaming, or we may have removed that feature at some point...

Anyway, there is still a way, but it's not great... Looper runs everything using divvy. If you add a compute package to divvy that specifies a template like this:

{CODE} &

Then it will run each job in the background. So you can just use looper run -p local_parallel (assuming you name this divvy package local_parallel), and it will work.
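To illustrate why that template backgrounds each job, here is a minimal Python sketch of the substitution step, assuming divvy fills {CODE} with each sample's command. The helper render_submission_script and the pipeline.py commands are hypothetical; this is not divvy's actual implementation.

```python
def render_submission_script(template: str, variables: dict) -> str:
    """Fill {NAME} placeholders in a divvy-style template (hypothetical helper)."""
    script = template
    for name, value in variables.items():
        # Plain string replacement, so other braces in a template are left alone.
        script = script.replace("{" + name + "}", value)
    return script


# The local_parallel template: the trailing "&" sends the job to the background.
parallel_template = "{CODE} &"

# Hypothetical per-sample commands that looper would place into {CODE}.
commands = [
    "pipeline.py --sample sample1",
    "pipeline.py --sample sample2",
]

for cmd in commands:
    print(render_submission_script(parallel_template, {"CODE": cmd}))
```

Each rendered script ends in "&", so the shell that runs it returns immediately and looper moves straight on to the next sample instead of waiting.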

But keep in mind that this will start every job looper submits in the background at once, so you'll have to control it at the looper level with looper run --limit X if you don't want to overwhelm your CPU. So... I don't really recommend doing this, but I was able to get it to work, and in a pinch it would do what you want pretty easily, as long as you make sure you don't submit too many jobs at once.
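A rough Python analogy for why --limit is the only throttle here: backgrounded jobs are launched without waiting, so everything submitted runs at the same time. The function submit_in_background and the sleeping stand-in jobs are hypothetical, and the limit argument only mimics the effect of looper run --limit X (it caps how many jobs are submitted, not how many run at once).

```python
import subprocess
import sys


def submit_in_background(commands, limit=None):
    """Launch each command without waiting, as a "{CODE} &" template would.

    `limit` mimics `looper run --limit X`: it caps how many jobs are
    submitted at all, not how many run concurrently.
    """
    selected = commands if limit is None else commands[:limit]
    # Popen returns immediately, so all selected jobs run concurrently.
    return [subprocess.Popen(cmd) for cmd in selected]


# Hypothetical stand-in jobs: each one just sleeps briefly.
job = [sys.executable, "-c", "import time; time.sleep(0.2)"]
jobs = [job] * 10

running = submit_in_background(jobs, limit=3)
print(f"{len(running)} jobs launched at once")
for proc in running:
    proc.wait()  # the shell equivalent would be `wait`
```

Without the limit, all ten jobs would start together, which is exactly the CPU overload the comment warns about.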

I guess the general idea is: if you need to run in parallel, you should probably be using some kind of cluster resource manager and submitting jobs with a relevant divvy package. Looper isn't really designed as a resource manager, just as a job submitter.
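For contrast, the key thing a resource manager adds is a queue: at most K jobs run at once, and the next one starts only when a slot frees up. The sketch below shows that behavior with Python's standard concurrent.futures; it is a hypothetical illustration of the concept, not a looper or divvy feature, and run_with_max_concurrency is an invented name.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_with_max_concurrency(commands, max_jobs):
    """Run all commands, but never more than `max_jobs` at the same time.

    This queueing is what a resource manager provides and what a plain
    "{CODE} &" template lacks.
    """
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        # Each worker blocks on one subprocess; remaining commands wait for a free slot.
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


# Hypothetical stand-in jobs: each one just sleeps briefly.
job = [sys.executable, "-c", "import time; time.sleep(0.1)"]
return_codes = run_with_max_concurrency([job] * 8, max_jobs=2)
print(return_codes)
```

On a cluster, the scheduler behind the divvy package does this queueing for you, which is why the comment recommends that route over backgrounding jobs locally.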

ShannonDaddy commented 1 year ago

Thanks a lot!