pepkit / looper

A job submitter for Portable Encapsulated Projects
http://looper.databio.org
BSD 2-Clause "Simplified" License

How to parallel-process files with looper locally: lump them together to group several commands into one script #498

Open donaldcampbelljr opened 4 weeks ago

donaldcampbelljr commented 4 weeks ago

How do we parallel-process files with looper locally? This was originally a divvy idea. The issue: with 100 files, divvy submits to a cluster with no problem, but locally the jobs run serially. We could run them as background processes by appending an ampersand (&) to each command in the shell script. So can we lump 100 samples into 10 background processes, with a new divvy template to accomplish that?
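One reading of the "100 samples in 10 background processes" idea, as a rough bash sketch: split the samples into lumps, run each lump serially inside its own background process, and wait for all of them. Everything here is hypothetical and for illustration only: `process_sample` is a placeholder for the real pipeline command, and `run_lump`, `run_all`, and `N_PROCS` are made-up names, not part of looper or divvy.

```shell
#!/bin/bash
# Hypothetical sketch, not an existing looper/divvy template.
N_PROCS=10   # assumed number of background processes

# Placeholder for the real pipeline command (an assumption).
process_sample() {
    echo "processing $1"
}

# Run every step-th sample, starting at offset, one after another.
run_lump() {
    local offset=$1 step=$2
    shift 2
    local i=0 sample
    for sample in "$@"; do
        if [ $((i % step)) -eq "$offset" ]; then
            process_sample "$sample"
        fi
        i=$((i + 1))
    done
}

# Start one background process per lump, then wait for all of them.
run_all() {
    local n_procs=$1
    shift
    local p
    for ((p = 0; p < n_procs; p++)); do
        run_lump "$p" "$n_procs" "$@" &
    done
    wait
}

# Build 100 illustrative sample names.
samples=()
for i in $(seq 1 100); do
    samples+=("sample$i")
done

output=$(run_all "$N_PROCS" "${samples[@]}")
```

With these numbers each background process handles 10 samples serially, so at most 10 commands run at once.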

donaldcampbelljr commented 2 days ago

Currently, when lumping samples locally (--lump-n), we get a couple of submission scripts:

#!/bin/bash

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

{
python3 /tmp/tmplr4qcfsa/advanced/pipeline/pipeline1.py --sample-name sample1 --req-attr val 
python3 /tmp/tmplr4qcfsa/advanced/pipeline/pipeline1.py --sample-name sample2 --req-attr val 
} | tee /tmp/tmplr4qcfsa/advanced/results/submission/PIPELINE1_lump1.log

#!/bin/bash

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

{
python3 /tmp/tmplr4qcfsa/advanced/pipeline/pipeline1.py --sample-name sample3 --req-attr val 
} | tee /tmp/tmplr4qcfsa/advanced/results/submission/PIPELINE1_lump3.log

Perhaps we can change the template to append & to each command so that they run in parallel?


#!/bin/bash

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

{
python3 /tmp/tmplr4qcfsa/advanced/pipeline/pipeline1.py --sample-name sample1 --req-attr val &
python3 /tmp/tmplr4qcfsa/advanced/pipeline/pipeline1.py --sample-name sample2 --req-attr val &
python3 /tmp/tmplr4qcfsa/advanced/pipeline/pipeline1.py --sample-name sample3 --req-attr val & 
} | tee /tmp/tmplr4qcfsa/advanced/results/submission/PIPELINE1_lump1.log
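
One thing to consider with the & approach: the exit statuses of the backgrounded commands are discarded unless the script waits on them explicitly, and output from parallel commands may interleave in the shared log. A possible variation, sketched below, records each PID and calls `wait` on it to count failures. `run_pipeline` is a hypothetical stand-in for the `python3 .../pipeline1.py` command, and sample2 is made to fail purely for illustration.

```shell
#!/bin/bash
# Hypothetical sketch of a template body that backgrounds each command
# and then checks every exit status.

# Stand-in for the real pipeline command (an assumption).
run_pipeline() {
    echo "running $1"
    [ "$1" != "sample2" ]   # sample2 fails on purpose, for illustration
}

# Launch every command in the background and remember its PID.
pids=()
for sample in sample1 sample2 sample3; do
    run_pipeline "$sample" &
    pids+=("$!")
done

# `wait PID` returns that job's exit status, so failures can be counted.
failed=0
for pid in "${pids[@]}"; do
    if ! wait "$pid"; then
        failed=$((failed + 1))
    fi
done

echo "failed jobs: $failed"
```

A real template could finish with a non-zero exit code when `failed` is greater than zero, so that a failed sample in the lump is not silently ignored.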