hoelzer-lab / ribap

A comprehensive bacterial core gene-set annotation pipeline based on Roary and pairwise ILPs
GNU General Public License v3.0

Cluster small processes such as mafft, nw_display, ... #2

Closed: hoelzer closed this issue 4 years ago

hoelzer commented 4 years ago

The pipeline runs many small processes (mafft, fasttree, nw_display) that do not each need their own job submission when the pipeline is executed on an HPC or in the cloud.

Solution: group these processes into chunks of, e.g., 20 or 50 files and submit each chunk as one job. For example, instead of submitting 1,000 individual mafft jobs, submit 20 jobs that each run mafft on 50 files.

Fewer, larger submissions should reduce the scheduling and queueing latency that many short jobs cause on HPC systems.
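A minimal Python sketch of the chunking idea, assuming each small task reads one FASTA file. The helper names `chunk` and `batch_script`, the `group_XXXX.fasta` file names, and the `.aln` output naming are hypothetical illustrations, not RiBAP code. Each generated script runs mafft sequentially on every file in its chunk and would be submitted as a single job.

```python
import shlex
from pathlib import Path
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items; the last chunk may be smaller."""
    if size < 1:
        raise ValueError("chunk size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def batch_script(files: Sequence[str]) -> str:
    """Build one shell script that aligns every file of a chunk with mafft, one after another.

    Hypothetical helper: writing <name>.aln next to each input is an assumed convention.
    """
    lines = ["#!/usr/bin/env bash", "set -euo pipefail"]
    for f in files:
        out = str(Path(f).with_suffix(".aln"))
        # Quote paths so unusual file names cannot break the shell command.
        lines.append(f"mafft --auto {shlex.quote(f)} > {shlex.quote(out)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 1,000 hypothetical per-cluster FASTA files, grouped into jobs of 50 files each.
    fasta_files = [f"group_{i:04d}.fasta" for i in range(1000)]
    chunks = list(chunk(fasta_files, 50))
    print(len(chunks), "jobs of", len(chunks[0]), "files")  # 20 jobs of 50 files
```

In a Nextflow pipeline, a comparable grouping could be done on the input channel with the `collate` or `buffer` operators before the alignment process, so that each process invocation handles a list of files instead of one.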