uabrc / uabrc.github.io

UAB Research Computing Documentation
https://docs.rc.uab.edu

Workflow managers and META #462

Open wwarriner opened 1 year ago

wwarriner commented 1 year ago

What would you like to see added?

We have an article on workflow managers, but it is a stub. Let's flesh it out!

Technologies to consider:

Here is a big curated list: https://github.com/pditommaso/awesome-pipeline

Topics to discuss:

Some text to get things started:

Workflow managers like Nextflow leverage the existing scheduler (correct me if I'm wrong). Naturally they have a server that manages the workflow. Their job is to facilitate creation and management of arbitrary task DAGs, and then to execute those using the scheduler. Have a DAG that isn't shaped like a collapsing tree (many tasks feeding into a single task), or one with a path longer than 2 nodes? You need a workflow manager!

META is an alternative to built-in array jobs. The DAG can't be more complex than a collapsing tree. Its use case is many similar tasks, where there are so many that submitting them individually would crash the scheduler or exceed per-user job limits. Its interface is (probably?) simpler than most workflow managers. I haven't used either, so I can't speak to that 100%.

Info on META is given below.

Discovered this today. It is a "meta" job scheduler for use with Slurm. https://docs.alliancecan.ca/wiki/META:_A_package_for_job_farming

The use case is many similar, serial jobs, like we saw recently when we broke the scheduler's max jobs limit. It is meant to replace array jobs when there are too many tasks. From what I can tell, it works by using an MPI "server" to host a serial job queue that doles jobs out to workers on other nodes. The server and workers are in their own long-running jobs. That way many short tasks can fit under the umbrella of fewer, longer Slurm jobs. Example: if you have 2000 tasks, each taking about 10 minutes, you could ask META to run 20 jobs, each running 100 of the tasks. That is about 17 hours of work per job, so a request of ~20 hours leaves some headroom.
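The arithmetic in that example can be checked with a few lines of shell. This is only a sizing sketch: the function names `tasks_per_job` and `minutes_per_job` are made up for illustration and are not part of META, and it assumes the tasks within a meta-job run one after another.

```shell
#!/usr/bin/env bash
# Rough sizing for a META-style farm: how many tasks each meta-job handles
# and how long each meta-job runs. Names are illustrative, not part of META.

# Ceiling division, so leftover tasks are not dropped.
tasks_per_job() {
    local n_tasks=$1 n_jobs=$2
    echo $(( (n_tasks + n_jobs - 1) / n_jobs ))
}

# Minutes of work per meta-job, assuming its tasks run serially.
minutes_per_job() {
    local n_tasks=$1 n_jobs=$2 minutes_per_task=$3
    echo $(( $(tasks_per_job "$n_tasks" "$n_jobs") * minutes_per_task ))
}

per_job=$(tasks_per_job 2000 20)
minutes=$(minutes_per_job 2000 20 10)
echo "tasks per meta-job: $per_job"
echo "work per meta-job: $minutes minutes (about $(( (minutes + 59) / 60 )) hours)"
```

For 2000 tasks at 10 minutes each across 20 meta-jobs, this gives 100 tasks and 1000 minutes (roughly 17 hours) per meta-job, so a wall time request of around 20 hours includes a margin for slow tasks.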

wwarriner commented 1 year ago

How to install and run an example with META in a personal workspace:

One-time setup:

  1. Navigate to a directory to hold the software. I just cloned it into home (so cd ~). The rest of these instructions assume you do the same. If not, your file paths will be different.
  2. git clone https://git.computecanada.ca/syam/meta-farm.git which will copy the code from the internet to ~/meta-farm

Each time you want to use their code:

  1. Add their special ~/meta-farm/bin/lockfile executable to $PATH using PATH=$PATH:~/meta-farm/bin
  2. Copy the example cp -r ~/meta-farm/example ~/metatest
  3. Modify the copied job_script.sh file so that it uses your account.
    • nano ~/metatest/job_script.sh to open the file in the nano editor (or use your favorite editor)
    • Either remove the line #SBATCH -A Your_account_name or change Your_account_name to $USER. Both will work the same on our system.
  4. Run ~/meta-farm/bin/submit.run 2 to try what they call META mode with two meta-jobs (i.e. two jobs doing independent processing).
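The job script edit can also be done without opening an editor. Here is a minimal sketch that takes the "remove the line" option. It assumes the account line in the copied script starts with exactly `#SBATCH -A`, as quoted above, and `strip_account_line` is a hypothetical helper name, not something META provides.

```shell
#!/usr/bin/env bash
# Remove the placeholder account line from a copied META job script.
# strip_account_line is an illustrative helper, not part of META.

strip_account_line() {
    # Default to the copy made in the steps above.
    local script=${1:-"$HOME/metatest/job_script.sh"}
    # Delete every line starting with "#SBATCH -A"; the original is
    # kept next to it with a .bak suffix in case something goes wrong.
    sed -i.bak '/^#SBATCH -A/d' "$script"
}
```

After copying the example, calling `strip_account_line ~/metatest/job_script.sh` should leave every other `#SBATCH` line untouched. It is worth comparing the result against the `.bak` copy the first time.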

Step (1) of the per-use list (adding to $PATH) can be run automatically from your .bashrc file if you end up using this a lot. Or you can put it in a "startup script". Step (3) (editing the account line) can be done once and for all by modifying the template ~/meta-farm/example/job_script.sh file before copying.
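For the `$PATH` change, something like the following could go in `~/.bashrc`. It is a sketch, assuming the repository was cloned to `~/meta-farm` as above. `path_append` is a hypothetical helper name. The check avoids adding the same directory again each time the file is sourced.

```shell
# Append a directory to PATH only if it is not already there, so that
# sourcing ~/.bashrc repeatedly does not keep growing PATH.
# path_append is an illustrative helper, not part of META.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, do nothing
        *) PATH="$PATH:$1" ;;     # otherwise add it at the end
    esac
}

path_append "$HOME/meta-farm/bin"
export PATH
```

Appending rather than prepending means the META tools do not shadow any system executable of the same name. If the META `lockfile` needs to take priority on a given system, the order would have to be reversed.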