SmartSlurm is an automated computational tool designed to estimate and optimize resources for Slurm jobs. There are two major parts:
ssbatch: An sbatch wrapper with a custom function that estimates memory and run-time based on several factors (e.g., program type, input size, previous job records). Once the memory and time values are estimated, jobs are submitted to the scheduler while keeping a record of the job history and sending an optional email notification.
runAsPipeline: Parses a bash script to find user-defined commands and calls ssbatch to submit jobs to Slurm. It takes care of job dependencies.
Smart Sbatch (ssbatch) was originally designed to run the ENCODE ATAC-seq pipeline, with the intention of automatically modifying the job's partition based on the cluster's configuration and available partitions. This removed the need for users to modify the original workflow. Later, ssbatch was improved to include more features.
Figure 1 - Illustrates that memory usage is roughly correlated with input size. Therefore, input size can be used as a proxy to allocate memory when submitting new jobs.
Figure 2 - ssbatch runs the first five jobs using the default memory. Then, based on these initial jobs, it estimates memory for future jobs. As a result, the amount of wasted memory is dramatically decreased for subsequent jobs.
Figure 3 - ssbatch runs the first five jobs using the default time. Subsequently, the allocation of resources, specifically time, is dramatically improved for the following jobs.
1) Auto adjust memory and run-time according to statistics from earlier jobs
2) Auto choose partition according to run-time request
3) Auto re-run failed OOM (out of memory) and OOT (out of run-time) jobs
4) (Optional) Generate a checkpoint before the job runs out of time or memory, and use the checkpoint to re-run jobs
5) More informative emails: Slurm has a limited email notification mechanism, which only includes a subject line. In contrast, ssbatch attaches the content of the sbatch script, as well as the output and error logs, to the email.
# Download
cd $HOME
git clone https://github.com/ld32/SmartSlurm.git
# Setup path
export PATH=$HOME/SmartSlurm/bin:$PATH
# Create 5 files with numbers for testing
createNumberFiles.sh
# Run 3 jobs to get memory and run-time statistics for script findNumber.sh
# findNumber is just a random name. You can use anything you like.
ssbatch -P findNumber -I numbers3.txt -F find3 --mem 4G -t 2:0:0 \
--wrap="findNumber.sh 12345 numbers3.txt"
ssbatch -P findNumber -I numbers4.txt -F find4 --mem 4G -t 2:0:0 \
--wrap="findNumber.sh 12345 numbers4.txt"
ssbatch -P findNumber -I numbers5.txt -F find5 --mem 4G -t 2:0:0 \
--wrap="findNumber.sh 12345 numbers5.txt"
# After the 3 jobs finish, when submitting more jobs, ssbatch auto adjusts
# memory and run-time according to input file size
# Notice: this command submits the job to the short partition, and reserves 21M of memory
# and 13 minutes of run-time
ssbatch -P findNumber -I numbers1.txt -F find1 --mem 4G -t 2:0:0 \
--wrap="findNumber.sh 12345 numbers1.txt"
# You can have multiple inputs:
ssbatch -P findNumber -I "numbers1.txt numbers2.txt" -F find12 --mem 4G -t 2:0:0 \
--wrap="findNumber.sh 12345 numbers1.txt numbers2.txt"
# If the input file is not given using option -I, ssbatch will choose memory
# and run-time thresholds so that 90% of jobs can finish successfully
ssbatch -P findNumber -F find21 --mem 4G -t 2:0:0 \
--wrap="findNumber.sh 12345 numbers2.txt"
# check job status:
checkRun
# cancel all jobs submitted from the current directory
cancelAllJobs
# rerun jobs:
# when re-running a job with the same program and same input(s), if the previous run was successful,
# ssbatch will ask you to confirm that you really want to re-run
ssbatch -P findNumber -I numbers1.txt -F find1 --mem 4G -t 2:0:0 \
--wrap="findNumber.sh 12345 numbers1.txt"
# To remove ssbatch from PATH:
source `which unExport`; unExport ssbatch
1) Auto adjust memory and run-time according to statistics from earlier jobs
$smartSlurmJobRecordDir/jobRecord.txt contains job memory and run-time records. There are four important columns:
1st column is the job ID
2nd column is the input size
7th column is the actual memory usage
8th column is the actual time usage
The data from these columns are plotted, and statistics are calculated from them.
1jobID,2inputSize,3mem,4time,5mem,6time,7mem,8time,9status,10useID,11path,12software,13reference
46531,1465,4G,2:0:0,4G,0-2:0:0,3.52,1,COMPLETED,ld32,,findNumber,none
46535,2930,4G,2:0:0,4G,0-2:0:0,6.38,2,COMPLETED,ld32,,findNumber,none
46534,4395,4G,2:0:0,4G,0-2:0:0,9.24,4,COMPLETED,ld32,,findNumber,none
# Here is the input size vs memory plot for findNumber:
# Here is the input size vs run-time plot for findNumber:
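The sample records above suggest a linear relationship between input size (column 2) and memory usage (column 7). As a hedged illustration (not SmartSlurm's actual code), a least-squares fit over jobRecord.txt can be computed with awk; fitMem is a hypothetical helper name:

```shell
# Least-squares fit of memory (column 7) vs input size (column 2) from a
# jobRecord.txt-style CSV file; assumes at least two data rows after the
# header. Prints "slope intercept".
fitMem() {
    awk -F, 'NR>1 {n++; sx+=$2; sy+=$7; sxx+=$2*$2; sxy+=$2*$7}
        END {a=(n*sxy-sx*sy)/(n*sxx-sx*sx); b=(sy-a*sx)/n;
             printf "%.6f %.4f\n", a, b}' "$1"
}
```

With such a fit in hand, a memory request for a new job is simply slope * inputSize + intercept plus a safety margin.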
2) Auto choose partition according to run-time request
smartSlurm/config/config.txt contains the partition time limits and the bash function adjustPartition, which adjusts the partition for sbatch jobs:
# General partitions, ordered by maximum allowed run-time in hours
partition1Name=short; partition1TimeLimit=12 # run-time > 0 hours and <= 12 hours
partition2Name=medium; partition2TimeLimit=120 # run-time > 12 hours and <= 5 days
partition3Name=long; partition3TimeLimit=720 # run-time > 5 days and <= 30 days
...
#function
adjustPartition() {
... # please open the file to see the content
} ; export -f adjustPartition
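As a rough illustration of what such a function may do, here is a minimal sketch assuming the three partitions listed above; adjustPartitionDemo is a hypothetical name, and the real function lives in smartSlurm/config/config.txt:

```shell
# Pick a partition from the requested run-time in hours, using the
# short/medium/long limits shown above (12 h, 5 days, 30 days).
adjustPartitionDemo() {  # $1 = requested run-time in hours
    if [ "$1" -le 12 ]; then
        echo short       # run-time <= 12 hours
    elif [ "$1" -le 120 ]; then
        echo medium      # run-time <= 5 days
    else
        echo long        # run-time up to 30 days
    fi
}
```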
3) Auto re-run failed jobs with Out Of Memory (OOM) and Out Of run-Time (OOT) states
At the end of the job, $smartSlurmJobRecordDir/bin/cleanUp.sh checks memory and time usage and saves the data to the log $smartSlurmJobRecordDir/myJobRecord.txt. If the job fails, ssbatch re-submits it with double the memory or double the time, and clears the statistics formula so that later jobs re-calculate the statistics.
4) Checkpoint
If the checkpoint feature is enabled, before the job runs out of memory or time, ssbatch generates a checkpoint and resubmits the job.
5) More informative emails: Slurm has a limited email notification mechanism, which only includes a subject line. In contrast, ssbatch attaches the content of the sbatch script, as well as the output and error log, to the email.
$smartSlurmJobRecordDir/bin/cleanUp.sh also sends an email to user. Attached are the Slurm script, the sbatch command used, and the contents of the output and error log files.
Yes for ssbatch. ssbatch directly submits the job without pending.
No for runAsPipeline. If you would like to submit more than 5 jobs, letting the first
5 run directly while the other jobs stay pending until the first 5 finish,
and then releasing the others with estimated resources, please use runAsPipeline.
Yes. If -F is not given, program + input will become the unique flag for the job.
Yes. If -P is not given, the Slurm script name or the wrap command will be used as the program name.
Yes. But if -I is not given, resource estimation will be based on the program name only.
Yes. Please use this:
ssbatch -I jobSize:12 ... # Here 12 is the input size. It can be any integer.
Yes. All regular sbatch options are OK to have.
Yes. You can have -I "input1.txt input2.txt".
Has -F jobUniqueFlag?
If yes, use jobUniqueFlag as job flag.
Otherwise, check if there is -P xyz?
If yes, use xyz as program name.
Otherwise, use the command in --wrap or the Slurm script as program name.
Check if there is -I inputFile?
If yes, use program+inputFile as job unique flag.
Otherwise, create a unique job flag, such as program+randomString.
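The flag-resolution order above can be sketched as a small shell function; jobFlag is an illustrative helper, not SmartSlurm code, and $$ stands in for the random string:

```shell
# Resolve a job's unique flag: -F wins; otherwise program (-P, falling
# back to the wrap command/script name) plus the input file or a unique suffix.
jobFlag() {  # $1 = -F value, $2 = -P value, $3 = wrap/script name, $4 = -I value
    if [ -n "$1" ]; then echo "$1"; return; fi
    prog=${2:-$3}            # -P wins; otherwise the wrap command/script name
    if [ -n "$4" ]; then
        echo "$prog.$4"      # program + input file
    else
        echo "$prog.$$"      # program + a unique suffix
    fi
}
```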
If job successfully finished:
If there are fewer than 200 job records for this software and reference, or the current job input
is larger than the max input size of all earlier jobs:
If yes, and the job record is unique, put the current job record in jobRecord.txt
else if OOM or OOT:
calculate extraMem for future job estimations
remove the formula for this software and reference
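The resubmission rule described above (double the failed resource) can be sketched as follows; nextRequest is an illustrative helper, not SmartSlurm's actual code:

```shell
# Compute the next resource request after a failure: double memory on
# OOM, double run-time on OOT, otherwise keep the request unchanged.
nextRequest() {  # $1 = failure state, $2 = memory in GB, $3 = run-time in minutes
    case $1 in
        OOM) echo "mem=$(($2 * 2))G time=$3" ;;   # out of memory: double memory
        OOT) echo "mem=${2}G time=$(($3 * 2))" ;; # out of run-time: double time
        *)   echo "mem=${2}G time=$3" ;;          # anything else: unchanged
    esac
}
```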
It is in the folder $smartSlurmJobRecordDir/stats. Here $smartSlurmJobRecordDir is defined in SmartSlurm/config/config.txt.
Check if there is input for this job?
    If yes: check if there are formulas to estimate memory/time.
        If yes: check if the input size is smaller than the max of previous jobs,
        or the input size exceeds the max but there are at least 10 job records.
            If yes, estimate memory/time and submit the job.
            Otherwise, make a new formula.
                If successful, estimate memory/time and submit the job.
                Otherwise, use default memory/time and submit the job.
        Otherwise, use default memory/time and submit the job.
    Otherwise: use the 90th percentile as the estimated value and submit the job.
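The 90th-percentile fallback can be illustrated with a small nearest-rank sketch over one past usage value per line; p90 is a hypothetical helper, and SmartSlurm's exact percentile method may differ:

```shell
# Nearest-rank 90th percentile: sort the values, then take the value at
# rank ceil(0.9 * N).
p90() {
    sort -n | awk '{v[NR]=$1}
        END {r = int(NR * 0.9); if (NR * 0.9 > r) r++; print v[r]}'
}
```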
When a job successfully finishes, the software creates a file $jobFlag.success in the folder $smartSlurmLogDir.
Notice $smartSlurmLogDir is defined in SmartSlurm/config/config.txt
# Download smartSlurm if it is not done yet
cd $HOME
git clone https://github.com/ld32/SmartSlurm.git
export PATH=$HOME/SmartSlurm/bin:$PATH
cp $HOME/SmartSlurm/bin/Snakefile .
cp $HOME/SmartSlurm/bin/config.yaml .
# Create Snakemake conda env (from: https://snakemake.readthedocs.io/en/v3.11.0/tutorial/setup.html)
module load miniconda3
mamba env create --name snakemakeEnv --file $PWD/SmartSlurm/config/snakemakeEnv.yaml
# Review Snakefile, activate the snakemake env and run test
module load miniconda3
source activate snakemakeEnv
export PATH=$PWD/SmartSlurm/bin:$PATH
cat Snakefile
snakemake -p -j 999 --latency-wait=80 --cluster "ssbatch -t 100 --mem 1G -p short"
If you have multiple Slurm accounts:
snakemake -p -j 999 --latency-wait=80 --cluster "ssbatch -A mySlurmAccount -t 100 --mem 1G"
Coming soon
# Download smartSlurm if it is not done yet
cd $HOME
git clone https://github.com/ld32/SmartSlurm.git
# Create Nextflow conda env
module load miniconda3
mamba create -n nextflowEnv -c bioconda -y nextflow
# Review nextflow file, activate the nextflow env, and run test
module load miniconda3
export PATH=$HOME/SmartSlurm/bin:$PATH
source activate nextflowEnv
cp $HOME/SmartSlurm/bin/nextflow.nf .
cp $HOME/SmartSlurm/config/nextflow.config .
# If you have multiple Slurm accounts, modify the config file:
nano nextflow.config
#change:
//process.clusterOptions = '--account=mySlurmAcc'
to
process.clusterOptions = '--account=mySlurmAcc'
# save the file
# Ready to run:
nextflow run nextflow.nf -profile slurm
Smart Pipeline was originally designed to run bash scripts as a pipeline on a Slurm cluster; we later added the dynamic memory and run-time features. The runAsPipeline script converts an input bash script into a pipeline that easily submits jobs to the Slurm scheduler for you.
# Here is the memory usage of the optimized workflow: The original pipeline has 11 steps. Most of the steps need less than 10G of memory to run, but one of the steps needs 140G. Because the original pipeline is submitted as a single huge job, 140G is reserved for all the steps. (Each compute node in the cluster has 256 GB RAM.) By submitting each step as a separate job, most steps only need to reserve 10G, which decreases memory usage dramatically. (The pink part of the graph below shows these savings.) Another optimization is to dynamically allocate memory based on the reference genome size and input sequencing data size. (This is shown in the yellow part of the graph.) Because of the decreased resource demand, the jobs can start earlier, which in turn increases the overall throughput.
1) Submits each step as a cluster job using ssbatch, which auto-adjusts memory and run-time according to statistics from earlier jobs, and re-runs OOM/OOT jobs with doubled memory/run-time
2) Automatically arranges dependencies among jobs
3) Sends email notifications when each job fails or succeeds
4) If a job fails, all its downstream jobs are automatically killed
5) When re-running the pipeline on the same data folder, if there are any unfinished jobs, the user is asked whether to kill them
6) When re-running the pipeline on the same data folder, the user is asked to confirm the re-run if a job or a step finished successfully earlier
7) For a re-run, if the script has not changed, runAsPipeline does not re-process the bash script and directly uses the old one
8) If the user has more than one Slurm account, add -A or --account= to the command line to make all jobs use that Slurm account
9) When new input data is added and the workflow is re-run, affected jobs that had finished successfully are automatically re-run
Run bash script as Smart Slurm pipeline
# Download if it is not downloaded yet
cd $HOME
git clone https://github.com/ld32/SmartSlurm.git
# Setup path
export PATH=$HOME/SmartSlurm/bin:$PATH
# Take a look at a regular example bash script
cat $HOME/SmartSlurm/scripts/bashScriptV1.sh
# Below is the content of a regular bashScriptV1.sh
1 #!/bin/sh
2
3 number=$1
4
5 [ -z "$number" ] && echo -e "Error: number is missing.\nUsage: bashScript <number>" && exit 1
6
7 for i in {1..5}; do
8
9 input=numbers$i.txt
10
11 findNumber.sh 1234 $input > $number.$i.txt
12
13 done
14
15 cat $number.*.txt > all$number.txt
# Notes about bashScriptV1.sh:
# The script first finds a certain number, given on the command line, in files numbers1.txt through numbers5.txt in row 11, then merges the results into all$number.txt in row 15
# In order to tell the Smart Pipeline which step/command we want to submit as Slurm jobs,
# we add comments above those commands, plus some helper commands:
cat $HOME/SmartSlurm/scripts/bashScriptV2.sh
# Below is the content of bashScriptV2.sh
1 #!/bin/sh
2
3 number=$1
4
5 [ -z "$number" ] && echo -e "Error: number is missing.\nUsage: bashScript <number>" && exit 1
6
7 for i in {1..5}; do
8
9 input=numbers$i.txt
10
11 #@1,0,findNumber,,input,sbatch -p short -c 1 --mem 4G -t 50:0
12 findNumber.sh 1234 $input > $number.$i.txt
13
14 done
15
16 #@2,1,mergeNumber,,,sbatch -p short -c 1 --mem 4G -t 50:0
17 cat $number.*.txt > all$number.txt
Step 1 is denoted by #@1,0,findNumber,,input,sbatch -p short -c 1 --mem 4G -t 50:0 (line 11 above), which means: this is step 1; it depends on no other step; it runs the software findNumber; it uses no reference files (a reference would be copied to the /tmp directory if the user wants to use /tmp); and file $input is the input file. Because the step runs inside the loop, the loop variable $i makes each job's identifier unique. The sbatch command tells the pipeline runner the sbatch parameters to run this step.
Step 2 is denoted by #@2,1,mergeNumber,,,sbatch -p short -c 1 --mem 4G -t 50:0 (line 16), which means: this is step 2 and it depends on step 1; the step runs the software mergeNumber with no reference file and no input file, and needs no unique identifier because there is only one job in the step. The sbatch command again supplies this step's sbatch parameters; if it were omitted, the pipeline runner would use the default sbatch command given on the command line (see below).
Notice the format of a step annotation is #@stepID,dependIDs,softwareName,reference,input,sbatchOptions. Reference is optional; it allows the pipeline runner to copy data (a file or folder) to the local /tmp folder on the compute node to speed up the software. Input is optional; it is used to estimate memory/run-time for the job. sbatchOptions is also optional; when it is missing, the pipeline runner uses the default sbatch command given on the command line (see below).
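To make the annotation format concrete, here is a hedged sketch that splits an annotation line into its six documented fields; parseStep is a hypothetical helper, not part of runAsPipeline:

```shell
# Split a "#@stepID,dependIDs,softwareName,reference,input,sbatchOptions"
# annotation on commas. The sbatch options contain no commas, so they
# survive intact as the sixth field; empty fields are preserved.
parseStep() {
    oldIFS=$IFS; IFS=,
    # Strip the leading "#@", then let the shell split on commas only.
    set -- $(echo "${1#"#@"}")
    IFS=$oldIFS
    echo "step=$1 depends=$2 software=$3 reference=$4 input=$5 sbatch=$6"
}
```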
Here are two more examples:
runAsPipeline bashScriptV2.sh "sbatch -p short -t 10:0 -c 1" useTmp
This command will generate a new bash script of the form slurmPipeLine.checksum.sh in the log folder. The checksum portion of the filename is an MD5 hash of the file contents. We include the checksum in the filename to detect when the script contents have been updated; if the script has not changed, we do not re-create the pipeline script.
This runAsPipeline command runs a test of the script, meaning it does not actually submit jobs. It only shows a fake job id, such as 1234, for each step. If you append run at the end of the command, the pipeline is submitted to the Slurm scheduler.
Ideally, with useTmp, the software should run faster using local /tmp disk space for the database/reference than network storage. For this small query the difference is small, and local /tmp may even be slower. If you don't need /tmp, you can use noTmp.
With useTmp, the pipeline runner copies the related data to /tmp, and all file paths are automatically updated to reflect each file's location in /tmp.
Sample output from the test run:
Note that a step's own sbatch options (here -t 50:0 for both steps) override the defaults; any step without them uses the default -t 10:0. The default walltime limit is set in the runAsPipeline command, while per-step walltime parameters are set in the bashScriptV2.sh script.
runAsPipeline bashScriptV2.sh "sbatch -p short -t 10:0 -c 1" useTmp
runAsPipeline "bashScriptV2.sh 1234" "sbatch -p short -t 10:0 -c 1" useTmp run
runAsPipeline run date: 2024-04-28_16-03-36_4432
Running: /home/ld32/SmartSlurm/bin/runAsPipeline /home/ld32/SmartSlurm/scripts/bashScriptV2.sh 1234
sbatch -A rccg -p short -c 1 --mem 4G -t 50:0 noTmp run
Converting /home/ld32/SmartSlurm/scripts/bashScriptV2.sh to
/home/ld32/scratch/SmartSlurmTest/log/slurmPipeLine.eccd33a67760d5928f1c4cfea17ae574.run.sh
find for loop start: for i in {1..5}; do
find job marker:
#@1,0,findNumber,,input,sbatch -p short -c 1 --mem 4G -t 50:0
sbatch options: sbatch -p short -c 1 --mem 4G -t 50:0
find job:
findNumber.sh 1234 $input > $number.$i.txt
findNumber.sh 1234 $input > $number.$i.txt --before parsing
findNumber.sh 1234 $input > $number.$i.txt --after parseing
find loop end: done
find job marker:
#@2,1,mergeNumber,,,sbatch -p short -c 1 --mem 4G -t 50:0
sbatch options: sbatch -p short -c 1 --mem 4G -t 50:0
find job:
cat $number.*.txt > all$number.txt
cat $number.*.txt > all$number.txt --before parsing
cat $number.*.txt > all$number.txt --after parseing
/home/ld32/scratch/SmartSlurmTest/log/slurmPipeLine.7ae574.run.sh /home/ld32/SmartSlurm/scripts/bashScriptV2.sh is ready to run. Starting to run ...
Running /home/ld32/scratch/SmartSlurmTest/log/slurmPipeLine.7ae574.run.sh
/home/ld32/SmartSlurm/scripts/bashScriptV2.sh
---------------------------------------------------------
step: 1, depends on: 0, job name: findNumber, flag: 1.0.findNumber.1
Got output from ssbatch: Submitted batch job 69308
step: 1, depends on: 0, job name: findNumber, flag: 1.0.findNumber.2
Got output from ssbatch: Submitted batch job 69309
step: 1, depends on: 0, job name: findNumber, flag: 1.0.findNumber.3
Got output from ssbatch: Submitted batch job 69310
step: 1, depends on: 0, job name: findNumber, flag: 1.0.findNumber.4
Got output from ssbatch: Submitted batch job 69311
step: 1, depends on: 0, job name: findNumber, flag: 1.0.findNumber.5
Got output from ssbatch: Submitted batch job 69312
step: 2, depends on: 1, job name: mergeNumber, flag: 2.1.mergeNumber
Got output from ssbatch: Submitted batch job 69313
All submitted jobs:
job_id depend_on job_flag software reference inputs
69308 null 1.0.findNumber.1 findNumber none ,numbers1.txt
69309 null 1.0.findNumber.2 findNumber none ,numbers2.txt
69310 null 1.0.findNumber.3 findNumber none ,numbers3.txt
69311 null 1.0.findNumber.4 findNumber none ,numbers4.txt
69312 null 1.0.findNumber.5 findNumber none ,numbers5.txt
69313 69308:69309:69310:69311:69312 2.1.mergeNumber mergeNumber none none
---------------------------------------------------------
Please check .smartSlurm.log for detailed logs.
You can use the command:
ls -l log
This command lists all the logs created by the pipeline runner. *.sh
files are the Slurm scripts for each step, *.out files are the output files
for each step, *.success files mean the job finished successfully for that
step, and *.failed files mean the job failed for that step.
You can use this command to cancel running and pending jobs:
cancelAllJobs
In case you wonder how it works, here is a simple example to explain.
runAsPipeline goes through the bash script, reads the for loops and job decorators, sets up a Slurm script for each step along with the job dependencies, and submits the jobs.
No. If the jobs don't depend on other jobs, runAsPipeline submits all jobs at once, but only lets the first 5 run; the other jobs wait for the first 5 to finish to gather statistics, then memory and time are estimated and the remaining jobs are released to run.
Yes. Please use this:
someVariableName=jobSize:12 # here 12 is the input size. It can be any integer.
#@1,0,runshard,,someVariableName,sbatch -p short -c 4 -t 0-12:00 --mem 8G
Yes. All regular sbatch options are OK to have.
Yes. You can have input="input1.txt input2.txt" or #@2,1,find,,input1.input2,sbatch ...
When a job successfully finishes, the software creates a file $jobFlag.success in the folder $smartSlurmLogDir.
Notice $smartSlurmLogDir is defined in SmartSlurm/config/config.txt
cd ~
git clone git@github.com:ld32/SmartSlurm.git
export PATH=$HOME/SmartSlurm/bin:$PATH
sbatchAndTop <sbatch option1> <sbatch option 2> <sbatch option 3> <...>
# Such as:
sbatchAndTop -p short -c 1 -t 2:0:0 --mem 4G --wrap "my_application para1 para2"
# Here -p short is optional, because ssbatch chooses the partition according to run-time.
# or:
sbatchAndTop job.sh
## sbatchAndTop features:
1) Submits a Slurm job using ssbatch (scroll up to see ssbatch features) and runs scontrol top on the job