ndreey opened this issue 1 year ago.
In CAMI II, the following MEGAHIT settings were used:
--k-min 21 --k-max 91
--presets meta-sensitive
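To see which k-mer sizes the first setting implies, here is a small Bash sketch. It assumes that MEGAHIT builds its k-list by stepping up from --k-min by --k-step (assumed default of 12), and that it appends --k-max when the last step overshoots it. The function name megahit_k_list is hypothetical, and the k list recorded in the log of an actual run is the authoritative record. For comparison, the MEGAHIT README describes meta-sensitive as '--min-count 1 --k-list 21,29,39,49,...,129,141', so that preset iterates up to k = 141 rather than 91.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the k-list MEGAHIT is ASSUMED to derive from
# --k-min/--k-max/--k-step (step up from k_min, append k_max if skipped).
megahit_k_list() {
    local k_min=$1 k_max=$2 k_step=${3:-12}  # 12 = assumed default --k-step
    local k list=""
    for (( k = k_min; k <= k_max; k += k_step )); do
        list+="${list:+,}${k}"               # comma-separate after the first k
    done
    # If stepping overshot k_max, add k_max as the final iteration
    if (( (k_max - k_min) % k_step != 0 )); then
        list+=",${k_max}"
    fi
    echo "${list}"
}

# The CAMI II k21 setting: --k-min 21 --k-max 91
megahit_k_list 21 91   # prints 21,33,45,57,69,81,91
```

Under these assumptions, the k21 setting runs fewer and smaller k iterations than meta-sensitive. That difference may partly explain the runtime gap in the tables further down.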
First run, using the --k-min 21 --k-max 91 settings:
megahit -t 6 -m 0.8 --k-min 21 --k-max 91 \
-1 /mnt/c/Users/andbo/thesis_andbo/CAMISIM/platanthera_mock/reads/01_trimmed/02_trim_R1.fq.gz \
-2 /mnt/c/Users/andbo/thesis_andbo/CAMISIM/platanthera_mock/reads/01_trimmed/02_trim_R2.fq.gz \
-o /home/andbo/megahit_results/platanthera_mock_assembly/02 \
--out-prefix 02
-t: number of CPU threads
-m: maximum memory to use; a value between 0 and 1 is read as a fraction of the machine's total memory
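Because -m is a fraction, -m 0.8 on a 16 GB machine means roughly 12.8 GB. The sketch below converts a fraction to bytes using MemTotal from /proc/meminfo, which is Linux-only and reported in kB. It also shows one way to derive a byte value from a SLURM allocation instead, since a fraction presumably refers to the whole node rather than to the job's share. Both function names are hypothetical. SLURM_CPUS_PER_TASK and SLURM_MEM_PER_CPU (in MB) are the variables sbatch exports when --cpus-per-task and --mem-per-cpu are set.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for choosing MEGAHIT's -m value.

# Bytes corresponding to a fraction of total memory.
# Usage: mem_bytes_for_fraction FRACTION [TOTAL_KB]
# TOTAL_KB defaults to MemTotal from /proc/meminfo (Linux, kB units).
mem_bytes_for_fraction() {
    local fraction=$1
    local total_kb=${2:-$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)}
    # %.0f avoids overflow in awk implementations whose %d is 32-bit
    awk -v f="${fraction}" -v kb="${total_kb}" 'BEGIN { printf "%.0f\n", f * kb * 1024 }'
}

# Bytes matching a SLURM allocation (--cpus-per-task x --mem-per-cpu).
# SLURM_MEM_PER_CPU is in MB (MiB), hence the 1024*1024 factor.
slurm_alloc_bytes() {
    echo $(( SLURM_CPUS_PER_TASK * SLURM_MEM_PER_CPU * 1024 * 1024 ))
}

mem_bytes_for_fraction 0.8   # what -m 0.8 corresponds to on this machine, in bytes
```

Inside a job, a value such as -m "$(slurm_alloc_bytes)" would keep MEGAHIT's memory budget tied to what the job actually requested.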
Because the runs became more computationally demanding, I switched to Mjölnir and now run the jobs through SLURM.
Two SLURM array jobs were created, one per parameter set: megahit_k21_array.sh and megahit_meta_sensi_array.sh.
The two scripts are nearly identical. The difference is that megahit_meta_sensi_array.sh sets --presets meta-sensitive instead of --k-min/--k-max and allocates more resources to the meta-sensitive run. megahit_k21_array.sh:
#!/bin/bash
#SBATCH --job-name=k21_megahit # name that will show up in the queue
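#SBATCH --array=1-11%4 # 11 tasks, at most 4 running at the same time
#SBATCH --output=slurm-%j.out # filename of the output; %j is replaced by the job ID
#SBATCH --error=slurm-%j.err # filename of the error log
#SBATCH --partition=cpuqueue # partition (queue) to submit to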
#SBATCH --ntasks=1 # number of tasks (analyses) to run
#SBATCH --cpus-per-task=6 # the number of threads allocated to each task
#SBATCH --mem-per-cpu=8G # memory per cpu-core
#SBATCH --time=01:30:00 # time limit (hh:mm:ss; use d-hh:mm:ss for multi-day jobs)
#SBATCH --mail-type=ALL # send all type of email
#SBATCH --mail-user=andre.bourbonnais@sund.ku.dk
# I. Define directory names [DO NOT CHANGE]
# =========================================
# get the directories
submitdir=${SLURM_SUBMIT_DIR}
workdir=${TMPDIR}
jobid=${SLURM_ARRAY_TASK_ID}
# Information
echo "$(date) Submitted from ${submitdir}"
echo "$(date) Accessed ${workdir}"
echo "$(date) ArrayID: ${jobid}"
# 1. Lock and load module and data
# ============================================
module load megahit
# Get the trimmed reads
reads=${submitdir}/bsc_thesis/data/subsample/reads/01_trimmed
# 2. Execute [MODIFY COMPLETELY TO YOUR NEEDS]
# ============================================
# Different host-contamination level
hc_level=("00" "01" "02" "03" "04" "05" "06" "07" "08" "09" "095")
# Define prefix based on array id
hc_prefix=${hc_level[$jobid-1]}
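# Create only the parent directory; MEGAHIT refuses to write into an existing -o directory
mkdir -p k21
# Use the thread count requested with --cpus-per-task
megahit -t "${SLURM_CPUS_PER_TASK}" --k-min 21 --k-max 91 \
-1 "${reads}/${hc_prefix}_trim_R1.fq.gz" \
-2 "${reads}/${hc_prefix}_trim_R2.fq.gz" \
-o "k21/${hc_prefix}" \
--out-prefix "${hc_prefix}_k21"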
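Before submitting, it can help to check the task-ID lookup locally. The sketch below copies hc_level from the script and prints the MEGAHIT command that each array task would run, without executing anything. The MODE switch stands in for the meta-sensitive script, whose exact contents are not shown here. For that reason, its output directory and prefix (metasens/<prefix> and <prefix>_metasens) are hypothetical, as is the function name dry_run_array.

```shell
#!/usr/bin/env bash
# Dry run of the array jobs: print the MEGAHIT command each task would run.
# Nothing is executed; SLURM and MEGAHIT are not needed.
hc_level=("00" "01" "02" "03" "04" "05" "06" "07" "08" "09" "095")

# Usage: dry_run_array MODE [N] [READS_DIR] [THREADS]
#   MODE = k21 | metasens ; N = upper bound of --array=1-N (default 11)
dry_run_array() {
    local mode=$1 n=${2:-11} reads=${3:-reads} threads=${4:-6}
    local -a kmer_opts
    case "${mode}" in
        k21)      kmer_opts=(--k-min 21 --k-max 91) ;;
        metasens) kmer_opts=(--presets meta-sensitive) ;;
        *)        echo "unknown mode: ${mode}" >&2; return 1 ;;
    esac
    # --array=1-N should address every hc_level entry exactly once
    if (( n != ${#hc_level[@]} )); then
        echo "warning: --array=1-${n} but hc_level has ${#hc_level[@]} entries" >&2
    fi
    local jobid prefix
    for (( jobid = 1; jobid <= n; jobid++ )); do
        prefix=${hc_level[$jobid-1]}   # same 1-based -> 0-based lookup as the script
        echo "task ${jobid}: megahit -t ${threads} ${kmer_opts[*]}" \
            "-1 ${reads}/${prefix}_trim_R1.fq.gz -2 ${reads}/${prefix}_trim_R2.fq.gz" \
            "-o ${mode}/${prefix} --out-prefix ${prefix}_${mode}"
    done
}

dry_run_array k21
```

Running dry_run_array metasens 11 reads 8 shows the corresponding meta-sensitive commands with 8 threads, matching the CPU count in the meta-sensitive runtime table.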
MEGAHIT K21 RUNTIME
JobID        JobName    Partition  AllocCPUS  State      ExitCode Elapsed
------------ ---------- ---------- ---------- ---------- -------- ----------
893970_1     k21_megah+ cpuqueue   6          COMPLETED  0:0      00:16:23
893970_1.ba+ batch                 6          COMPLETED  0:0      00:16:23
893970_2     k21_megah+ cpuqueue   6          COMPLETED  0:0      00:12:41
893970_2.ba+ batch                 6          COMPLETED  0:0      00:12:41
893970_3     k21_megah+ cpuqueue   6          COMPLETED  0:0      00:11:30
893970_3.ba+ batch                 6          COMPLETED  0:0      00:11:30
893970_4     k21_megah+ cpuqueue   6          COMPLETED  0:0      00:10:43
893970_4.ba+ batch                 6          COMPLETED  0:0      00:10:43
893970_5     k21_megah+ cpuqueue   6          COMPLETED  0:0      00:06:01
893970_5.ba+ batch                 6          COMPLETED  0:0      00:06:01
893970_6     k21_megah+ cpuqueue   6          COMPLETED  0:0      00:05:57
893970_6.ba+ batch                 6          COMPLETED  0:0      00:05:57
893970_7     k21_megah+ cpuqueue   6          COMPLETED  0:0      00:12:40
893970_7.ba+ batch                 6          COMPLETED  0:0      00:12:40
893970_8     k21_megah+ cpuqueue   6          COMPLETED  0:0      00:13:23
893970_8.ba+ batch                 6          COMPLETED  0:0      00:13:23
893970_9     k21_megah+ cpuqueue   6          COMPLETED  0:0      00:05:59
893970_9.ba+ batch                 6          COMPLETED  0:0      00:05:59
893970_10    k21_megah+ cpuqueue   6          COMPLETED  0:0      00:06:49
893970_10.b+ batch                 6          COMPLETED  0:0      00:06:49
893970_11    k21_megah+ cpuqueue   6          COMPLETED  0:0      00:06:27
893970_11.b+ batch                 6          COMPLETED  0:0      00:06:27
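To compare the runs at a glance, the Elapsed column can be summed over the array tasks. This sketch assumes the table is saved as plain sacct output. One way to produce it is sacct -j 893970 --format=JobID,JobName,Partition,AllocCPUS,State,ExitCode,Elapsed, which matches the columns above. Step rows such as 893970_1.ba+ are skipped so that each task is counted once. The function name sum_elapsed is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the Elapsed column of sacct-style output (stdin),
# counting only array-task rows (JobID like 893970_1), not .batch steps.
sum_elapsed() {
    awk '
        # Keep rows whose first field is JOBID_TASK with no step suffix
        $1 ~ /^[0-9]+_[0-9]+$/ {
            t = $NF                        # Elapsed is the last column
            days = 0
            if (t ~ /-/) { split(t, d, "-"); days = d[1]; t = d[2] }   # D-HH:MM:SS
            split(t, p, ":")
            total += days * 86400 + p[1] * 3600 + p[2] * 60 + p[3]
            n++
        }
        END { printf "%d tasks, %d s total (%.1f min mean)\n", n, total, (n ? total / n / 60 : 0) }
    '
}

# Usage (file name hypothetical): sum_elapsed < k21_sacct.txt
```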
MEGAHIT META-SENSITIVE RUNTIME
Mjölnir had a minor disruption, which may explain the long runtime for the 00 reads (893983_1).
JobID        JobName    Partition  AllocCPUS  State      ExitCode Elapsed
------------ ---------- ---------- ---------- ---------- -------- ----------
893983_1     metasens_+ cpuqueue   8          COMPLETED  0:0      02:10:17
893983_1.ba+ batch                 8          COMPLETED  0:0      02:10:17
893983_2     metasens_+ cpuqueue   8          COMPLETED  0:0      00:29:07
893983_2.ba+ batch                 8          COMPLETED  0:0      00:29:07
893983_3     metasens_+ cpuqueue   8          COMPLETED  0:0      00:55:57
893983_3.ba+ batch                 8          COMPLETED  0:0      00:55:57
893983_4     metasens_+ cpuqueue   8          COMPLETED  0:0      00:41:11
893983_4.ba+ batch                 8          COMPLETED  0:0      00:41:11
893983_5     metasens_+ cpuqueue   8          COMPLETED  0:0      00:25:52
893983_5.ba+ batch                 8          COMPLETED  0:0      00:25:52
893983_6     metasens_+ cpuqueue   8          COMPLETED  0:0      00:25:02
893983_6.ba+ batch                 8          COMPLETED  0:0      00:25:02
893983_7     metasens_+ cpuqueue   8          COMPLETED  0:0      00:27:37
893983_7.ba+ batch                 8          COMPLETED  0:0      00:27:37
893983_8     metasens_+ cpuqueue   8          COMPLETED  0:0      00:29:23
893983_8.ba+ batch                 8          COMPLETED  0:0      00:29:23
893983_9     metasens_+ cpuqueue   8          COMPLETED  0:0      00:30:01
893983_9.ba+ batch                 8          COMPLETED  0:0      00:30:01
893983_10    metasens_+ cpuqueue   8          COMPLETED  0:0      00:27:52
893983_10.b+ batch                 8          COMPLETED  0:0      00:27:52
893983_11    metasens_+ cpuqueue   8          COMPLETED  0:0      00:29:08
893983_11.b+ batch                 8          COMPLETED  0:0      00:29:08
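To compare the two tables side by side, the sketch below hard-codes the Elapsed values reported above and prints the meta-sensitive/k21 wall-clock ratio for each host-contamination level. It assumes that both scripts map array task i to hc_level[i-1]; the note on 893983_1 supports this assumption for the 00 reads. Because the runs used different CPU counts (6 vs 8), the ratio compares wall-clock time only. The 00 value is probably inflated by the Mjölnir disruption. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Wall-clock comparison of the two array jobs, using the Elapsed values above.
# ASSUMPTION: both scripts map array task i to hc_level[i-1].
hc_level=("00" "01" "02" "03" "04" "05" "06" "07" "08" "09" "095")
k21=(00:16:23 00:12:41 00:11:30 00:10:43 00:06:01 00:05:57 00:12:40 00:13:23 00:05:59 00:06:49 00:06:27)
metasens=(02:10:17 00:29:07 00:55:57 00:41:11 00:25:52 00:25:02 00:27:37 00:29:23 00:30:01 00:27:52 00:29:08)

# HH:MM:SS -> seconds (10# forces base 10 so "08"/"09" are not read as octal)
to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

compare_runs() {
    local i a b
    for i in "${!hc_level[@]}"; do
        a=$(to_seconds "${k21[i]}")
        b=$(to_seconds "${metasens[i]}")
        # awk handles the floating-point ratio
        awk -v lvl="${hc_level[i]}" -v a="$a" -v b="$b" \
            'BEGIN { printf "%-4s k21=%5ds metasens=%5ds ratio=%.2f\n", lvl, a, b, b / a }'
    done
}

compare_runs
```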
Various settings will be tested, since choosing an appropriate k-mer size is not always straightforward.
Pros of small k-mers
Cons of small k-mers
Pros of large k-mers
Cons of large k-mers
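As a rough illustration of part of this trade-off, here is some back-of-the-envelope arithmetic. It assumes 150 bp reads; this is an assumption, since the read length is not stated here. A read of length L contains L - k + 1 k-mers, so a larger k leaves fewer k-mers per read. A single substitution error near the middle of a read also corrupts up to k of them. The sketch places one error at the read's midpoint and counts the affected k-mers for k values from 21 to 91. The function name kmer_tradeoff is hypothetical.

```shell
#!/usr/bin/env bash
# Rough k-mer arithmetic for a single read (illustrative only).
# ASSUMPTION: 150 bp reads; one substitution error at the read's midpoint.
kmer_tradeoff() {
    local L=${1:-150} k n p first last hit
    p=$(( L / 2 ))                       # 1-based position of the error
    printf "%4s %10s %12s %8s\n" k kmers/read error_kmers percent
    for k in 21 33 45 57 69 81 91; do
        n=$(( L - k + 1 ))               # k-mers contained in the read
        # A k-mer starting at s covers p if s <= p <= s + k - 1, with 1 <= s <= n
        first=$(( p - k + 1 > 1 ? p - k + 1 : 1 ))
        last=$(( p < n ? p : n ))
        hit=$(( last - first + 1 ))
        printf "%4d %10d %12d %7d%%\n" "$k" "$n" "$hit" $(( 100 * hit / n ))
    done
}

kmer_tradeoff 150
```

With these assumptions, k = 21 leaves 130 k-mers per read and a midpoint error touches 16% of them, while at k = 91 every one of the 60 k-mers is affected. Smaller k is therefore more forgiving of errors and low coverage. The advantage of larger k, spanning longer repeats, is not modelled here.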