Open ndreey opened 3 weeks ago
For CH, we will run a hybrid assembly that adds long reads to the short reads (hybridSPAdes), and for CO we will use standard short-read metaSPAdes. We could possibly use long reads for CO as well, but that can be done downstream. Furthermore, once we have high-quality MAGs, perhaps we could also use them as references for a reference-based assembly... Food for thought.
These scripts also require an argument specifying which prefix to assemble: in this case, $1=CH for the hybrid assembly and $1=CO for the short-read assembly.
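As a rough illustration of how the prefix argument ties to the two scripts, here is a small sketch. The helper names `assembly_script_for` and `submit_cmd` are hypothetical and not part of the pipeline; the sketch only prints the `sbatch` command instead of submitting it, so it can be tried off the cluster.

```bash
#!/bin/bash
# Hypothetical helper: map a prefix to the script named in this issue.
assembly_script_for() {
    case "$1" in
        CH) echo "hybridspades-assembly.sh" ;;   # hybrid (short + PacBio reads)
        CO) echo "metaspades-assembly.sh" ;;     # short reads only
        *)  echo "Unknown prefix: $1 (expected CH or CO)" >&2; return 1 ;;
    esac
}

# Print the submission command rather than running it (dry run).
submit_cmd() {
    local script
    script=$(assembly_script_for "$1") || return 1
    echo "sbatch ${script} $1"
}

submit_cmd CH
submit_cmd CO
```

On the cluster the printed lines would be run directly; the prefix after the script name is what the script sees as `$1`.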
hybridspades-assembly.sh
#!/bin/bash
#SBATCH --job-name hybridSPAdes
#SBATCH -A naiss2024-22-580
#SBATCH -p node -n 1
#SBATCH -t 06:15:00
#SBATCH -C mem1TB
#SBATCH --output=slurm-logs/assembly/SLURM-%j-hybridSPAdes-CH.out
#SBATCH --error=slurm-logs/assembly/SLURM-%j-hybridSPAdes-CH.err
#SBATCH --mail-user=andbou95@gmail.com
#SBATCH --mail-type=ALL
# Start time and date
echo "$(date) [Start]"
# Load in modules
module load bioinfo-tools
module load spades/3.15.5
# Set variables
POP=${1:?Usage: sbatch hybridspades-assembly.sh CH}
SR_DIR="05-CLEAN-MERGED"
LR_DIR="04-CLEAN-FASTQ/hifi-pacbio"
# Create the assembly output directory if it does not exist
outdir="/crex/proj/snic2020-6-222/Projects/Tconura/working/Andre/CONURA_WGS/06-ASSEMBLY/${POP}"
mkdir -p "$outdir"
# Assembling the metagenome
spades.py \
--meta \
--only-assembler \
-k auto \
--threads 20 \
--memory 1000 \
-1 "${SR_DIR}/${POP}_R1-clean.fq.gz" \
-2 "${SR_DIR}/${POP}_R2-clean.fq.gz" \
--pacbio "${LR_DIR}/pt_042_001_cell1-clean.fastq.gz" \
--pacbio "${LR_DIR}/pt_042_001_cell2-clean.fastq.gz" \
--pacbio "${LR_DIR}/pt_042_001_cell3-clean.fastq.gz" \
-o "$outdir"
# Restarting from checkpoint
#spades.py --continue -o $outdir
# End time and date
echo "$(date) [End]"
metaspades-assembly.sh
#!/bin/bash
#SBATCH --job-name metaSPAdes
#SBATCH -A naiss2024-22-580
#SBATCH -p node -n 1
#SBATCH -t 06:15:00
#SBATCH -C mem1TB
#SBATCH --output=slurm-logs/assembly/SLURM-%j-metaSPAdes-CO.out
#SBATCH --error=slurm-logs/assembly/SLURM-%j-metaSPAdes-CO.err
#SBATCH --mail-user=andbou95@gmail.com
#SBATCH --mail-type=ALL
# Start time and date
echo "$(date) [Start]"
# Load in modules
module load bioinfo-tools
module load spades/3.15.5
# Set variables
POP=${1:?Usage: sbatch metaspades-assembly.sh CO}
SR_DIR="05-CLEAN-MERGED"
# Create the assembly output directory if it does not exist
outdir="/crex/proj/snic2020-6-222/Projects/Tconura/working/Andre/CONURA_WGS/06-ASSEMBLY/${POP}"
mkdir -p "$outdir"
# Assembling the metagenome
spades.py \
--meta \
--only-assembler \
-k auto \
--threads 20 \
--memory 1000 \
-1 "${SR_DIR}/${POP}_R1-clean.fq.gz" \
-2 "${SR_DIR}/${POP}_R2-clean.fq.gz" \
-o "$outdir"
# Restarting from checkpoint
#spades.py --continue -o $outdir
# End time and date
echo "$(date) [End]"
Merging reads per host plant
As a comparison against the population-based approach, we will also assemble metagenomes using all CH samples and all CO samples, each host plant separately.
_merge_hostplantreads.sh
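The body of the merge script is not reproduced here. As a minimal sketch of what such a merge could look like, the function below concatenates gzipped FASTQ files per mate, relying on the fact that concatenated gzip files form a valid gzip stream. The function name `merge_reads` and the input pattern `<prefix>*_<mate>-clean.fq.gz` are assumptions for illustration, not the actual script.

```bash
#!/bin/bash
# merge_reads IN_DIR PREFIX MATE OUT
# Hypothetical sketch: concatenate every IN_DIR/PREFIX*_MATE-clean.fq.gz into OUT.
# Write OUT to a different directory than IN_DIR so that a rerun
# does not pick up the previous merged file as an input.
merge_reads() {
    local in_dir="$1" prefix="$2" mate="$3" out="$4"
    # Glob results are sorted, so R1 and R2 are merged in the same sample
    # order as long as the files are named consistently.
    set -- "${in_dir}/${prefix}"*"_${mate}-clean.fq.gz"
    # An unmatched pattern stays literal, so check that the first entry exists.
    if [ ! -e "$1" ]; then
        echo "No ${mate} files found for ${prefix} in ${in_dir}" >&2
        return 1
    fi
    cat "$@" > "$out"
}
```

A call such as `merge_reads 05-CLEAN-MERGED CH R1 merged/CH_R1.fq.gz`, repeated for `R2`, would keep the two mates in sync; the `merged/` directory is likewise only an example.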
Let's confirm that we have the same number of reads and that the IDs line up.
ALL GOOD
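For reference, a check of this kind can be sketched as follows, assuming it compares the R1 and R2 files of one merged pair. The function names are hypothetical. IDs are compared through a `cksum` of the ID stream rather than line by line, which keeps memory use flat on large files at the cost of not reporting where a mismatch occurs.

```bash
#!/bin/bash
# Number of reads in a gzipped FASTQ (four lines per record).
count_reads() {
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# Read IDs in file order: drop the leading '@', any description after
# whitespace, and a trailing /1 or /2 mate suffix.
read_ids() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { sub(/^@/, ""); sub(/\/[12]$/, "", $1); print $1 }'
}

# Succeeds only if both files hold the same number of reads and the
# checksums of their ID streams agree.
check_pair() {
    local r1="$1" r2="$2" n1 n2
    n1=$(count_reads "$r1")
    n2=$(count_reads "$r2")
    if [ "$n1" -ne "$n2" ]; then
        echo "Read counts differ: ${n1} vs ${n2}" >&2
        return 1
    fi
    if [ "$(read_ids "$r1" | cksum)" != "$(read_ids "$r2" | cksum)" ]; then
        echo "Read IDs differ between ${r1} and ${r2}" >&2
        return 1
    fi
    echo "OK: ${n1} read pairs, IDs line up"
}
```

Calling `check_pair` with the merged R1 and R2 files for CH, and again for CO, covers both host plants.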