stanikae / jekesa

Jekesa (Illuminate) is an automated Bash pipeline for bacterial whole genome assembly and typing using Illumina paired-end sequencing data.
https://github.com/stanikae/jekesa
GNU General Public License v3.0

JEKESA

An automated bacterial whole genome assembly and typing pipeline that primarily uses Illumina paired-end whole genome sequencing (WGS) data. In addition, Jekesa performs extensive analyses for Escherichia coli, Salmonella, Streptococcus pneumoniae and Streptococcus pyogenes (Group A Streptococcus), as well as in-depth virulence predictions for various other pathogens (refer to the sections below). Jekesa also performs whole-genome reference-free alignments, pairwise SNP-site analysis and clustering, and generates a neighbor-joining tree that can be easily visualized using, for example, Microreact.

Pipeline overview

Jekesa (Illuminate) currently runs on a server (single compute node). The pipeline is written in Bash, R, and R Markdown, and generates the results report as an Excel worksheet (.xlsx format) and in HTML format.

De novo genome assembly and classification

MLST typing

Resistance profiling

Virulence gene prediction

Plasmid detection

Escherichia coli specific analysis

Salmonella enterica specific analysis

Streptococcus pneumoniae specific analysis

Streptococcus pyogenes specific analysis

Reference-free alignments, pairwise SNP differences, and neighbor-joining tree construction

Output and reporting

All results will be stored in Results-ProjectName.

Usage

usage: jekesa <options>

OPTIONS:
        -p      Path to output directory or project name
        -a      Select the assembler to use. Options available: 'spades', 'skesa', 'velvet', 'megahit'
        -s      Species scheme name to use for mlst typing.
                Use: 'spneumoniae' or 'spyogenes' or 'senterica', for streptococcus pneumoniae or streptococcus pyogenes or salmonella
                detailed analysis. Otherwise for any other schema use: 'other'. To check other available schema names use: mlst --longList.
        -t      Number of threads to use <integer>, (minimum value should be: 6)
        -g      Only perform de novo assembly
        -c      Path to assembled contigs to include in the typing analysis (only mlst and resistance profiling).
        -h      Show this help
        -v      Show version
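
The constraints in the usage text (four supported assemblers, at least 6 threads) can be checked before a long run is launched. The following is a minimal sketch of such a pre-flight check; `check_jekesa_args` is a hypothetical helper and is not part of jekesa itself.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not shipped with jekesa).
# The assembler names and the minimum thread count come from the usage text.

check_jekesa_args() {
    local assembler="$1" threads="$2"

    # Accept only the assemblers listed under the -a option.
    case "$assembler" in
        spades|skesa|velvet|megahit) ;;
        *) echo "unknown assembler: $assembler" >&2; return 1 ;;
    esac

    # The -t option expects an integer; reject anything with non-digits.
    case "$threads" in
        ''|*[!0-9]*) echo "threads must be an integer: $threads" >&2; return 1 ;;
    esac

    # The usage text asks for a minimum of 6 threads.
    if [ "$threads" -lt 6 ]; then
        echo "at least 6 threads are required, got $threads" >&2
        return 1
    fi
    return 0
}
```

It could be chained in front of the real command, for example `check_jekesa_args skesa 16 && jekesa -p path/to/analysis/directory -a skesa -s spyogenes -t 16`.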

Example

cd jekesa
# This script creates the analysis directory and soft-links the FASTQ files
bin/find-link-fastq.sh path/to/analysis/directory path/to/sampleID/list path/to/raw/fastqfiles

# Now run the jekesa pipeline
conda activate jekesa
jekesa -p path/to/analysis/directory -a skesa -s spyogenes -t 16 &
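
The sampleID list passed to find-link-fastq.sh has to be prepared beforehand. The source does not describe its format, so the sketch below rests on two assumptions: that the list is a plain text file with one sample ID per line, and that forward reads follow an Illumina-style name in which the sample ID is the text before the first underscore (for example `S01_S1_L001_R1_001.fastq.gz`). `list_sample_ids` is a hypothetical helper, not part of jekesa.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with jekesa).
# ASSUMPTION: forward-read files match *_R1*.fastq.gz and the sample ID
# is everything before the first underscore in the file name.

list_sample_ids() {
    local fastq_dir="$1" f name
    for f in "$fastq_dir"/*_R1*.fastq.gz; do
        [ -e "$f" ] || continue        # the glob matched nothing
        name=$(basename "$f")
        printf '%s\n' "${name%%_*}"    # keep the text before the first underscore
    done | sort -u                     # one line per sample, no duplicates
}
```

A list could then be written with `list_sample_ids path/to/raw/fastqfiles > path/to/sampleID/list`. If the file names or the expected list format differ, the glob and the parameter expansion need to be adjusted.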

Installation

Clone the git repository:
git clone https://github.com/stanikae/jekesa.git
cd jekesa

After cloning the jekesa git repo, do the following to install the required dependencies and to set up the conda environment:

# JEKESA
wget -P lib https://anaconda.org/stanikae/jekesa/2021.01.15.141403/download/jekesa_v1.0.yml
conda env create -n jekesa --file ./lib/jekesa_v1.0.yml

Installation of dependencies

1. R packages

wget -P lib https://anaconda.org/stanikae/r_env/2021.01.15.141706/download/jekesa-v1.0_r_env.yml
conda env create -n r_env --file ./lib/jekesa-v1.0_r_env.yml

2. CGE tools

## ResFinder4 
wget -P lib https://anaconda.org/stanikae/resfinder/2021.06.18.105709/download/jekesa-v1.0_cge.yml
conda env create -n resfinder --file ./lib/jekesa-v1.0_cge.yml

## Other CGE tools
wget -P lib https://anaconda.org/stanikae/cge/2021.06.18.111232/download/jekesa-v1.0_resfinder4.yml
conda env create -n cge --file ./lib/jekesa-v1.0_resfinder4.yml

3. srst2 env (For CDC StrepLab scripts)

wget -P lib https://anaconda.org/stanikae/srst2/2021.06.18.115358/download/jekesa-v1.0_srst2.yml
conda env create -n srst2 --file ./lib/jekesa-v1.0_srst2.yml
conda activate srst2
pip install spn_scripts/srst2_env/
conda deactivate
## Activate jekesa
conda activate jekesa 
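
The steps above create five conda environments (jekesa, r_env, resfinder, cge and srst2). As a quick sanity check after installation, the sketch below reports any of them that are missing. `missing_envs` is a hypothetical helper; it reads the output of `conda env list` on standard input and assumes the environment name is the first column of each non-comment line.

```shell
#!/usr/bin/env bash
# Hypothetical post-install check (not shipped with jekesa).
# Reads `conda env list` output on stdin and prints every required
# environment name that does not appear in it.

missing_envs() {
    local listing env
    # Skip comment lines and blank lines; keep the first column (env name).
    listing=$(awk '!/^#/ && NF { print $1 }')
    for env in jekesa r_env resfinder cge srst2; do
        # -x: whole-line match, -F: fixed string, -q: no output
        printf '%s\n' "$listing" | grep -qxF "$env" || printf '%s\n' "$env"
    done
}
```

Run it as `conda env list | missing_envs`; empty output means all five environments were found.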

If you already have jekesa installed, you can upgrade as follows:

cd jekesa
git pull
wget -P lib https://anaconda.org/stanikae/jekesa/2021.01.15.141403/download/jekesa_v1.0.yml
conda env update -n jekesa --file ./lib/jekesa_v1.0.yml --prune

Setting up required databases

To download and set up the required databases, execute the 00.download_databases.sh script:

cd jekesa
conda activate jekesa
bash bin/00.download_databases.sh /path/to/installation/directory

ConFindr databases

To set up the ConFindr databases, follow the instructions at https://olc-bioinformatics.github.io/ConFindr/install/ as this requires registration on PubMLST.

To deactivate jekesa (at the end of the analysis)

conda deactivate

Author

Stanford Kwenda

License

GPL 3.0

Citation

Kwenda S., Allam M., Khumalo Z.T.H., Mtshali S., Mnyameni F., Ismail A. Jekesa: an automated easy-to-use pipeline for bacterial whole genome typing. GitHub. https://github.com/stanikae/jekesa