Jekesa is an automated bacterial whole-genome assembly and typing pipeline that primarily uses Illumina paired-end whole-genome sequencing (WGS) data. In addition, Jekesa performs extensive analyses for Escherichia coli, Salmonella, Streptococcus pneumoniae and Streptococcus pyogenes (Group A Streptococcus), and in-depth virulence predictions for various other pathogens (refer to the sections below). Jekesa also performs whole-genome reference-free alignments, pairwise SNP-site analysis and clustering, and generates a neighbor-joining tree that can be visualized with tools such as Microreact.
Jekesa (Illuminate) currently runs on a server (single compute node). The pipeline is written in Bash, R, and R Markdown, and writes the results report both as an Excel file (.xlsx format) and as HTML.
All results will be stored in Results-ProjectName, including:
ProjectName-WGS-typing-report.xlsx
rmarkdown
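As a quick post-run check, the sketch below tests whether the typing report exists for a given project. The function name `check_jekesa_report` is hypothetical, and the layout `<parent_dir>/Results-<ProjectName>/<ProjectName>-WGS-typing-report.xlsx` is an assumption based on the naming above; adjust the path if your results directory is placed elsewhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of jekesa): report whether the typing
# report for a project exists and is non-empty.
# ASSUMPTION: results live in <parent_dir>/Results-<ProjectName>/.
check_jekesa_report() {
  local parent_dir="$1" project="$2"
  local report="${parent_dir}/Results-${project}/${project}-WGS-typing-report.xlsx"
  if [ -s "$report" ]; then
    echo "found: $report"
  else
    echo "missing: $report" >&2
    return 1
  fi
}

# Example call (commented out so the snippet has no side effects):
# check_jekesa_report /path/to/parent/directory ProjectName
```

The function returns a non-zero status when the report is absent, so it can gate later steps in a wrapper script.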
usage: jekesa <options>
OPTIONS:
-p Path to output directory or project name
-a Select the assembler to use. Options available: 'spades', 'skesa', 'velvet', 'megahit'
-s Species scheme name to use for mlst typing.
Use: 'spneumoniae' or 'spyogenes' or 'senterica', for streptococcus pneumoniae or streptococcus pyogenes or salmonella
detailed analysis. Otherwise for any other schema use: 'other'. To check other available schema names use: mlst --longList.
-t Number of threads to use <integer>, (minimum value should be: 6)
-g Only perform de novo assembly
-c Path to assembled contigs to include in the typing analysis (only mlst and resistance profiling).
-h Show this help
-v Show version
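The constraints stated in the options above (four assembler choices, at least 6 threads) can be checked before launching a long run. The following is a minimal sketch; `validate_jekesa_args` is a hypothetical helper, not something shipped with jekesa, and jekesa may well perform its own checks.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of jekesa).
# Usage: validate_jekesa_args <assembler> <threads>
validate_jekesa_args() {
  local assembler="$1" threads="$2"

  # Assembler must be one of the values listed for -a.
  case "$assembler" in
    spades|skesa|velvet|megahit) ;;
    *) echo "unsupported assembler: $assembler" >&2; return 1 ;;
  esac

  # Threads must consist of digits only.
  case "$threads" in
    ''|*[!0-9]*) echo "threads must be a non-negative integer: $threads" >&2; return 1 ;;
  esac

  # The help text gives 6 as the minimum value for -t.
  if [ "$threads" -lt 6 ]; then
    echo "threads should be at least 6 (got $threads)" >&2
    return 1
  fi
}

# Example (commented out; requires jekesa on PATH):
# validate_jekesa_args skesa 16 && jekesa -p path/to/analysis/directory -a skesa -s spyogenes -t 16
```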
cd jekesa
# This script creates the analysis directory and soft-links the fastq files
bin/find-link-fastq.sh path/to/analysis/directory path/to/sampleID/list path/to/raw/fastqfiles
# Now run the jekesa pipeline
conda activate jekesa
jekesa -p path/to/analysis/directory -a skesa -s spyogenes -t 16 &
Clone the git repository:
git clone https://github.com/stanikae/jekesa.git
cd jekesa
After cloning the jekesa git repo, do the following to install the required dependencies and to set up the conda environments:
# JEKESA
wget -P lib https://anaconda.org/stanikae/jekesa/2021.01.15.141403/download/jekesa_v1.0.yml
conda env create -n jekesa --file ./lib/jekesa_v1.0.yml
wget -P lib https://anaconda.org/stanikae/r_env/2021.01.15.141706/download/jekesa-v1.0_r_env.yml
conda env create -n r_env --file ./lib/jekesa-v1.0_r_env.yml
## ResFinder4
wget -P lib https://anaconda.org/stanikae/resfinder/2021.06.18.105709/download/jekesa-v1.0_cge.yml
conda env create -n resfinder --file ./lib/jekesa-v1.0_cge.yml
## Other CGE tools
wget -P lib https://anaconda.org/stanikae/cge/2021.06.18.111232/download/jekesa-v1.0_resfinder4.yml
conda env create -n cge --file ./lib/jekesa-v1.0_resfinder4.yml
wget -P lib https://anaconda.org/stanikae/srst2/2021.06.18.115358/download/jekesa-v1.0_srst2.yml
conda env create -n srst2 --file ./lib/jekesa-v1.0_srst2.yml
conda activate srst2
pip install spn_scripts/srst2_env/
conda deactivate
## Activate jekesa
conda activate jekesa
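The steps above create five conda environments (jekesa, r_env, resfinder, cge and srst2). A small sketch for confirming that all of them exist is shown below; `missing_jekesa_envs` is a hypothetical helper that reads the output of `conda env list` on standard input and prints the name of each expected environment that is not listed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of jekesa): print each expected
# environment name that is absent from `conda env list` output on stdin.
missing_jekesa_envs() {
  local listing name
  listing="$(cat)"
  for name in jekesa r_env resfinder cge srst2; do
    # Skip comment lines; the environment name is the first field.
    if ! printf '%s\n' "$listing" |
        awk -v n="$name" '!/^#/ && $1 == n { found = 1 } END { exit !found }'; then
      echo "$name"
    fi
  done
}

# Example (commented out; requires conda on PATH):
# conda env list | missing_jekesa_envs
```

No output means every environment was found; any printed name points to a `conda env create` step that needs to be repeated.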
cd jekesa
git pull
wget -P lib https://anaconda.org/stanikae/jekesa/2021.01.15.141403/download/jekesa_v1.0.yml
conda env update -n jekesa --file ./lib/jekesa_v1.0.yml --prune
To download and set up the required databases, execute the 00.download_databases.sh script:
cd jekesa
conda activate jekesa
bash bin/00.download_databases.sh /path/to/installation/directory
To set up ConFindr databases kindly follow instructions here: https://olc-bioinformatics.github.io/ConFindr/install/
as this requires registration on PubMLST.
conda deactivate
Stanford Kwenda
Kwenda S., Allam M., Khumalo Z.T.H., Mtshali S., Mnyameni F., Ismail A. Jekesa: an automated easy-to-use pipeline for bacterial whole genome typing. GitHub. https://github.com/stanikae/jekesa