The goal of VOCAL is to detect emerging SARS-CoV-2 variants from collections of genomes before they are annotated by phylogenetic analysis. It does so by parsing SARS-CoV-2 genomes and detecting amino acid mutations in the spike protein that can be associated with a phenotypic change. The phenotypic changes are annotated according to the knowledge accumulated on previous variants. Owing to the limited size of the genome, convergent evolution is expected to take place, so mutations characterized in earlier variants are likely to reappear in new ones.
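To illustrate what "detecting amino acid mutations" means in practice, here is a minimal sketch that lists substitutions between two already-aligned protein sequences. It is not VOCAL's implementation: the function name `call_substitutions` and the toy sequences are assumptions made for this example, and only simple substitutions are reported (gaps and ambiguous residues are skipped).

```python
def call_substitutions(ref_aligned: str, query_aligned: str) -> list:
    """List amino acid substitutions between two aligned protein sequences.

    Both strings must have the same length; '-' marks an alignment gap.
    Positions are 1-based and counted on the reference, so gaps in the
    reference do not advance the position.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    substitutions = []
    ref_pos = 0
    for ref_aa, query_aa in zip(ref_aligned, query_aligned):
        if ref_aa != "-":
            ref_pos += 1
        # Skip gaps (insertions/deletions) and ambiguous residues (X).
        if "-" in (ref_aa, query_aa) or "X" in (ref_aa, query_aa):
            continue
        if ref_aa != query_aa:
            substitutions.append(f"{ref_aa}{ref_pos}{query_aa}")
    return substitutions


# Toy sequences for illustration only, not real spike protein data.
print(call_substitutions("MFVFLVLLPL", "MFVFLVLFPL"))  # ['L8F']
```

The `L8F` notation (reference residue, position, new residue) is the usual way spike mutations such as D614G are written.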
⚠️Note: 🔌 Right now, VOCAL has been tested on Linux and macOS systems only 💻
Clone this repository:
git clone https://github.com/rki-mf1/vocal.git
You can easily install all dependencies with conda:
cd vocal
conda env create -n vocal -f environment.yml
conda activate vocal
... in three steps.
python vocal/vocal.py -i test-data/sample-test.fasta -o results/variant_table.tsv
By default, this creates a variant_table.tsv file with all mutations.
⚠️Note: when VOCAL is run without options, it realigns each query sequence to the Wuhan reference sequence NC_045512 using the pairwise alignment function of the Biopython library.
🐌 SLOW?: The alignment option in VOCAL uses the Biopython pairwise aligner and can be relatively slow. It is therefore recommended to first generate an alignment file of all the sequences before running the VOCAL mutation annotation.
The alignment file (in PSL format) can be created using the tool pblat, which can be downloaded here or simply installed through our provided conda environment.
👀 Thus, to use a precomputed whole-genome alignment of the FASTA file in PSL format (--PSL option) and avoid the slow realignment, please see the section below; otherwise, continue to step 2.
To generate a PSL file with alignments
Example command to generate a PSL file:
pblat test-data/ref.fna test-data/sample-test.fasta -threads=4 results/output.psl
To run VOCAL with a PSL file:
python vocal/vocal.py -i test-data/sample-test.fasta --PSL results/output.psl -o results/variant_table.tsv
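For readers who want to inspect the PSL file before passing it to VOCAL, the sketch below parses the standard 21-column PSL layout produced by BLAT-family tools. It is an illustration only and does not reflect how VOCAL reads the file internally; the names `PslHit`, `parse_psl_line` and `read_psl`, as well as the example line, are made up for this sketch.

```python
from typing import NamedTuple


class PslHit(NamedTuple):
    matches: int
    mismatches: int
    strand: str
    query_name: str
    query_start: int
    query_end: int
    target_name: str
    target_start: int
    target_end: int
    block_sizes: list
    query_starts: list
    target_starts: list


def _int_list(field: str) -> list:
    # PSL list fields are comma-separated with a trailing comma, e.g. "100,50,".
    return [int(x) for x in field.split(",") if x]


def parse_psl_line(line: str) -> PslHit:
    """Parse one data line of a PSL file (21 tab-separated columns)."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 21:
        raise ValueError(f"expected 21 PSL columns, got {len(f)}")
    return PslHit(
        matches=int(f[0]), mismatches=int(f[1]), strand=f[8],
        query_name=f[9], query_start=int(f[11]), query_end=int(f[12]),
        target_name=f[13], target_start=int(f[15]), target_end=int(f[16]),
        block_sizes=_int_list(f[18]), query_starts=_int_list(f[19]),
        target_starts=_int_list(f[20]),
    )


def read_psl(lines):
    """Yield hits from an iterable of lines, skipping the optional header."""
    for line in lines:
        # Data lines start with the integer match count; header lines do not.
        if line.split("\t", 1)[0].isdigit():
            yield parse_psl_line(line)


# Made-up example line (one ungapped block), for illustration only.
EXAMPLE_LINE = "\t".join([
    "29880", "3", "0", "0", "0", "0", "0", "0", "+", "sample1", "29883",
    "0", "29883", "NC_045512.2", "29903", "10", "29893", "1",
    "29883,", "0,", "10,",
])
hit = parse_psl_line(EXAMPLE_LINE)
print(hit.query_name, hit.target_name, hit.target_start, hit.target_end)
```

With the pblat command above, the reference is the PSL target and each sample sequence is a query, so `target_start` and `target_end` give the aligned interval on the reference (0-based, half-open).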
python vocal/Mutations2Function.py -i results/variant_table.tsv -a data/table_cov2_mutations_annotation.tsv -o results/variants_with_phenotypes.tsv
By default, this step will create the consolidated table ("variants_with_phenotypes.tsv") of mutations with phenotype annotation.
Rscript --vanilla "vocal/Script_VOCAL_unified.R" \
-f results/variants_with_phenotypes.tsv \
-o results/
To include a metadata file, use the -a option:
Rscript --vanilla "vocal/Script_VOCAL_unified.R" \
-f results/variants_with_phenotypes.tsv \
-a test-data/meta.tsv \
-o results/
⚠️Note: the metadata file must contain this information
Finally, we can generate a report in HTML format at the end of the analysis.
python vocal/Reporter.py \
-s results/vocal-alerts-samples-all.csv \
-c results/vocal-alerts-clusters-summaries-all.csv \
-o results/vocal-report.html
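The commands above can also be assembled programmatically, for example to run the same workflow on several input files. The following is a sketch under stated assumptions: the helper `build_pipeline_commands` is hypothetical and not part of VOCAL, it only uses the scripts, flags and file names shown in this README, and it assumes the R step writes the two `vocal-alerts-*-all.csv` files into the output directory, as the report command suggests.

```python
import shlex
from pathlib import Path


def build_pipeline_commands(fasta, outdir="results", psl=None, meta=None):
    """Return the VOCAL commands (as argument lists) in execution order."""
    out = Path(outdir)
    variant_table = (out / "variant_table.tsv").as_posix()
    annotated = (out / "variants_with_phenotypes.tsv").as_posix()

    # Step 1: mutation calling, optionally reusing a precomputed PSL alignment.
    step1 = ["python", "vocal/vocal.py", "-i", str(fasta)]
    if psl is not None:
        step1 += ["--PSL", str(psl)]
    step1 += ["-o", variant_table]

    # Step 2: phenotype annotation of the mutation table.
    step2 = ["python", "vocal/Mutations2Function.py", "-i", variant_table,
             "-a", "data/table_cov2_mutations_annotation.tsv", "-o", annotated]

    # Step 3: alert detection, optionally with a metadata file.
    step3 = ["Rscript", "--vanilla", "vocal/Script_VOCAL_unified.R",
             "-f", annotated]
    if meta is not None:
        step3 += ["-a", str(meta)]
    step3 += ["-o", out.as_posix() + "/"]

    # Optional: HTML report built from the alert tables.
    report = ["python", "vocal/Reporter.py",
              "-s", (out / "vocal-alerts-samples-all.csv").as_posix(),
              "-c", (out / "vocal-alerts-clusters-summaries-all.csv").as_posix(),
              "-o", (out / "vocal-report.html").as_posix()]
    return [step1, step2, step3, report]


# Print the commands instead of running them.
for cmd in build_pipeline_commands("test-data/sample-test.fasta",
                                   psl="results/output.psl"):
    print(shlex.join(cmd))
```

To execute the steps rather than print them, each list can be passed to `subprocess.run(cmd, check=True)` from the repository root with the `vocal` conda environment active.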
Please visit Explore the docs »
VOCAL outputs an alert level in five different colours, which are classified into three impact ratings.
Alert color | Description | Impact |
---|---|---|
Pink | Variant is known as a VOC/VOI and contains MOC or new mutations. | HIGH |
Red | Not a VOC/VOI, but contains a high number of MOC or ROI and a new mutation (likely to cause a problem; potentially a new dangerous variant). | HIGH |
Orange | Variant contains a moderate number of mutations; it may also be considered a VUM or a de-escalated variant. | MODERATE |
Lila | Mostly harmless variant (near-zero mutation size for MOC or ROI). | LOW |
Grey | No evidence of impact (either no MOC or no ROI). | LOW |
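When post-processing alerts, the table above can be encoded as a simple lookup. This is a minimal sketch: the names `ALERT_IMPACT`, `impact_of` and `count_by_impact` are made up for this example, and how the colour is actually spelled in VOCAL's output files is an assumption (the lookup is case-insensitive to be lenient).

```python
from collections import Counter

# Alert colour -> impact rating, copied from the table above.
ALERT_IMPACT = {
    "pink": "HIGH",
    "red": "HIGH",
    "orange": "MODERATE",
    "lila": "LOW",
    "grey": "LOW",
}


def impact_of(colour: str) -> str:
    """Return the impact rating for an alert colour (case-insensitive)."""
    key = colour.strip().lower()
    if key not in ALERT_IMPACT:
        raise ValueError(f"unknown alert colour: {colour!r}")
    return ALERT_IMPACT[key]


def count_by_impact(colours) -> Counter:
    """Count how many alerts fall into each impact rating."""
    return Counter(impact_of(c) for c in colours)


# Hypothetical list of alert colours, for illustration only.
print(count_by_impact(["Pink", "Grey", "orange", "Red", "Lila"]))
```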
Did you find a bug? 🐛 Suggestion/Feedback/Feature request? 👨‍💻 Please visit GitHub Issues
For business inquiries or professional support requests 🍺 please contact Dr. Martin Hölzer (HoelzerM@rki.de) or Dr. Hugues Richard (RichardH@rki.de)
Original Idea: SC2 Evolution Working group at the Robert Koch Institute in Berlin
Funding: Supported by the European Centers for Disease Control [grant number ECDC GRANT/2021/008 ECD.12222].