Pipeline to create phylogenetic trees for UK and global SARS-CoV-2 sequences and metadata, and publish matched subsets of annotated trees, FASTA sequences and metadata for groups with different access to sensitive data.
Builds trees weekly, with daily updates.
```bash
git clone --recurse-submodules https://github.com/virus-evolution/phylopipe.git
cd phylopipe
conda env create -f environment.yml
conda activate phylopipe
```
`mask.txt`
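The README does not describe the format of `mask.txt`. As a minimal sketch, assuming it lists one 1-based alignment position per line (with optional `#` comments) and that masked sites are replaced by `N`, masking an aligned sequence could look like this. The function names are hypothetical and not part of phylopipe.

```python
"""Sketch: mask alignment positions listed in a mask file.

Assumption (not confirmed by the README): the mask file holds one 1-based
alignment position per line; blank lines and '#' comments are ignored.
"""


def read_mask_positions(lines):
    """Parse 1-based positions from an iterable of text lines."""
    positions = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        positions.add(int(line))
    return positions


def mask_sequence(seq, positions, mask_char="N"):
    """Return seq with each listed 1-based position replaced by mask_char.

    Positions outside the sequence are ignored.
    """
    chars = list(seq)
    for pos in positions:
        if 1 <= pos <= len(chars):
            chars[pos - 1] = mask_char
    return "".join(chars)


if __name__ == "__main__":
    # In practice the lines would come from open("mask.txt").
    mask = read_mask_positions(["# example sites", "2", "", "5"])
    print(mask_sequence("ACGTACGT", mask))  # ANGTNCGT
```

Using a set of positions makes repeated entries in the mask file harmless.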
`source_id`, i.e. same patient
`epi-week` to prevent filtering by date downstream
`lineage_splits.csv`
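Filtering on `source_id` keeps a single sequence per patient. The sketch below assumes metadata rows with hypothetical `sequence_name` and `source_id` columns and keeps the first row seen for each `source_id`; the pipeline itself may use a different rule to decide which duplicate survives.

```python
"""Sketch: keep one sequence per source_id (i.e. per patient).

Assumptions: metadata rows carry hypothetical 'sequence_name' and
'source_id' columns, and the first row seen for each source_id is kept.
"""
import csv
import io


def deduplicate_by_source_id(rows):
    """Return rows with later duplicates of the same source_id removed."""
    seen = set()
    kept = []
    for row in rows:
        source = (row.get("source_id") or "").strip()
        if not source:
            # No source_id means nothing to match on, so keep the row.
            kept.append(row)
        elif source not in seen:
            seen.add(source)
            kept.append(row)
    return kept


if __name__ == "__main__":
    # Made-up example metadata.
    example = io.StringIO(
        "sequence_name,source_id\n"
        "seq_A,P1\n"
        "seq_B,P2\n"
        "seq_C,P1\n"
        "seq_D,\n"
    )
    rows = list(csv.DictReader(example))
    print([r["sequence_name"] for r in deduplicate_by_source_id(rows)])
    # ['seq_A', 'seq_B', 'seq_D']
```

Rows without a `source_id` are kept rather than dropped, since they cannot be shown to be duplicates.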
`FastTreeMP` and reroot on the clade-specific outgroup
`usher` and `faToVcf`, take the filtered aligned FASTA from preprocessing step 2 and construct a mutation annotated tree based on the grafted tree, adding the missing samples in the process where possible
`country`, `lineage` and `uk_lineage`
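Attaching `country`, `lineage` and `uk_lineage` to the tree amounts to looking up each tip in the metadata. A minimal sketch, assuming tips are matched on a hypothetical `sequence_name` column; it returns the per-tip traits plus any tips that have no metadata, and does not write anything back into a tree file.

```python
"""Sketch: look up country, lineage and uk_lineage for each tree tip.

Assumption: tips are matched to metadata on a hypothetical 'sequence_name'
column. Writing the traits back onto a tree file is out of scope here.
"""

TRAITS = ("country", "lineage", "uk_lineage")


def tip_annotations(tip_names, metadata_rows, key="sequence_name", traits=TRAITS):
    """Return ({tip: {trait: value}}, [tips with no metadata row])."""
    by_name = {row[key]: row for row in metadata_rows}
    annotations = {}
    missing = []
    for tip in tip_names:
        row = by_name.get(tip)
        if row is None:
            missing.append(tip)
            continue
        # Absent traits become empty strings rather than raising KeyError.
        annotations[tip] = {trait: row.get(trait, "") for trait in traits}
    return annotations, missing


if __name__ == "__main__":
    # Made-up example values.
    metadata = [
        {"sequence_name": "tip1", "country": "UK", "lineage": "B.1.1.7", "uk_lineage": "UK1"},
        {"sequence_name": "tip2", "country": "France", "lineage": "B.1.177"},
    ]
    annotations, missing = tip_annotations(["tip1", "tip2", "tip3"], metadata)
    print(annotations["tip2"])  # {'country': 'France', 'lineage': 'B.1.177', 'uk_lineage': ''}
    print(missing)  # ['tip3']
```

Reporting missing tips separately makes it easy to spot sequences that reached the tree without matching metadata.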
`uk_lineages` and annotate `phylotypes` for UK lineages and annotate
`publish_recipes.json`
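The schema of `publish_recipes.json` is not given here. Purely as an illustration, the sketch below assumes a simplified, hypothetical schema in which each recipe lists the metadata columns one audience may receive, and shows how a matched subset of metadata and FASTA sequences could be produced from it.

```python
"""Sketch: build matched metadata and FASTA subsets from publishing recipes.

The recipe schema below is hypothetical and is NOT the real format of
publish_recipes.json: each recipe simply lists the metadata columns one
audience may receive.
"""
import json

EXAMPLE_RECIPES = json.loads("""
{
  "public": {"columns": ["sequence_name", "country", "lineage"]},
  "restricted": {"columns": ["sequence_name", "country", "lineage", "uk_lineage"]}
}
""")


def apply_recipe(rows, recipe):
    """Copy each row, keeping only the recipe's columns in recipe order."""
    return [{col: row.get(col, "") for col in recipe["columns"]} for row in rows]


def matched_fasta(sequences, published_rows, key="sequence_name"):
    """Keep only sequences whose names appear in the published metadata."""
    names = {row[key] for row in published_rows}
    return {name: seq for name, seq in sequences.items() if name in names}


if __name__ == "__main__":
    # Made-up example data; source_id stands in for a sensitive column.
    rows = [{"sequence_name": "s1", "country": "UK", "lineage": "B.1",
             "uk_lineage": "UK1", "source_id": "P1"}]
    sequences = {"s1": "ACGT", "s2": "ACGA"}
    for name, recipe in EXAMPLE_RECIPES.items():
        published = apply_recipe(rows, recipe)
        print(name, published, matched_fasta(sequences, published))
```

Deriving the FASTA subset from the already-filtered metadata keeps the two outputs matched by construction.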
`usher` mutation annotated tree daily, with the full tree pipeline run weekly.

`grapevine` (https://github.com/COG-UK/grapevine) was the name of the original pipeline, which preprocessed, aligned and variant-called sequences, built phylogenetic trees and more. As the number of sequences has grown, the tree-building steps have taken increasingly long to complete. Datapipe (https://github.com/COG-UK/grapevine_nextflow) was created to provide daily alignment and metadata processing. This pipeline takes the output of Datapipe, then constructs, annotates and publishes the trees.