This repository analyzes viral genomes using Nextstrain to understand how SARS-CoV-2, the virus responsible for the COVID-19 pandemic, evolves and spreads.
We maintain a number of publicly-available builds, visible at nextstrain.org/ncov.
See our change log for details about backwards-incompatible or breaking changes to the workflow.
Visit the workflow documentation for tutorials and reference material.
The hCoV-19 / SARS-CoV-2 genomes were generously shared via GISAID. We gratefully acknowledge the Authors, Originating and Submitting laboratories of the genetic sequence and metadata made available through GISAID on which this research is based.
To download the GISAID data and run the analysis yourself, please see this guide.
Please note that data/metadata.tsv is no longer included as part of this repo. However, we provide continually-updated, pre-formatted metadata & fasta files for download through GISAID.
We issued weekly Situation Reports for the first ~5 months of the pandemic. You can find the Reports and their translations here.
Site numbering and genome structure use Wuhan-Hu-1/2019 as the reference. The phylogeny is rooted relative to early samples from Wuhan. Temporal resolution assumes a nucleotide substitution rate of 8 × 10^-4 substitutions per site per year. SNPs present in the nCoV samples in the first and last few bases of the alignment were masked as likely sequencing artifacts.
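
To make the numbers above concrete, here is a minimal Python sketch. It is illustrative only and is not the workflow's implementation. It converts the per-site substitution rate quoted above into an expected number of substitutions per genome per year, and shows one simple way the ends of an aligned sequence could be masked. The reference length of 29,903 bases, the function names `expected_substitutions` and `mask_ends`, and the mask sizes in the usage example are all assumptions made for illustration; the workflow's actual masking settings are not stated here.

```python
# Illustrative sketch only; not part of the workflow itself.

SUBSTITUTION_RATE = 8e-4  # substitutions per site per year (rate quoted in the text)
GENOME_LENGTH = 29_903    # assumed length of the Wuhan-Hu-1/2019 reference, in bases


def expected_substitutions(years: float,
                           rate: float = SUBSTITUTION_RATE,
                           genome_length: int = GENOME_LENGTH) -> float:
    """Expected genome-wide substitutions accumulated over `years`."""
    return rate * genome_length * years


def mask_ends(sequence: str, n_start: int, n_end: int, mask_char: str = "N") -> str:
    """Replace the first `n_start` and last `n_end` characters with `mask_char`.

    The sequence length is preserved so alignment coordinates stay valid.
    """
    if n_start < 0 or n_end < 0:
        raise ValueError("mask sizes must be non-negative")
    length = len(sequence)
    n_start = min(n_start, length)          # never mask more than the sequence holds
    n_end = min(n_end, length - n_start)    # avoid overlapping the two masked ends
    middle = sequence[n_start:length - n_end]
    return mask_char * n_start + middle + mask_char * n_end


if __name__ == "__main__":
    # Roughly two substitutions per genome per month under these assumptions.
    print(f"{expected_substitutions(1.0):.1f} substitutions per genome per year")
    # Hypothetical mask sizes (2 and 3), chosen only to keep the example short.
    print(mask_ends("ACGTACGTAC", n_start=2, n_end=3))
```

Under these assumptions the rate works out to about 24 substitutions per genome per year. Masking with `N` rather than trimming keeps every site at its reference position, which matters because site numbering follows the reference.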
We welcome contributions from the community! Please note that we strictly adhere to the Contributor Covenant Code of Conduct.
Please see our Contributor Guide to get started!
Please note that we automatically pick up any SARS-CoV-2 data that is submitted to GISAID.
If you're a lab and you'd like to get started sequencing, please see:
To report a bug, error, or feature request, please open an issue.
For questions, head over to the discussion board; we're happy to help!