usafsam / mad_river_wf

SARS-CoV-2 analysis workflow, using Nextflow and bbtools
Apache License 2.0

Mad River workflow

Named after the Mad River, one of the major rivers that runs through Dayton, Ohio.

Mad River is a workflow developed by Padraic Fanning at the United States Air Force School of Aerospace Medicine (USAFSAM) Applied Technologies & Genomics Division, based on the Cecret workflow by Erin Young and work by Dr. Anthony Fries. It was initially designed for SARS-CoV-2 sequencing with the Illumina Nextera XT library prep workflow, using version 1 of the "midnight" primer set by Freed and Silander. In its current state, the workflow has been tested on data generated from NextSeq runs, and it reports diagnostics such as spike gene coverage and variant coverage/quality metrics. The tools used are mostly sourced from the Docker images provided by StaPH-B, which include:

However, the Docker images and the exact Conda environments can be customized to your liking.

Getting Started

To run this workflow, you need Conda, Nextflow, and Docker. If you do not need to modify this workflow's scripts or reference files, run:

```bash
nextflow run usafsam/mad_river_wf \
    --reads {READS_DIR} \
    --run_info {PATH_TO}/RunInfo.xml \
    --stats_json {PATH_TO}/Stats.json \
    --outdir {OUTDIR}
```

If you are running a local copy of this workflow, replace usafsam/mad_river_wf with the path to that copy.
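If you launch runs from a script, the prerequisite check and the nextflow run command above can be wrapped in a few lines of Python. This is a minimal sketch, not part of the repository: the helper names missing_tools and build_command and the example paths are hypothetical; only the command-line parameters come from this README.

```python
import shlex
import shutil

# Prerequisites listed in this README.
REQUIRED_TOOLS = ("conda", "nextflow", "docker")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the prerequisite executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def build_command(reads_dir, run_info, stats_json, outdir,
                  pipeline="usafsam/mad_river_wf", primer_tsv=None):
    """Assemble the `nextflow run` invocation as an argv list.

    `pipeline` may be the GitHub name or the path to a local copy.
    `primer_tsv` is optional; when omitted, the workflow's default applies.
    """
    argv = [
        "nextflow", "run", str(pipeline),
        "--reads", str(reads_dir),
        "--run_info", str(run_info),
        "--stats_json", str(stats_json),
        "--outdir", str(outdir),
    ]
    if primer_tsv is not None:
        argv += ["--primer_tsv", str(primer_tsv)]
    return argv


# Print the command for a hypothetical run directory instead of executing it.
print(shlex.join(build_command("reads/", "run/RunInfo.xml",
                               "run/Stats.json", "results/")))
```

To actually launch a run, pass the returned list to subprocess.run(..., check=True) once missing_tools() returns an empty list. Building an argv list rather than a single string avoids shell-quoting problems with paths that contain spaces.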

Specifying Primers

The default primer set used in this workflow is version 2.0 of the SARS-CoV-2 Midnight Amplicon panel as provided by IDT (released April 2022). A TSV file of these primers can be found in the reference/ directory of this repository, along with prior versions of this set. A Python script, process_primer_reference.py, takes this TSV file and produces a FASTA file of the primers (used by BBDuk), along with BED files for the primers and amplicons. The version numbers follow semantic versioning: the MAJOR version corresponds to a new release from IDT, and the MINOR version corresponds to one or more additional primers being spiked into the reaction mixture. The following table shows which primers are included in each version of the set.

| Primer Name | v1.0 | v1.1 | v1.2 | v2.0 |
|---|---|---|---|---|
| 1_LEFT through 29_RIGHT | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| 28_LEFT_OMICRON (C27807T) | :x: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| 22_RIGHT_OMICRON (G22599A) | :x: | :x: | :white_check_mark: | :white_check_mark: |
| 23_LEFT_OMICRON (C22522T) | :x: | :x: | :white_check_mark: | :white_check_mark: |
| 26_LEFT_OMICRON (C25708T) | :x: | :x: | :white_check_mark: | :white_check_mark: |
| 21_RIGHT_ARTIC_71R (71_RIGHT from ARTIC v4.1) | :x: | :x: | :x: | :white_check_mark: |
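Because primers are only ever added from one version to the next in this table, it can be read cumulatively. The Python sketch below encodes it that way. The data layout and helper names are hypothetical (not taken from reference/ or process_primer_reference.py), and reading "1_LEFT through 29_RIGHT" as the 58 LEFT/RIGHT primers for amplicons 1 to 29 is an interpretation of the table.

```python
# Hypothetical encoding of the version table above.
VERSIONS = ["1.0", "1.1", "1.2", "2.0"]

# Interpretation: "1_LEFT through 29_RIGHT" = LEFT and RIGHT for amplicons 1..29.
BASE_PRIMERS = [f"{n}_{side}" for n in range(1, 30) for side in ("LEFT", "RIGHT")]

# Primers first introduced in each version; later versions keep them.
ADDED_IN = {
    "1.0": [],
    "1.1": ["28_LEFT_OMICRON"],
    "1.2": ["22_RIGHT_OMICRON", "23_LEFT_OMICRON", "26_LEFT_OMICRON"],
    "2.0": ["21_RIGHT_ARTIC_71R"],
}


def primers_in(version):
    """Return every primer name included in `version` (cumulative)."""
    idx = VERSIONS.index(version)  # raises ValueError for an unknown version
    names = list(BASE_PRIMERS)
    for v in VERSIONS[: idx + 1]:
        names.extend(ADDED_IN[v])
    return names


def added_between(old, new):
    """Primers present in `new` but not in `old`."""
    return set(primers_in(new)) - set(primers_in(old))


def bump_kind(old, new):
    """Classify a version change under the MAJOR/MINOR scheme described above."""
    old_major, old_minor = (int(x) for x in old.split("."))
    new_major, new_minor = (int(x) for x in new.split("."))
    if new_major != old_major:
        return "major (new IDT release)"
    if new_minor != old_minor:
        return "minor (spiked-in primers)"
    return "none"
```

For example, added_between("1.1", "1.2") returns the three OMICRON primers introduced in v1.2, and bump_kind("1.2", "2.0") reports a MAJOR change.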

You can adapt the general format of the TSVs in reference/ to your own primer set. To use a primer set other than the default, override params.primer_tsv in a user-provided config file, or pass --primer_tsv $PRIMER_SCHEME when invoking Nextflow from the command line.
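Before pointing --primer_tsv at an adapted TSV, it can help to see what the conversion step produces. The Python sketch below approximates what process_primer_reference.py is described as doing (TSV in; primer FASTA, primer BED, and amplicon BED out), but it is not that script. The column names (name, seq, start, end), the 0-based half-open coordinates, the reference name MN908947.3, the BED layouts, the amplicon-span rule, and the toy rows (including the hypothetical 2_LEFT_ALT spike-in) are all assumptions for illustration.

```python
import csv
import io
import re
from collections import defaultdict

# ASSUMPTION: columns name, seq, start, end (0-based, half-open). The real TSVs
# in reference/ may use a different layout; adapt read_primers accordingly.
# The rows below are toy placeholders, not real Midnight primers.
EXAMPLE_TSV = (
    "name\tseq\tstart\tend\n"
    "1_LEFT\tACGTACGTAC\t30\t40\n"
    "1_RIGHT\tTTGCATGCAA\t1195\t1205\n"
    "2_LEFT\tGGCCAATTGG\t1100\t1110\n"
    "2_LEFT_ALT\tGGCCAATTGA\t1098\t1108\n"
    "2_RIGHT\tCCAATTGGCC\t2256\t2266\n"
)
# ASSUMPTION: reference sequence name written into the BED files.
CHROM = "MN908947.3"
# Amplicon number and side, e.g. "28_LEFT_OMICRON" -> (28, "LEFT").
NAME_RE = re.compile(r"^(\d+)_(LEFT|RIGHT)")


def parse_name(name):
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized primer name: {name!r}")
    return int(match.group(1)), match.group(2)


def read_primers(handle):
    """Parse the TSV into dicts with upper-case sequences and integer coordinates."""
    return [
        {"name": row["name"], "seq": row["seq"].upper(),
         "start": int(row["start"]), "end": int(row["end"])}
        for row in csv.DictReader(handle, delimiter="\t")
    ]


def to_fasta(primers):
    """One FASTA record per primer (the README notes this file is used by BBDuk)."""
    return "".join(f">{p['name']}\n{p['seq']}\n" for p in primers)


def to_primer_bed(primers, chrom=CHROM):
    """BED6: LEFT primers on the + strand, RIGHT primers on the - strand."""
    lines = []
    for p in primers:
        _, side = parse_name(p["name"])
        strand = "+" if side == "LEFT" else "-"
        lines.append(f"{chrom}\t{p['start']}\t{p['end']}\t{p['name']}\t0\t{strand}\n")
    return "".join(lines)


def to_amplicon_bed(primers, chrom=CHROM):
    """Span each amplicon from its leftmost LEFT start to its rightmost RIGHT end."""
    starts, ends = defaultdict(list), defaultdict(list)
    for p in primers:
        number, side = parse_name(p["name"])
        if side == "LEFT":
            starts[number].append(p["start"])
        else:
            ends[number].append(p["end"])
    return "".join(
        f"{chrom}\t{min(starts[n])}\t{max(ends[n])}\tamplicon_{n}\n"
        for n in sorted(starts.keys() & ends.keys())
    )


primers = read_primers(io.StringIO(EXAMPLE_TSV))
print(to_fasta(primers), to_primer_bed(primers), to_amplicon_bed(primers), sep="")
```

Taking the outermost LEFT and RIGHT coordinates means a spiked-in alternate primer (the toy 2_LEFT_ALT here) widens its amplicon instead of creating a new one; if your scheme defines amplicon boundaries differently, adjust to_amplicon_bed.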