This repository holds WDL workflows and Docker build scripts for the production data QC, assembly generation, and assembly QC pipelines used by the Human Pangenome Reference Consortium (HPRC).
All WDLs and containers created in this repository are licensed under the MIT license. The underlying tools that the WDLs and containers run are likely covered by one or more free and open source software licenses, but we cannot guarantee this.
Workflows are split across the data_processing, assembly, and (assembly) QC folders, each with the following folder structure:
── docker/
    └── toolName/
        └── Dockerfile
        └── Makefile
        └── scripts/
            └── toolName/
                └── scriptName.py
── wdl/
    └── tasks/
    │   └── taskName.wdl
    └── workflows/
        └── workFlowName.wdl
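Given that layout, the workflow WDLs in an area can be listed with a short shell helper. This is a minimal sketch: `list_workflows` is a hypothetical helper name, not something shipped in the repository, and the demo builds a throwaway tree that mirrors the structure above so it runs anywhere.

```shell
#!/usr/bin/env bash
# list_workflows is a hypothetical helper (not part of this repository).
# It prints the workflow WDLs found under <area>/wdl/workflows/.
list_workflows() {
    local area_dir="$1"
    # Print nothing (and succeed) if the area has no workflows folder.
    [ -d "${area_dir}/wdl/workflows" ] || return 0
    find "${area_dir}/wdl/workflows" -maxdepth 1 -type f -name '*.wdl' | sort
}

# Demo on a temporary tree that mirrors the documented folder structure.
demo_root="$(mktemp -d)"
mkdir -p "${demo_root}/assembly/wdl/workflows" "${demo_root}/assembly/wdl/tasks"
touch "${demo_root}/assembly/wdl/workflows/workFlowName.wdl"
touch "${demo_root}/assembly/wdl/tasks/taskName.wdl"
list_workflows "${demo_root}/assembly"
```

Task WDLs under `wdl/tasks/` are deliberately excluded, since they are imported by workflows rather than run directly.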
The root of each of the data_processing, assembly, and (assembly) QC folders contains a readme that describes the workflows and how to use them. Summaries of the workflows in each area are below.
The HPRC produces HiFi, ONT, and Illumina Hi-C data. Each data type has a workflow to check data files to ensure they pass QC.
Assemblies are produced with one of two Hifiasm workflows, both of which use HiFi and ONT ultralong reads. The Hi-C workflow phases with Illumina Hi-C data; the trio workflow phases with parental Illumina data. The major steps included in the assembly workflows are:
In addition to the Hifiasm workflows, there is an assembly cleanup workflow, which:
Assemblies are polished using a custom pipeline based around DeepPolisher. The polishing pipeline workflow WDL can be found at polishing/wdl/workflows/hprc_DeepPolisher.wdl. The major steps in the HPRC assembly polishing pipeline are:
Assembly QC is broken down into two types:
The following tools are included in the standard_qc pipeline:
The following tools are included in the alignment_based_qc pipeline:
If you haven't run a WDL before, there are good resources online to get started. You first need to choose a way to run WDLs. Below are a few options:
Before starting, read the Cromwell 5 minute intro.
Once you've done that, download the latest version of Cromwell and make it executable (replace XY with the newest version number):
wget https://github.com/broadinstitute/cromwell/releases/download/XY/cromwell-XY.jar
chmod +x cromwell-XY.jar
And run your WDL:
java -jar cromwell-XY.jar run \
/path/to/my_workflow.wdl \
-i my_workflow_inputs.json \
> run_log.txt
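Because the run command above sends Cromwell's output to run_log.txt, a wrong path can go unnoticed until the log is read. A minimal pre-flight sketch is shown here; `preflight` and the `JAVA_BIN` override are hypothetical names introduced for illustration, and the paths in the usage comment are the same placeholders used above.

```shell
#!/usr/bin/env bash
# preflight is a hypothetical helper: it checks that java, the Cromwell jar,
# the WDL, and the inputs JSON are all present before a run is launched.
# JAVA_BIN is an assumed override; it defaults to the java found on PATH.
preflight() {
    local jar="$1" wdl="$2" inputs="$3"
    local java_bin="${JAVA_BIN:-java}"
    local status=0
    local f
    command -v "${java_bin}" >/dev/null 2>&1 \
        || { echo "java not found: ${java_bin}" >&2; status=1; }
    for f in "${jar}" "${wdl}" "${inputs}"; do
        [ -r "${f}" ] || { echo "missing or unreadable: ${f}" >&2; status=1; }
    done
    return "${status}"
}

# Usage (placeholder paths):
# preflight cromwell-XY.jar /path/to/my_workflow.wdl my_workflow_inputs.json \
#   && java -jar cromwell-XY.jar run /path/to/my_workflow.wdl \
#        -i my_workflow_inputs.json > run_log.txt
```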
Each workflow requires an input JSON file. You can create a template using womtool, which is published alongside each Cromwell release:
java -jar womtool-XY.jar \
inputs \
/path/to/my_workflow.wdl \
> my_workflow_inputs.json
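The template produced this way lists each input with a type description in place of a real value, so it has to be edited before the workflow can run. The sketch below flags entries that still look unfilled. It assumes placeholders take forms such as "File", "Array[File]", "String? (optional)", or "Int (optional, default = 8)"; `unfilled_inputs`, the workflow name `myWorkflow`, and its inputs are hypothetical.

```shell
#!/usr/bin/env bash
# unfilled_inputs is a hypothetical helper: it prints lines of an inputs JSON
# whose values still look like type placeholders rather than real values.
# The placeholder shapes matched here are an assumption about the template.
unfilled_inputs() {
    local inputs_json="$1"
    grep -E ':[[:space:]]*"(File|String|Int|Float|Boolean|Array\[[^"]*]|Map\[[^"]*]|Pair\[[^"]*])\??( \(optional[^"]*\))?"' \
        "${inputs_json}" || true
}

# Demo: a partially edited template for a made-up workflow.
demo_inputs="$(mktemp)"
cat > "${demo_inputs}" <<'EOF'
{
  "myWorkflow.reads": "Array[File]",
  "myWorkflow.sample_name": "HG002",
  "myWorkflow.threads": "Int (optional, default = 8)"
}
EOF
unfilled_inputs "${demo_inputs}"
```

Optional inputs that should keep their defaults can simply be deleted from the JSON rather than filled in.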