theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

TBProfiler_tNGS_PHB: Introduction of tNGS workflow for TB #272

Closed: sage-wright closed this pull request 4 months ago

sage-wright commented 8 months ago

This PR closes #276 by introducing the TBProfiler_tNGS_PHB workflow, designed for Illumina paired-end (PE) targeted next-generation sequencing (tNGS) data.

🗑️ This dev branch should NOT be deleted after merging to main.

:brain: Aim, Context and Functionality

tNGS is being used to characterize Mycobacterium tuberculosis for clinical purposes. Targeted sequencing requires different analysis approaches from WGS, so the TheiaProk workflows cannot be used: they are designed to produce an assembled genome. Because tNGS data are amplicon-based and cover only selected regions of the genome, attempting an assembly is not appropriate.

TBProfiler_tNGS_PHB is our solution: a workflow that performs minimal QC and runs TBProfiler and tbp-parser by default.

The minimal QC performed is as follows:

clockwork is currently not implemented because of difficult-to-resolve issues encountered while integrating the tool.

:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made

This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version: No

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing: No

:clipboard: Workflow/Task Step Changes

🔄 Data Processing

Docker/software or software versions changed: tbp-parser has been updated to v1.3.0, which adds tNGS compatibility by including the tNGS primer region BED file.
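For reviewers less familiar with BED files, the Python sketch below shows how primer/target region intervals in a BED file relate to 1-based variant positions. It illustrates only the BED convention (0-based, half-open intervals); how tbp-parser actually uses the primer region file internally is not described in this PR, and the function names, chromosome name, and coordinates here are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    chrom: str
    start: int  # BED start: 0-based, inclusive
    end: int    # BED end: 0-based, exclusive
    name: str


def parse_bed(text: str) -> List[Region]:
    """Parse tab-separated BED lines (hypothetical helper), skipping headers and blanks."""
    regions = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        name = fields[3] if len(fields) > 3 else ""
        regions.append(Region(fields[0], int(fields[1]), int(fields[2]), name))
    return regions


def regions_covering(regions: List[Region], chrom: str, pos_1based: int) -> List[str]:
    """Return names of regions containing a 1-based position (e.g. a VCF POS).

    A BED interval [start, end) covers 1-based positions start+1 .. end,
    so the test is start < pos <= end.
    """
    return [
        r.name
        for r in regions
        if r.chrom == chrom and r.start < pos_1based <= r.end
    ]
```

With a hypothetical record `Chromosome	100	200	amplicon_1`, positions 101 through 200 fall inside `amplicon_1`, while positions 100 and 201 do not.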

Databases or database versions changed: No database changes.

Data processing/commands changed: A new optional Integer input, trimmomatic_base_crop, has been added to the trimmomatic_pe task. When provided, it triggers calculation of the average read length and adds two trimming parameters to the Trimmomatic command: HEADCROP and CROP.

The CROP value is determined dynamically from the average read length: CROP is set to the average read length minus trimmomatic_base_crop. HEADCROP is set to trimmomatic_base_crop.
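The CROP/HEADCROP arithmetic can be sketched in Python as follows. This is a minimal illustration, not the task's actual command; it assumes the average is taken over the sequence lines of a (possibly gzipped) FASTQ and truncated to an integer, and the function names are hypothetical. The emitted order (CROP, then HEADCROP) is also an assumption: Trimmomatic applies steps in the order given on the command line, so the order affects the final read length.

```python
import gzip
from typing import Iterable, List, Optional


def read_lengths(fastq_path: str) -> Iterable[int]:
    """Yield the length of each read in a FASTQ (plain or .gz)."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each 4-line record is the sequence
                yield len(line.rstrip("\n"))


def average_read_length(lengths: Iterable[int]) -> int:
    """Integer (truncated) mean read length; truncation is an assumption."""
    total = count = 0
    for n in lengths:
        total += n
        count += 1
    if count == 0:
        raise ValueError("no reads found")
    return total // count


def crop_steps(avg_len: int, base_crop: Optional[int]) -> List[str]:
    """Build Trimmomatic trimming steps from a hypothetical base crop value.

    CROP:<n> keeps the first n bases of a read; HEADCROP:<n> removes the
    first n bases. No extra steps are added when base_crop is not provided.
    """
    if base_crop is None:
        return []
    crop = avg_len - base_crop  # average length minus trimmomatic_base_crop
    if crop <= 0:
        raise ValueError("trimmomatic_base_crop must be smaller than the average read length")
    return [f"CROP:{crop}", f"HEADCROP:{base_crop}"]
```

For example, with an average read length of 150 bp and `trimmomatic_base_crop = 10`, this yields `CROP:140` and `HEADCROP:10`.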

No other analysis changes have been made to TBProfiler or tbp-parser, apart from the tbp-parser version update (described in the tbp-parser repository).

File processing changed: No file processing changes.

Compute resources changed: No compute resources changes.

➡️ Inputs

All inputs are new because this is a new workflow.

New required inputs:

New optional inputs for tbp_parser task:

New optional inputs for tbprofiler task:

New optional inputs for tbprofiler_tngs workflow:

New optional inputs for trimmomatic_pe task:

New optional inputs for version_capture task:

⬅️ Outputs

All outputs are new because this is a new workflow.

New outputs (in alphabetical order):

:test_tube: Testing

Test Dataset

Command-line Testing with MiniWDL or Cromwell (optional)

Terra Testing

Suggested Scenarios for Reviewer to Test

Theiagen Version Release Testing (optional)

:microscope: Final Developer Checklist

🎯 Reviewer Checklist

🗂️ Associated Documentation (to be completed by Theiagen developer)