theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

Exposing r1 and r2 mean_q_clean and mean_readlength_clean #455

Closed: jrotieno closed this 1 month ago

jrotieno commented 2 months ago

This PR closes #439.

🗑️ This dev branch should be deleted after merging to main.

🧠 Aim, Context and Functionality

This PR exposes the mean quality scores for reads 1 and 2 (r1_mean_q_clean and r2_mean_q_clean) and the mean clean read lengths for reads 1 and 2 (r1_mean_readlength_clean and r2_mean_readlength_clean). The TheiaProk Illumina PE workflow already computed these values but did not expose them as outputs on Terra.

For the TheiaProk ONT workflow, I am unsure whether we want to rename outputs such as nanoplot_r1_mean_q_clean to r1_mean_q_clean for consistency with the Illumina PE and SE workflows. My reasoning is that the PE and SE outputs are not prefixed with cg_pipeline, the tool that generates these metrics. The ONT outputs, by contrast, carry the nanoplot prefix.

🛠️ Impacted Workflows/Tasks & Changes Being Made

This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version: Yes (new outputs)

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing: No

📋 Workflow/Task Step Changes

🔄 Data Processing

Docker/software or software versions changed: No

Databases or database versions changed: No

Data processing/commands changed: No

File processing changed: No

Compute resources changed: No

➡️ Inputs

⬅️ Outputs

r1_mean_q_clean, r2_mean_q_clean, r1_mean_readlength_clean, r2_mean_readlength_clean

🧪 Testing

Test Dataset

Two randomly selected V. cholerae samples

Command-line Testing with miniwdl or Cromwell (optional)

Terra Testing

TheiaProk Illumina PE: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/0ea0da1f-926d-459c-88d7-e90084f86a92

Suggested Scenarios for Reviewer to Test

This change is straightforward and does not need extensive testing. The reviewer may still want to test a scenario in which the clean-reads screen is expected to fail. In that case cg_pipeline_clean is not run, and these four outputs should be empty.

Theiagen Version Release Testing (optional)

🔬 Final Developer Checklist

🎯 Reviewer Checklist

🗂️ Associated Documentation (to be completed by Theiagen developer)

sage-wright commented 1 month ago

Testing ONT here and PE here. The code changes look good; I will approve once the runs complete successfully and the outputs propagate.