theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

[TheiaCoV_Illumina_PE & _ONT] Create sub-workflow for flu-specific modules #502

Closed sage-wright closed 1 week ago

sage-wright commented 2 weeks ago

This PR closes #409.

*** Merge after PR #468

🗑️ This dev branch should be deleted after merging to main.

:brain: Aim, Context and Functionality

Flu is a different beast because its genome is segmented. The number of flu-specific modules had grown to take up most of the workflow files for TheiaCoV_Illumina_PE and TheiaCoV_ONT. In addition, the Nextclade outputs for HA did not have the _HA suffix, which led to confusion.
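As a rough illustration of the naming issue, the sketch below shows one way per-segment outputs could be given an explicit segment suffix so that HA results are not mistaken for another segment's. The function name `add_segment_suffix` and the output names are hypothetical and are not taken from the workflow itself, which is written in WDL.

```python
def add_segment_suffix(outputs: dict[str, str], segment: str) -> dict[str, str]:
    """Return a copy of `outputs` whose keys all end with `_<SEGMENT>`.

    Hypothetical helper for illustration only; the real workflow
    defines its output names in WDL.
    """
    suffix = f"_{segment.upper()}"
    return {
        # Leave names that already carry the suffix untouched.
        (name if name.endswith(suffix) else name + suffix): value
        for name, value in outputs.items()
    }


# Hypothetical output names, used only to show the renaming.
ha_outputs = add_segment_suffix({"nextclade_json": "sample.json"}, "ha")
print(ha_outputs)
```

With a suffix on every segment-specific output, a downstream table column makes clear which segment a result belongs to.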

:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made

This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version: Yes, but no

Outputs will remain the same, but the flu-specific tasks will now run in a separate subworkflow.

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing: No

The following tasks were moved to the flu_track subworkflow:

:clipboard: Workflow/Task Step Changes

🔄 Data Processing

Docker/software or software versions changed:

Databases or database versions changed:

Data processing/commands changed:

File processing changed:

Compute resources changed:

➡️ Inputs

⬅️ Outputs

New outputs:

Changed outputs:

:test_tube: Testing

Test Dataset

Commandline Testing with MiniWDL or Cromwell (optional)

Terra Testing

ONT test here with various organisms

Illumina PE test here with various organisms

Suggested Scenarios for Reviewer to Test

Theiagen Version Release Testing (optional)

:microscope: Final Developer Checklist

🎯 Reviewer Checklist

🗂️ Associated Documentation (to be completed by Theiagen developer)

cimendes commented 1 week ago

Code changes are solid 🏅 Will test after #468 is merged.

kapsakcj commented 1 week ago

My tests here, still need to review outputs:

log file:

nextclade_ordered_writer.rs:171: In sequence #0 'SRR13439799_A_HA_H1': When processing sequence #0 'SRR13439799_A_HA_H1': When calculating seed matches: Unable to align: seed alignment covers 9.73% of the query sequence, which is less than expected 10.00% (configurable using 'min seed cover' CLI flag or dataset property). This is likely due to low quality of the provided sequence, or due to using incorrect reference sequence.. Note that this sequence will not be included in the results.
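When reviewing many samples, it can help to pull the sequence name and the two percentages out of messages like the one above. The following is a minimal sketch that assumes the message wording matches this log line exactly; the regular expression and the function name `parse_seed_cover_failure` are illustrative and are not part of Nextclade or this repository.

```python
import re

# Matches the "seed alignment covers X% ... less than expected Y%" message,
# assuming the wording shown in the log line above.
SEED_COVER_RE = re.compile(
    r"In sequence #(?P<index>\d+) '(?P<name>[^']+)':.*?"
    r"seed alignment covers (?P<observed>\d+(?:\.\d+)?)% of the query sequence, "
    r"which is less than expected (?P<expected>\d+(?:\.\d+)?)%"
)


def parse_seed_cover_failure(line: str):
    """Return the details of a seed-cover failure, or None if the line does not match."""
    match = SEED_COVER_RE.search(line)
    if match is None:
        return None
    return {
        "index": int(match["index"]),
        "name": match["name"],
        "observed_pct": float(match["observed"]),
        "expected_pct": float(match["expected"]),
    }


# Excerpt of the log line quoted above.
log_line = (
    "nextclade_ordered_writer.rs:171: In sequence #0 'SRR13439799_A_HA_H1': "
    "When processing sequence #0 'SRR13439799_A_HA_H1': "
    "When calculating seed matches: Unable to align: "
    "seed alignment covers 9.73% of the query sequence, "
    "which is less than expected 10.00%"
)

failure = parse_seed_cover_failure(log_line)
print(failure)
```

For this line the parsed sequence name is `SRR13439799_A_HA_H1`, with 9.73% observed against an expected 10.00%, so the sample falls just below the threshold.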

non-flu:

kapsakcj commented 1 week ago

Curtis TODOs: