theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0
34 stars 16 forks source link

TheiaCoV and TheiaProk workflows standardization & organism default subworkflow #310

Closed jrotieno closed 5 months ago

jrotieno commented 6 months ago

Closes #290

:hammer_and_wrench: Changes Being Made

We are making changes to variable names for standardization across TheiaCoV and TheiaProk workflows, and associated tasks. Specifically, we have made the following changes:

Previous Name New Name
read1_raw read1
read2_raw read2
genome_size genome_length
min_genome_size min_genome_length
max_genome_size max_genome_length
expected_genome_size expected_genome_length
estimated_genome_size estimated_genome_length
genome_size_output genome_length_output
kmc_est_genome_size kmc_est_genome_length
target_org target_organism
percent_target_org percent_target_organism

In addition, the organism_parameters logic workflow has been implemented that automatically sets organism-specific parameters for TheiaCoV organisms.

Impacted Workflows/Tasks

task_dragonflye.wdl task_shovill.wdl task_nanoplot.wdl task_screen.wdl task_kraken2.wdl task_broad_terra_tools.wdl task_kmc.wdl task_rasusa.wdl wf_theiacov_illumina_pe.wdl wf_theiacov_illumina_se.wdl wf_theiacov_ont.wdl wf_theiaprok_illumina_pe.wdl wf_theiaprok_illumina_se.wdl wf_theiaprok_ont.wdl wf_read_QC_trim_ont.wdl wf_read_QC_trim_pe.wdl wf_read_QC_trim_se.wdl

all TheiaCoVs, Freyja_FASTQ, etc.

:brain: Context and Rationale

As code base expands, inconsistencies arise that need standardization across old and new workflows. Such inconsistencies could include the use of a variable name that is less intuitive, e.g. genome length vs genome size, the use of two variable names in two workflows for the same input type, e.g. read1 vs read1_raw, etc.

For the organism parameter, this will reduce the number of static entries in the input JSONs so those only contain the variable parameters that change from week to week, more closely matching their intended purpose.

:clipboard: Workflow/Task Steps

Inputs

Change of input names, no new inputs

Outputs

Change of output names, no new outputs

Impacted Outputs

Not changed

:test_tube: Testing

Picked random samples recently used for PHB 1.3.0 release validation.

Locally

Not undertaken

Terra

TheiaCoV Illumina PE: https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/55978b2f-9e30-456c-b718-cf317e0ca339

TheiaCoV Illumina SE: https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/1cad7bab-0b38-4f8b-b888-db1a4506ff0e

TheiaCoV ONT: https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/05417f4f-c9f9-43f2-be06-c7d069e9b8b6

TheiaProk Illumina PE: https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/e5828caf-e9d7-47f8-8f34-e1e18f6fa5ed

TheiaProk Illumina SE: https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/ca2daa9c-8685-48de-abaa-d58c952daa4b

TheiaProk ONT: https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/964bc4a0-d0bc-4c38-99b4-6d25cd6ea6ec

Scenarios for Reviewer to Test

:microscope: Quality checks

Pull Request (PR) checklist:

kapsakcj commented 5 months ago

I just merged origin/main back into this branch to resolve the conflict with TheiaMeta workflow WDL. See the above commit: 0f8b004

kapsakcj commented 5 months ago

All the code changes LGTM. I had a few questions but regardless I can start testing in Terra early next week

kapsakcj commented 5 months ago

Just merged in main which includes Ines' PR240 on Kraken & ONT data as well as mine/James' PR275 on unaligned reads out of TheiaCov Illumina workflows

I expect CI to fail and I'll update shortly.

@cimendes It would be good to double check my work on resolving conflicts regarding the kraken task & ONT readQC subworkflow. There were a lot of places where I changed target_org to target_organism in both WDL and bash variables and I hope I updated everything correctly 🤞

cimendes commented 5 months ago

@kapsakcj I've resolved the conflicts and will update hashes once all the tests terminate :)

cimendes commented 5 months ago

Functional and TheiaValidate validation performed in https://app.terra.bio/#workspaces/theiagen-validations/PHB_Validation_PR_310

kapsakcj commented 5 months ago

TheiaEuk_Illumina_PE functional test here (launched after all of these commits were made): https://app.terra.bio/#workspaces/theiagen-validations/PHB_Validation_PR_310/job_history/79d3dc36-da4f-470a-af59-ccfd085ae5cc

waiting for it to finish, so need to review results

kapsakcj commented 5 months ago

we cannot forget to update the documentation with this PR change