theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

update default pangolin docker (pdata 1.27) & nextclade dataset tags for SC2, Flu, Mpox #521

kapsakcj closed this pull request 1 week ago

kapsakcj commented 1 week ago

This PR closes #454

🗑️ This dev branch should be deleted after merging to main.

🧠 Aim, Context and Functionality

To update the default Docker image for pangolin (same pangolin version, with pangolin-data updated to v1.27), and to update the default Nextclade dataset tags to the latest versions available at the time of this PR.

🛠️ Impacted Workflows/Tasks & Changes Being Made

This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version: Yes

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing: No

Changes were made to the organism_parameters.wdl subworkflow

📋 Workflow/Task Step Changes

🔄 Data Processing

Docker/software or software versions changed: same pangolin version, pangolin-data updated to v1.27

Databases or database versions changed: updated dataset tags for SC2, Mpox, various Flu types & subtypes

Data processing/commands changed: N/A

File processing changed: N/A

Compute resources changed: N/A

➡️ Inputs

N/A

⬅️ Outputs

🧪 Testing

Test Dataset

SARS-CoV-2

Flu A and B samples

Mpox

Command-line Testing with miniwdl or Cromwell (optional)

Not tested locally

Terra Testing

⚠️ Outputs from all of these Terra test runs still need to be reviewed.

Suggested Scenarios for Reviewer to Test

Test each organism (SARS-CoV-2, Flu, Mpox) through the relevant TheiaCoV workflows (or pangolin_update).

Theiagen Version Release Testing (optional)

🔬 Final Developer Checklist

🎯 Reviewer Checklist

🗂️ Associated Documentation (to be completed by Theiagen developer)

kapsakcj commented 1 week ago

@sage-wright FYI: I noticed that TheiaCoV_FASTA_Batch wasn't using the organism_parameters.nextclade_dataset_tag output string as input to the workflow's final step, which updates the sample-level data table.

Previously, that step could only use the optional user-provided input nextclade_dataset_tag. If the user did not provide it, the value was null, so the sample-level data table was not updated with the dataset tag that was actually used to run Nextclade on the samples during the workflow.

See last commit for the code update to resolve this.
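The fallback pattern behind this fix can be sketched as below. This is a hypothetical Python illustration of the logic only, not the actual WDL change in the commit: the function names and the example tag value are made up, and the fallback mirrors the idea behind WDL's select_first (use the first defined value, here the user's tag before the organism default).

```python
from typing import Optional


def tag_for_table_before_fix(user_tag: Optional[str]) -> Optional[str]:
    # Old behavior (hypothetical sketch): only the optional user input is
    # forwarded, so the value is None whenever the user leaves it blank.
    return user_tag


def tag_for_table_after_fix(user_tag: Optional[str], organism_default_tag: str) -> str:
    # New behavior (hypothetical sketch): prefer the user-provided tag,
    # otherwise fall back to the organism-level default, standing in for
    # the organism_parameters.nextclade_dataset_tag output.
    return user_tag if user_tag is not None else organism_default_tag


if __name__ == "__main__":
    default_tag = "example-default-tag"  # hypothetical placeholder, not a real dataset tag
    print(tag_for_table_before_fix(None))                     # None
    print(tag_for_table_after_fix(None, default_tag))         # example-default-tag
    print(tag_for_table_after_fix("user-tag", default_tag))   # user-tag
```

Preferring an explicit user override and otherwise falling back to the resolved default keeps the table value from ending up null when the user leaves the input blank.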

Re-testing theiacov_fasta_batch here: https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/7e08d7ed-2e5a-42c1-b9b9-4a0ce63ad38c

⚠️ I need to review the outputs of this test ^

EDIT: ✅ Success! This commit resolved the issue, as confirmed by comparing the TSV files used for updating the sample-level data table before and after the code change.

cimendes commented 1 week ago

you are a ⭐