jrotieno closed this 4 weeks ago.
Apologies for the delay in reviewing; this took quite some time to test across the different organisms and to troubleshoot when user error occurred. Thank you for your patience.
I have no further code changes to suggest or request.
Thank you for updating the documentation in Notion.
Here are my test workflows and a short description of observed behavior:
For flu, set organism = flu, set flu_segment to HA or NA, and additionally set flu_subtype to H1N1 or H3N2. This was not immediately clear to me and I spent time troubleshooting.
Set distance_tree_only to true. I think it previously failed at the augur_refine step because of the very odd makeup of the samples (and their respective qualities) I included in my set. What matters is that the workflow ran and used the reference materials that are now set by default in the organism parameters subworkflow.
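A minimal, hypothetical pre-flight check for the flu inputs described above could look like the Python sketch below. The function name is made up for illustration, and the "tested" values are only the ones exercised in this review (HA/NA, H1N1/H3N2); the workflow itself may accept other values, so anything outside that set is reported as untested rather than invalid.

```python
from __future__ import annotations

# Hypothetical pre-flight check; not part of the workflow.
# These are only the values exercised in this review, not an authoritative list.
TESTED_FLU_SEGMENTS = {"HA", "NA"}
TESTED_FLU_SUBTYPES = {"H1N1", "H3N2"}


def check_flu_inputs(organism: str,
                     flu_segment: str | None = None,
                     flu_subtype: str | None = None) -> list[str]:
    """Return human-readable problems with the flu inputs (empty list if none)."""
    problems: list[str] = []
    if organism != "flu":
        return problems  # flu_segment / flu_subtype only matter when organism = flu
    for name, value, tested in (
        ("flu_segment", flu_segment, TESTED_FLU_SEGMENTS),
        ("flu_subtype", flu_subtype, TESTED_FLU_SUBTYPES),
    ):
        if not value:
            problems.append(f"{name} must be set when organism = flu")
        elif value not in tested:
            problems.append(f"{name}={value!r} is outside the tested values {sorted(tested)}")
    return problems


print(check_flu_inputs("flu", "HA", "H1N1"))      # []
print(check_flu_inputs("flu", flu_segment="HA"))  # ['flu_subtype must be set when organism = flu']
```

Setting organism = flu alone is the case that cost troubleshooting time here; a check like this would surface the two missing inputs before launching a run.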
This PR closes #481.
🗑️ This dev branch should be deleted after merging to main.
:brain: Aim, Context and Functionality
This PR enables the Augur workflow to analyze RSV-A and RSV-B assemblies. Prior to this PR, the workflow could only process SARS-CoV-2, seasonal Influenza A and B, and Mpox.
The Augur Prep workflow has not been updated because it required no changes.
:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version: Yes
Previously, for seasonal Influenza and Mpox, augur reference files and default values were set in the augur_utilities task. As part of the move to house all organism-specific defaults in the organism_parameters workflow, the default augur parameters for seasonal Influenza and Mpox have now been moved to the organism_parameters workflow as well.
SARS-CoV-2 augur defaults have not been moved yet, because the current behaviour is to download them directly from Nextstrain's "https://github.com/nextstrain/ncov" repository. Because the SARS-CoV-2 clades TSV file is updated frequently with new lineages, this remains the best strategy. In the future, the other pathogens may also take this approach.
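As a rough illustration of the new arrangement (augur defaults keyed by organism in one place, user-supplied inputs taking precedence, and SARS-CoV-2 still pulled from nextstrain/ncov), here is a Python sketch. It is a hypothetical model, not the WDL in organism_parameters: the organism keys, field names, and gs://example-bucket paths are placeholders, the precedence rule is an assumption based on these values being "defaults", and the SARS-CoV-2 entry only flags that its files are fetched at run time; nothing is downloaded.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical model of organism-keyed augur defaults with user overrides.
# Keys and gs:// paths are PLACEHOLDERS, not the real organism_parameters values.
PLACEHOLDER = "gs://example-bucket"


@dataclass(frozen=True)
class AugurDefaults:
    reference: Optional[str]             # reference sequence/annotation file
    clades: Optional[str]                # clade definitions TSV
    fetch_from_nextstrain: bool = False  # SARS-CoV-2: pulled from nextstrain/ncov at run time


DEFAULTS: dict[tuple[str, Optional[str], Optional[str]], AugurDefaults] = {
    ("mpox", None, None): AugurDefaults(f"{PLACEHOLDER}/mpox_reference.gb", f"{PLACEHOLDER}/mpox_clades.tsv"),
    ("rsv_a", None, None): AugurDefaults(f"{PLACEHOLDER}/rsv_a_reference.gb", f"{PLACEHOLDER}/rsv_a_clades.tsv"),
    ("rsv_b", None, None): AugurDefaults(f"{PLACEHOLDER}/rsv_b_reference.gb", f"{PLACEHOLDER}/rsv_b_clades.tsv"),
    ("sars-cov-2", None, None): AugurDefaults(None, None, fetch_from_nextstrain=True),
}
# Flu defaults depend on both subtype and segment.
for subtype in ("H1N1", "H3N2"):
    for segment in ("HA", "NA"):
        stem = f"{PLACEHOLDER}/{subtype.lower()}_{segment.lower()}"
        DEFAULTS[("flu", subtype, segment)] = AugurDefaults(f"{stem}_reference.gb", f"{stem}_clades.tsv")


def resolve_augur_inputs(organism: str,
                         flu_subtype: Optional[str] = None,
                         flu_segment: Optional[str] = None,
                         overrides: Optional[dict[str, Optional[str]]] = None) -> dict[str, object]:
    """Look up the organism's defaults, then let any non-empty user input win."""
    key = (organism, flu_subtype, flu_segment) if organism == "flu" else (organism, None, None)
    if key not in DEFAULTS:
        raise ValueError(f"no augur defaults for {key}")
    d = DEFAULTS[key]
    resolved: dict[str, object] = {
        "reference": d.reference,
        "clades": d.clades,
        "fetch_from_nextstrain": d.fetch_from_nextstrain,
    }
    for name, value in (overrides or {}).items():
        if value is not None:  # an unset optional input keeps the default
            resolved[name] = value
    return resolved


print(resolve_augur_inputs("flu", "H3N2", "NA")["clades"])
print(resolve_augur_inputs("rsv_a", overrides={"clades": "my_clades.tsv"})["clades"])
```

Keeping the lookup in one table is what makes adding an organism (as this PR does for RSV-A and RSV-B) a matter of adding entries rather than editing task logic.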
Running this workflow on different occasions could produce different results, e.g. due to use of a live database, a "latest" docker image, or stochastic data processing: Yes, if the augur reference files are updated in the future.
:clipboard: Workflow/Task Step Changes
Data Processing
Docker/software or software versions changed: No
Databases or database versions changed: No
Data processing/commands changed: Yes, organism default sources moved.
File processing changed: Yes, organism default sources moved.
Compute resources changed: No
➡️ Inputs
Exposed the following optional inputs, which are used in augur_frequencies but not currently used in the workflow:
The following optional inputs have been removed:
The following optional inputs are exposed in the workflow but not used for this workflow:
⬅️ Outputs
None changed
:test_tube: Testing
Test Dataset
Commandline Testing with MiniWDL or Cromwell (optional)
Tested locally with the RSV-A dataset (see Terra Testing below); the workflow works as expected.
Terra Testing
Because we moved the augur pathogen-specific parameters to the organism parameters workflow, we tested all previously supported pathogens as well as the new ones to ensure that they still work.
RSV-A: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/8e750146-a665-4aff-ac35-ca6571c51cd3
RSV-B: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/e58e51e0-267c-4e0c-a9c2-ec01154dfaef
H1N1 HA: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/47d664d9-2b13-42d8-b888-e04db1fe2b0b
H1N1 NA: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/5335d341-30cb-49a0-8373-a291151e4d2e
H3N2 HA: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/471a1ff5-51fb-46df-8656-52ac6bd9f0e3
H3N2 NA: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/90c8555e-c08e-415c-9c8f-4874baf74cdc
MPOX: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/83859995-c15a-4606-972c-7568ad9c59f8
SARS-COV-2: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/f9b4e5bc-57e4-4474-a448-31b00d3fa45c
Suggested Scenarios for Reviewer to Test
We noticed that the augur_refine step fails when the date information does not align with the genetic divergence. For example, the RSV-A sequences were initially given dates in 2023 and failed at this step. However, when the dates were corrected to the appropriate collection dates for the samples (2013-2014), the workflow ran to completion successfully.
Successful run: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/fa391728-0fa6-448f-9c4a-64cddc60c9ae
Failed run: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/0b37dff6-5ee5-4bb5-ab11-285018de7ba0
The reviewer can test additional, appropriately dated sequences to see whether any failures occur.
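To make the date/divergence point concrete: when building a time-resolved tree, augur refine relies on divergence increasing with sampling date (it also exposes a --clock-filter-iqd option for dropping tips that deviate from that relationship). The Python sketch below is a simplified, hypothetical root-to-tip regression with an interquartile-range outlier rule; it is not augur's or TreeTime's code, it works on plain divergence values rather than a tree, and the threshold k = 4 is an assumption. It also shows one possible failure mode: if every sample carries the same date, no clock rate can be estimated at all. Whether that is exactly what happened in the failed RSV-A run is not established here.

```python
from __future__ import annotations

import statistics
from collections.abc import Sequence


def root_to_tip_outliers(dates: Sequence[float],
                         divergences: Sequence[float],
                         k: float = 4.0) -> tuple[float, float, list[int]]:
    """Least-squares fit of divergence against decimal sampling date.

    Returns (rate, intercept, flagged), where flagged holds the indices of tips
    whose residual lies more than k interquartile ranges from the median residual.
    Simplified illustration only; not augur's or TreeTime's implementation.
    """
    if len(dates) != len(divergences) or len(dates) < 3:
        raise ValueError("need at least three paired dates and divergences")
    mean_t = statistics.fmean(dates)
    mean_d = statistics.fmean(divergences)
    sxx = sum((t - mean_t) ** 2 for t in dates)
    if sxx == 0:
        # e.g. every sample given the same date: there is no temporal signal to fit
        raise ValueError("all sampling dates are identical; cannot estimate a clock rate")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, divergences))
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    residuals = [d - (rate * t + intercept) for t, d in zip(dates, divergences)]
    q1, _, q3 = statistics.quantiles(residuals, n=4)
    centre = statistics.median(residuals)
    flagged = [i for i, r in enumerate(residuals) if abs(r - centre) > k * (q3 - q1)]
    return rate, intercept, flagged


# Synthetic data: 11 tips sampled 2013.0-2014.0, tip 5 carries excess divergence.
dates = [2013 + 0.1 * u for u in range(11)]
divs = [0.026 + 0.0002 * u + (0.001 if u % 2 == 0 else -0.001) for u in range(11)]
divs[5] += 0.05
rate, _, flagged = root_to_tip_outliers(dates, divs)
print(round(rate, 4), flagged)  # 0.002 [5]
```

With correct 2013-2014 dates the regression recovers a positive rate and isolates the one inconsistent tip; collapsing all dates onto a single year leaves nothing to regress against.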
Theiagen Version Release Testing (optional)
:microscope: Final Developer Checklist
Reviewer Checklist
Associated Documentation (to be completed by Theiagen developer)