Closed jrotieno closed 1 week ago
I agree @AndrewLangvt
@emily-smith1, if you are testing a pathogen that is not in the TheiaCoV list and run into issues, you may want to check some notes I made in #509
Tested successfully on a concatenated fasta file containing 420 influenza PB2 gene segments here. We should add a note to our documentation that the user will need to update the min_num_unambig input setting if running this on taxa where we don't have organism tracks.
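To help pick a sensible min_num_unambig for a taxon without an organism track, one option is to count the unambiguous bases in each sequence of the concatenated fasta first. The sketch below assumes min_num_unambig acts as a lower bound on the number of unambiguous (A/C/G/T/U) bases per sequence, which this thread does not spell out; the function name and the toy records are hypothetical.

```python
from io import StringIO

# Bases treated as unambiguous; everything else (N, IUPAC codes, gaps) is ignored.
UNAMBIGUOUS = set("ACGTU")


def count_unambiguous(fasta_handle):
    """Return {record_id: number of unambiguous bases} for each FASTA record."""
    counts = {}
    current = None
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            header = line[1:].split()
            current = header[0] if header else ""
            counts[current] = 0
        elif current is not None:
            counts[current] += sum(1 for base in line.upper() if base in UNAMBIGUOUS)
    return counts


# Toy input standing in for a concatenated multi-record fasta file.
example = StringIO(">seg1\nACGTNNAC\nggtn\n>seg2\nNNNNACGT\n")
counts = count_unambiguous(example)

# The smallest count shows how low the threshold must be to keep every record.
shortest = min(counts.values())
print(counts, shortest)
```

For a real file, pass `open("sequences.fasta")` instead of the `StringIO` object; short segments would need a threshold well below a whole-genome default.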
@jrotieno can you make the doc update specified by @emily-smith1? Please LMK and I'll approve & merge this to main
@AndrewLangvt, you can go ahead and merge as I have updated the docs with the required inputs when working with non-TheiaCoV pathogens. I am meeting @cimendes tomorrow to rubber duck some additional Augur doc updates.
This PR closes #458.
🗑️ This dev branch should be deleted after merging to main.
:brain: Aim, Context and Functionality
This PR updates the Augur_Prep_PHB and Augur_PHB workflows so that a user can run them with sequence data alone, without the associated metadata. One use case is when a user only needs to generate a distance tree in Newick format.
At present, Augur_Prep_PHB can only be run when continent, country, state and collection date information is available in addition to the sequence data. After this update, only the sequence data will be required.
For the Augur_PHB workflow, associated metadata is currently required in addition to the assemblies and build name. With this update, only the assemblies and build name will be required.
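The intended behavior can be sketched in a few lines of Python. This is only an illustration of the rule described in this PR (assemblies and build name required, metadata optional, Auspice output only when metadata is present), not the workflow's actual WDL; the function name and the output labels are hypothetical.

```python
def planned_outputs(assembly_fastas, build_name, sample_metadata_tsvs=None):
    """Return the outputs a run would be expected to produce.

    Assumption: the distance tree needs only sequences, while the
    Auspice tree additionally needs metadata.
    """
    if not assembly_fastas:
        raise ValueError("at least one assembly is required")
    if not build_name:
        raise ValueError("a build name is required")

    # The Newick distance tree depends on the sequences alone.
    outputs = ["distance_tree_newick"]

    # The Auspice tree is only produced when metadata was supplied.
    if sample_metadata_tsvs:
        outputs.append("auspice_tree")
    return outputs


without_metadata = planned_outputs(["a.fasta", "b.fasta"], "flu_h1n1")
with_metadata = planned_outputs(["a.fasta", "b.fasta"], "flu_h1n1", ["meta.tsv"])
print(without_metadata)
print(with_metadata)
```

An empty metadata list is treated the same as no metadata, so both cases yield only the distance tree.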
:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version : No, users will obtain the same results if they keep supplying the previously required inputs that are now optional.
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : No
:clipboard: Workflow/Task Step Changes
🔄 Data Processing
Docker/software or software versions changed: No
Databases or database versions changed: No
Data processing/commands changed: Yes, made some inputs optional
File processing changed: No
Compute resources changed: No
➡️ Inputs
For Augur_Prep_PHB, the following inputs have been made optional: the continent, country, state and collection date inputs.
For Augur_PHB, the following inputs have been made optional: sample_metadata_tsvs
⬅️ Outputs
:test_tube: Testing
Test Dataset
Commandline Testing with MiniWDL or Cromwell (optional)
Terra Testing
The following Tests were done with flu H1N1:
Augur_Prep_PHB with the above mentioned inputs as required: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/82bffbed-ca40-4023-870e-78802528a56f
Augur_Prep_PHB with the above mentioned inputs not included: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/f630f152-89d9-4fbf-b977-c28303a165da
Augur_PHB with metadata: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/88d1aefa-8a4c-4283-b978-bb0d21e7e119
Augur_PHB without metadata (but available from prep): https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/bf835e21-1292-4fe3-a0f1-4e2f9f54ea45
Augur_PHB without metadata (also none from prep): https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/b672ec5f-09c4-451d-88e7-a088c889c3d5
The distance trees in all scenarios were the same, with auspice tree only output when metadata is provided.
Suggested Scenarios for Reviewer to Test
Test with non-flu pathogens.
Theiagen Version Release Testing (optional)
:microscope: Final Developer Checklist
🎯 Reviewer Checklist
🗂️ Associated Documentation (to be completed by Theiagen developer)