theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

Augur_PHB: Set sample_metadata_tsvs input to optional #503

Closed jrotieno closed 1 week ago

jrotieno commented 2 weeks ago

This PR closes #458.

🗑️ This dev branch should be deleted after merging to main.

:brain: Aim, Context and Functionality

This PR updates the Augur_Prep_PHB and Augur_PHB workflows so that a user can run them with only the sequence information, without the associated metadata. A typical use case is a user who only needs to generate a distance tree in Newick format.

At present, Augur_Prep_PHB can only be run when continent, country, state, and collection date information is available in addition to the sequence data. After this update, only the sequence data will be required.

For Augur_PHB workflow, in addition to assemblies and build name, associated metadata is a requirement. With this update, only the assemblies and build name will be required.

:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made

This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version : No. Users who keep supplying the previously required inputs (which are now optional) will obtain the same results.

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : No

:clipboard: Workflow/Task Step Changes

🔄 Data Processing

Docker/software or software versions changed: No

Databases or database versions changed: No

Data processing/commands changed: Yes, made some inputs optional

File processing changed: No

Compute resources changed: No

➡️ Inputs

For Augur_Prep_PHB, the following inputs have been made optional:

collection_date
country
state
continent

For Augur_PHB, the following input has been made optional:

sample_metadata_tsvs
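For command-line testing with MiniWDL or Cromwell, the effect of the now-optional input can be sketched as an inputs JSON that simply leaves the key out. This is a minimal illustration, not the workflow's actual interface: the `augur` workflow namespace and the `assembly_fastas` and `build_name` key names are assumptions made for the example (only `sample_metadata_tsvs` is named in this PR), and the file names are placeholders.

```python
import json
from typing import Dict, List, Optional


def build_augur_inputs(
    assembly_fastas: List[str],
    build_name: str,
    sample_metadata_tsvs: Optional[List[str]] = None,
    workflow_name: str = "augur",  # assumed namespace, not confirmed by this PR
) -> Dict[str, object]:
    """Build a namespaced inputs mapping, omitting the optional metadata when unset."""
    inputs: Dict[str, object] = {
        # Assumed key names for the assemblies and build name inputs.
        f"{workflow_name}.assembly_fastas": assembly_fastas,
        f"{workflow_name}.build_name": build_name,
    }
    # An optional workflow input is left out of the JSON entirely
    # rather than being passed as an empty value.
    if sample_metadata_tsvs:
        inputs[f"{workflow_name}.sample_metadata_tsvs"] = sample_metadata_tsvs
    return inputs


if __name__ == "__main__":
    # Sequence-only run: no metadata key appears in the printed JSON.
    print(json.dumps(build_augur_inputs(["a.fasta", "b.fasta"], "h1n1_test"), indent=2))
```

Omitting the key, rather than passing an empty list, keeps the sequence-only case distinct from a run where metadata was supplied but happens to be empty.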

⬅️ Outputs

:test_tube: Testing

Test Dataset

Commandline Testing with MiniWDL or Cromwell (optional)

Terra Testing

The following tests were run with flu H1N1:

Augur_Prep_PHB with the above-mentioned inputs included: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/82bffbed-ca40-4023-870e-78802528a56f

Augur_Prep_PHB with the above mentioned inputs not included: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/f630f152-89d9-4fbf-b977-c28303a165da

Augur_PHB with metadata: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/88d1aefa-8a4c-4283-b978-bb0d21e7e119

Augur_PHB without metadata (but available from prep): https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/bf835e21-1292-4fe3-a0f1-4e2f9f54ea45

Augur_PHB without metadata (also none from prep): https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/b672ec5f-09c4-451d-88e7-a088c889c3d5

The distance trees were identical in all scenarios; the Auspice tree is output only when metadata is provided.

Suggested Scenarios for Reviewer to Test

Test with non-flu pathogens.

Theiagen Version Release Testing (optional)

:microscope: Final Developer Checklist

🎯 Reviewer Checklist

🗂️ Associated Documentation (to be completed by Theiagen developer)

jrotieno commented 1 week ago

I agree @AndrewLangvt

@emily-smith1, if you are testing a pathogen that is not in the TheiaCoV list and run into issues, see the notes I made in #509

emily-smith1 commented 1 week ago

Tested successfully on a concatenated fasta file containing 420 influenza PB2 gene segments here. We should add a note to our documentation that the user will need to update the min_num_unambig input setting if running this on taxa where we don't have organism tracks.

AndrewLangvt commented 1 week ago

@jrotieno can you make the doc update specified by @emily-smith1? Please LMK and I'll approve & merge this to main

jrotieno commented 1 week ago

@AndrewLangvt, you can go ahead and merge as I have updated the docs with the required inputs when working with non-TheiaCoV pathogens. I am meeting @cimendes tomorrow to rubber duck some additional Augur doc updates.