theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

Augur Updates for RSV-A and RSV-B #478

Closed jrotieno closed 4 weeks ago

jrotieno commented 1 month ago

This PR closes #481.

πŸ—‘οΈ This dev branch should be deleted after merging to main.

:brain: Aim, Context and Functionality

This PR enables the Augur workflow to analyze RSV-A and RSV-B assemblies. Previously, the workflow only processed SARS-CoV-2, seasonal Influenza A and B, and Mpox.

The Augur Prep workflow has not been updated as there was no need for changes therein.

:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made

This will affect the behavior of the workflow(s) even if users don't change any workflow inputs relative to the last version : Yes

Previously, the augur reference files and default values for seasonal Influenza and Mpox were set in the augur_utilities task. As part of the move to house all organism-specific defaults in the organism_parameters workflow, the default augur parameters for seasonal Influenza and Mpox now live there.
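To illustrate the pattern only (this is not the WDL in this PR), here is a minimal Python sketch of keeping per-organism augur defaults in one place and letting user-supplied inputs override them. The organism keys, field names, and placeholder values are all hypothetical; the real defaults are defined in the organism_parameters workflow.

```python
from typing import Dict, Optional

# Hypothetical placeholder entries: the real reference files and values are
# defined in the organism_parameters workflow, not here.
ORGANISM_AUGUR_DEFAULTS: Dict[str, Dict[str, str]] = {
    "mpox": {
        "reference_fasta": "<mpox reference fasta>",
        "clades_tsv": "<mpox clades tsv>",
    },
    "rsv_a": {
        "reference_fasta": "<rsv-a reference fasta>",
        "clades_tsv": "<rsv-a clades tsv>",
    },
}


def resolve_augur_inputs(organism: str, **overrides: Optional[str]) -> Dict[str, str]:
    """Return the organism defaults, with any non-None override taking precedence."""
    try:
        defaults = ORGANISM_AUGUR_DEFAULTS[organism]
    except KeyError:
        raise ValueError(f"no augur defaults registered for {organism!r}") from None
    resolved = dict(defaults)  # copy so the shared defaults are never mutated
    resolved.update({key: value for key, value in overrides.items() if value is not None})
    return resolved
```

With a single lookup like this, adding a new organism means adding one entry rather than a new defaults task with its own runtime inputs.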

SARS-CoV-2 augur defaults have not been moved yet because the current behaviour is to download them directly from Nextstrain's "https://github.com/nextstrain/ncov" repository. Because the SARS-CoV-2 clades TSV file is updated frequently with new lineages, this is the best strategy. In the future, the other pathogens may also take this approach.

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : Yes, if the augur reference files are updated in the future.

:clipboard: Workflow/Task Step Changes

🔄 Data Processing

Docker/software or software versions changed: No

Databases or database versions changed: No

Data processing/commands changed: Yes, organism default sources moved.

File processing changed: Yes, organism default sources moved.

Compute resources changed: No

➡️ Inputs

Exposed the following optional inputs, used in augur_frequencies, but not currently used in the workflow:

augur   min_date    Float
augur   pivot_interval  Int
augur   proportion_wide Float
augur   narrow_bandwidth    Float
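As a rough sketch of how these four inputs could be passed to the `augur frequencies` CLI, the Python helper below builds a command line and adds each flag only when a value is supplied. It assumes the inputs map one-to-one onto the similarly named `augur frequencies` flags, which this PR does not state; the function name and file paths are hypothetical.

```python
from typing import List, Optional


def build_frequencies_cmd(
    metadata: str,
    tree: str,
    output: str,
    min_date: Optional[float] = None,
    pivot_interval: Optional[int] = None,
    proportion_wide: Optional[float] = None,
    narrow_bandwidth: Optional[float] = None,
) -> List[str]:
    """Build an `augur frequencies` argument list, skipping unset optional inputs."""
    cmd = [
        "augur", "frequencies",
        "--method", "kde",
        "--metadata", metadata,
        "--tree", tree,
        "--output", output,
    ]
    # Assumed mapping from workflow input names to CLI flags.
    optional_flags = {
        "--min-date": min_date,
        "--pivot-interval": pivot_interval,
        "--proportion-wide": proportion_wide,
        "--narrow-bandwidth": narrow_bandwidth,
    }
    for flag, value in optional_flags.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd
```

Leaving unset inputs off the command line lets augur fall back to its own defaults rather than hard-coding them a second time.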

The following optional inputs have been removed:

flu_defaults    cpu Int
flu_defaults    disk_size   Int
flu_defaults    docker  String
flu_defaults    flu_lat_longs_tsv   File
flu_defaults    memory  Int
mpxv_defaults   cpu Int
mpxv_defaults   disk_size   Int
mpxv_defaults   docker  String
mpxv_defaults   memory  Int
mpxv_defaults   mpxv_auspice_config File
mpxv_defaults   mpxv_clades_tsv File
mpxv_defaults   mpxv_lat_longs_tsv  File
mpxv_defaults   mpxv_reference_fasta    File
mpxv_defaults   mpxv_reference_genbank  File

The following optional inputs are exposed by organism_parameters but are not used by this workflow:

organism_parameters gene_locations_bed_file File
organism_parameters genome_length_input Int
organism_parameters hiv_primer_version  String
organism_parameters kraken_target_organism_input    String
organism_parameters nextclade_dataset_name_input    String
organism_parameters nextclade_dataset_tag_input String
organism_parameters pangolin_docker_image   String
organism_parameters primer_bed_file File
organism_parameters reference_gff_file  File
organism_parameters vadr_max_length Int
organism_parameters vadr_mem    Int
organism_parameters vadr_options    String
organism_parameters vadr_skip_length    Int

⬅️ Outputs

None changed

:test_tube: Testing

Test Dataset

Commandline Testing with MiniWDL or Cromwell (optional)

Tested locally with the RSV-A dataset (see Terra Testing below); the workflow works as expected.

Terra Testing

Because we moved the augur pathogen-specific parameters to the organism_parameters workflow, we tested the new pathogens and all previously supported ones to confirm that they still work.

RSV-A: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/8e750146-a665-4aff-ac35-ca6571c51cd3
RSV-B: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/e58e51e0-267c-4e0c-a9c2-ec01154dfaef
H1N1 HA: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/47d664d9-2b13-42d8-b888-e04db1fe2b0b
H1N1 NA: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/5335d341-30cb-49a0-8373-a291151e4d2e
H3N2 HA: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/471a1ff5-51fb-46df-8656-52ac6bd9f0e3
H3N2 NA: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/90c8555e-c08e-415c-9c8f-4874baf74cdc
MPOX: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/83859995-c15a-4606-972c-7568ad9c59f8
SARS-COV-2: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/f9b4e5bc-57e4-4474-a448-31b00d3fa45c

Suggested Scenarios for Reviewer to Test

The augur_refine step fails when the date information does not align with the genetic divergence. For example, the RSV-A sequences were initially given dates in 2023 and failed at this step. Once the dates were corrected to the samples' actual collection dates in 2013-2014, the workflow ran to completion.

Successful run: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/fa391728-0fa6-448f-9c4a-64cddc60c9ae
Failed run: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/0b37dff6-5ee5-4bb5-ab11-285018de7ba0

The reviewer can test any additional appropriately dated sequences to see if there are any failures.
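One way to screen a dataset for this kind of date/divergence mismatch before launching the workflow is a crude root-to-tip style regression of divergence on sampling date. The Python sketch below is a pre-flight check under stated assumptions, not what augur_refine runs internally: it assumes decimal-year dates and per-sample divergence values (substitutions per site) are already available, and the function names are hypothetical.

```python
from typing import Sequence


def estimate_clock_rate(dates: Sequence[float], divergences: Sequence[float]) -> float:
    """Least-squares slope of divergence against decimal-year sampling date."""
    n = len(dates)
    if n != len(divergences) or n < 2:
        raise ValueError("need at least two samples with matching dates and divergences")
    mean_date = sum(dates) / n
    mean_div = sum(divergences) / n
    date_var = sum((t - mean_date) ** 2 for t in dates)
    if date_var == 0:
        # Identical dates carry no temporal signal, so no rate can be estimated.
        raise ValueError("all sampling dates are identical")
    covariance = sum((t - mean_date) * (d - mean_div) for t, d in zip(dates, divergences))
    return covariance / date_var


def has_temporal_signal(dates: Sequence[float], divergences: Sequence[float]) -> bool:
    """True when divergence increases with sampling date (positive slope)."""
    return estimate_clock_rate(dates, divergences) > 0
```

A non-positive slope, or dates with no spread, suggests the metadata dates should be checked against the true collection dates before running the workflow.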

Theiagen Version Release Testing (optional)

:microscope: Final Developer Checklist

🎯 Reviewer Checklist

🗂️ Associated Documentation (to be completed by Theiagen developer)

kapsakcj commented 4 weeks ago

Apologies for the delay in reviewing. This took quite some time to test across the different organisms and to troubleshoot when user error (👨 ➡️ 🪞) occurred. Thank you for your patience.

I have no further code changes to suggest or request ✅

Thank you for updating the documentation in Notion 👍

Here are my test workflows and a short description of observed behavior: