theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0
34 stars 16 forks source link

theiacov_fasta_batch_PHB wf improvements #319

Closed kapsakcj closed 5 months ago

kapsakcj commented 6 months ago

removed required string inputs seq_method and input_assembly_method since they are not used in the workflow. They are not required.

This PR closes #320

🗑️ This dev branch should NOT be deleted after merging to main (one potential PHL user)

:brain: Aim, Context and Functionality

described in issue #320

These changes are to ease the usage of this workflow.

:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made

This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version : Yes, seq_method and input_assembly_method will no longer be updated/overwritten after running this workflow

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing: No, results should not change if identical versions of pangolin or nextclade (and their respective databases) are used between workflow runs

:clipboard: Workflow/Task Step Changes

🔄 Data Processing

seq_method and input_assembly_method are no longer an input, they are removed entirely. This also means they are no longer output (i.e. those output columns are no longer updated/overwritten)

Docker/software or software versions changed: No

Databases or database versions changed: No

Data processing/commands changed: No

File processing changed: Yes

Compute resources changed: No

➡️ Inputs

seq_method and input_assembly_method are no longer an input, they are removed entirely.

⬅️ Outputs

seq_method and input_assembly_method are no longer output (i.e. those output columns are no longer updated/overwritten)

:test_tube: Testing

Test Dataset

I tested with a batch of ~33 SC2 genomes (Omicron and decendant lineages) downloaded from GISAID.

Commandline Testing with MiniWDL or Cromwell (optional)

only tested via Terra, since it's difficult or impossible to test locally since the workflow involves updating Terra data tables

Terra Testing

successful workflow here: https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/610e9035-24d5-40a7-b22b-70668ff4a4eb

data table is called omicron_pango_validation

Suggested Scenarios for Reviewer to Test

Would be good for the reviewer to test a large batch of genomes (300+). Should run OK. I'm just curious about how many we could run at once (1000? 10000? more??)

Theiagen Version Release Testing (optional)

-Will changes require functional or validation testing (checking outputs etc) during the release? functional -Do new samples need to be added to validation datasets? If so, upload these to the appropriate validation workspace Google bucket (). Please describe the new samples here and why these have been chosen. No -Are there any output files that should be checked after running the version release testing? Check the final output TSV used to update the Terra data table

:microscope: Final Developer Checklist

🎯 Reviewer Checklist

🗂️ Associated Documentation (to be completed by Theiagen developer)

kapsakcj commented 5 months ago

Docs have been updated, I just deleted the 2 inputs from the table in the docs. No need to update workflow diagram (if it even exists for theiacov_fasta_batch?)

sage-wright commented 5 months ago

Code changes look great, currently testing here