theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0
33 stars 15 forks source link

Automatically convert spaces to dashes in workflows that accept strings #498

Closed AndrewLangvt closed 3 weeks ago

AndrewLangvt commented 4 weeks ago

This PR closes #326 .

πŸ—‘οΈ This dev branch should be deleted after merging to main.

:brain: Aim, Context and Functionality

Multiple workflows allow for string input, but when user inputs a string with a space, cromwell engine will fail. As such, this PR resolves this issue across all workflows.

:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made

This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version : No

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : No

:clipboard: Workflow/Task Step Changes

πŸ”„ Data Processing

Docker/software or software versions changed: No

Databases or database versions changed: No

Data processing/commands changed: No

File processing changed: Yes

Compute resources changed: No

➑️ Inputs

Added the following inputs, all of which are provided the base input (strip _updated and that's the basename) and replaces space with underscore Freyja dashboard: freyja_dashboard_title_updated Freyja plot: freyja_plot_name_updated Augur_PHB: build_name_updated Core_Gene_Snp_PHB: cluster_name_updated Find_Shared_Variants_PHB: concatenated_file_name_updated kSNP3_PHB: cluster_name_updated Lyve_SET_PHB: dataset_name_updated MashTree_FASTA_PHB: cluster_name_updated Snippy_Streamline_PHB: tree_name_updated Snippy_Tree_PHB: tree_name_updated Usher_PHB: tree_name_updated Mercury_Prep_N_Batch_PHB: output_name_updated Concatenate_Column_Content_PHB: concatenated_file_name_updated Rename_FASTQ_PHB: new_filename_updated Zip_Column_Content_PHB: zipped_file_name_updated

⬅️ Outputs

None

:test_tube: Testing

Test Dataset

Tests were executed with the same parameters as PHB 2.0 validation for the preponderance of workflows. Those that deviated from this did so simply to cut down on compute costs and a reasonable number of samples were still run

Commandline Testing with MiniWDL or Cromwell (optional)

all workflows passed miniwdl check

Terra Testing

Freyja dashboard: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Lang_Sandbox/job_history/7f7fdb9a-9777-46cc-a0b1-bdde0c06cb14 Freyja plot: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Lang_Sandbox/job_history/6291431a-f657-414b-abad-2799a1008677 Augur_PHB: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Lang_Sandbox/job_history/b42878a7-61fd-43c0-b248-e51f9aa675a3 Core_Gene_Snp_PHB: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Lang_Sandbox/job_history/2aecfc7d-c60b-459d-a6ef-09c5d8c80c66 Find_Shared_Variants_PHB: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Lang_Sandbox/job_history/4f0276b8-3e8e-4c7a-9fbd-2cafa0bcb7a0 kSNP3_PHB: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Lang_Sandbox/job_history/3748ad70-6fe6-47de-ac10-2629be23fc76 Lyve_SET_PHB: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Lang_Sandbox/job_history/0e4d27d3-c3cd-474d-b9ff-c73b4d516783 MashTree_FASTA_PHB: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Lang_Sandbox/job_history/a0519b5e-9034-4d7b-9d09-2723ba0dcf2b Snippy_Streamline_PHB: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Lang_Sandbox/job_history/9872d130-5df4-46a0-a8ad-d59f4bba0540 Snippy_Tree_PHB (ran as sub-workflow of Snippy_Streamline_PHB) Usher_PHB: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Lang_Sandbox/job_history/bc4a5b64-f1d0-441d-a2e1-bc9380cad2d6 Mercury_Prep_N_Batch_PHB: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Lang_Sandbox/job_history/51cdff4a-b902-4ce4-be29-3a843d765fe8 Concatenate_Column_Content_PHB: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Lang_Sandbox/job_history/8cb22e65-fdcd-4ad4-99a6-0adeac8a2b0e Rename_FASTQ_PHB: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Lang_Sandbox/job_history/86562056-9842-4d62-a3cc-1c8506b7089a Zip_Column_Content_PHB: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Lang_Sandbox/job_history/7b40a876-0f67-4691-93bc-c7ab8296c0dc

Suggested Scenarios for Reviewer to Test

For each of the workflows, provide a string with spaces to the input.

Theiagen Version Release Testing (optional)

:microscope: Final Developer Checklist

🎯 Reviewer Checklist

πŸ—‚οΈ Associated Documentation (to be completed by Theiagen developer)

kapsakcj commented 3 weeks ago

It would be good to go through the documentation and remove the mentions of No whitespaces! or whatever message that listed in the workflow input parameter descriptions. I see it on Freyja_dashboard, and I'm assuming it might be mentioned in the input param tables for other workflows too.

This PR will not require updating workflow diagrams, so I checked that off

kapsakcj commented 3 weeks ago

Very nice work on your 2nd PR πŸŽ‰

I tested all workflows on independent datasets and intentionally put spaces in all the string inputs that were changed by this PR. Just a couple of spots where the updated variable names were missing, but those have been resolved. After those quick updates, everything ran great. Workflows tested:

Please let me know when you are finished updating the documentation and I'll approve and merge. Thanks again for the changes, this will save everyone lots of time in preventing user error and troubleshooting

AndrewLangvt commented 3 weeks ago

@kapsakcj documentation is now updated for this PR

kapsakcj commented 3 weeks ago

@AndrewLangvt Unless you know someone is actively using the dev branch, I'd recommend deleting it

PR merged, thank you!