Closed kapsakcj closed 6 months ago
I created a clone of the PHB_Validation_TEMPLATE workspace specifically for testing these changes: https://app.terra.bio/#workspaces/theiagen-validations/PHB_Validation_nextcladeV3testing/job_history
Marking as ready for review to get the review process started.
I'm working on pulling a couple RSV-A and RSV-B genomes for testing (since the 2 I tested with didn't so well - failed to align well with the nextclade references) so I will test these and post a workflow link when it's completed.
2 RSV-A and 2 RSV-B samples tested via TheiaCoV_FASTA here: https://app.terra.bio/#workspaces/theiagen-validations/PHB_Validation_nextcladeV3testing/job_history/1da2324d-436e-4b60-a79e-04bceb62a6f7
.gitignore
has been updated to ignore common outputs from miniwdl 2024*
and _LAST
as well as cromwell*
which ignores all cromwell run
outputsDetails for days! That was a beast to review. Thanks for all of the documentation and specific workspace for validation runs. I've combed through all of your runs and feel confident in these code changes; also launched a few runs myself within the same workspace for sanity's sake.
Just two points to address:
workflows/phylogenetics/wf_nextclade_addToRefTree.wdl
. It's the same underlying changes being made so I don't anticipate any functional difference, but I would like to see a successful fun before merging this PR.Once these are resolved we can move forward with a merge!
Thanks for the thorough review, kevin.
_Made a few comments regarding input name changes/removals that seem a bit out of sync with respect to nextclade v3 migration guidance -- and because this will only impact optional parameters, I think we can verify things in running a single workflow rather than needing to re-run the whole testing dataset. I don't see a functional test for workflows/phylogenetics/wf_nextcladeaddToRefTree.wdl. It's the same underlying changes being made so I don't anticipate any functional difference, but I would like to see a successful fun before merging this PR.
I'll update CI first, then...
I'll launch a function test of the nextclade_addToRefTree workflow as well as TheiaCoV_FASTA_PHB on the various organisms in our validation dataset to ensure everything runs as expected.
EDIT:
OK everything should be resolved, I think we are ready for approval and merge
So FYI the nextclade run --input-pcr-primers
option is available in v3.1.0 and above: https://github.com/nextstrain/nextclade/issues/1432#issuecomment-2037700068
Should we add it back to the WDL task or leave as is? I don't know of any of our users that utilize this option, so I'm tempted to leave it out.
So FYI the
nextclade run --input-pcr-primers
option is available in v3.1.0 and above: nextstrain/nextclade#1432 (comment)Should we add it back to the WDL task or leave as is? I don't know of any of our users that utilize this option, so I'm tempted to leave it out.
That's fair. Let's get this closed and avoid scope creep; changes made in this PR address #349 for PHB users. Can you open a new issue for us to add that option back and we can discuss internally when to prioritize things
Still have lots of work to do, but wanted to at least start a draft PR so folks can see progress.This PR closes #349
🗑️ This dev branch should be deleted after merging to main. (subject to change)
:brain: Aim, Context and Functionality
This PR updates all TheiaCov workflows to use nextclade v3.
A new task was added for nextclade v3 specifically.
CI has undergone major updates to account for these changes AND to re-enable the theiacov_illumina_pe and theiacov_illumina_se CI workflows from running (they were disabled back in July 2023)
:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
All TheiaCov workflows are impacted across all organisms (sars-cov-2, Mpox, Flu, RSV-A, RSV-B) that we support and run through nextclade
This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version : **Yes***
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : Yes
:clipboard: Workflow/Task Step Changes
🔄 Data Processing
Docker/software or software versions changed: upgraded to
us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:3.3.1
which was originally copied from Nextstrain's image on dockerhubDatabases or database versions changed: All nextclade_dataset_tags have been updated for all organisms
Data processing/commands changed: Details on changes from v2 ➡️ v3 can be found here: https://docs.nextstrain.org/projects/nextclade/en/stable/user/migration-v3.html#dataset-file-format-and-dataset-names-have-changed
File processing changed: The nextclade_output_parsing has not changed. The output TSV of nextclade v3 was readly parsed by the existing WDL task
Compute resources changed: No
➡️ Inputs
String? nextclade_dataset_reference
has been removed from all TheiaCov workflows (and the organism_param subwf) as it is not longer requiredString sc2_nextclade_ds_name = "nextstrain/sars-cov-2/wuhan-hu-1/orfs"
is now a path-like name instead of the names like "sars-cov-2". Those names are still kept as shortcuts, but for futureproofing sake - I have changed these to the new path-like names for all organisms⬅️ Outputs
nextclade_v3
task outputs instead of the oldnextclade
task.:test_tube: Testing
Test Dataset
I am testing in Terra using our validation datasets that we used to validate new versions of PHB.
Commandline Testing with MiniWDL or Cromwell (optional)
Not going to show cmdline testing, everything will be tested in Terra
Terra Testing
nextclade_dataset_tag
✅nextclade_dataset_tag
✅TODO pull a couple RSV-A and RSV-B samples to test withDoneSuggested Scenarios for Reviewer to Test
Need to test all organisms to ensure functionality exists for all organisms.
It seems a bit extreme, but I'd like to test all impacted workflows since I'd rather catch bugs now, instead of the 11th hour before the v2.0.0 release. SO I've created a terra workspace specifically for this testing https://app.terra.bio/#workspaces/theiagen-validations/PHB_Validation_nextcladeV3testing/job_history
Theiagen Version Release Testing (optional)
:microscope: Final Developer Checklist
🎯 Reviewer Checklist
🗂️ Associated Documentation (to be completed by Theiagen developer)