Closed kapsakcj closed 5 months ago
Reminder: tag Ines when ready for review
I ran one final test of kraken2_SE_PHB workflow after making the last commit to update the CI: https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/46106d91-fe26-4a13-8456-ddb27ca6743a
It allocated a VM with 500GB as I requested, so it ran as expected β
@cimendes Would be great if you could test TheiaMeta_Illumina_PE (when you have time of course, this is lower priority)
Just realized that this change also impacts the readQC subworkflows as well...
workflows/utilities/wf_read_QC_trim_pe.wdl
workflows/utilities/wf_read_QC_trim_se.wdl:
both of these subworkflows call the kraken2_standalone
task located in tasks/taxon_id/task_kraken2.wdl
so I'll run at least one theiaprok_illumina_PE_PHB workflow with the call_kraken
boolean enabled
and just realized that I have to add this option to the read_QC subworkflows as well....
commit incoming. Maybe hold off on the review for a moment
OK here's a theiaprok_illumina_pe_PHB test where I set the new optional inputs (for read_QC_pe subworkflow) kraken_disk_size
to 500 and kraken_memory
to 64: https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/workflows/theiagen-validations/TheiaProk_Illumina_PE_PHB
EDIT: ran as expected and allocated a VM with 500GB disk and 64GB memory β
and here's a test with TheiaProk_Illumina_SE_PHB with kraken_disk_size
set to 250 and kraken_memory
set to 64: https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/3ed3bae5-ba78-47d1-b75b-1c99131fdeda
EDIT: ran as expected and allocated a VM with 250GB disk and 64GB memory β
Tests already performed:
Workflows to be tested:
TheiaMeta_Illumina_PE:
New input variables are present as expected:
β Run test successful: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/24bbace4-3074-4c66-883c-aa2fba93bfaf
Kraken2_PE:
New input parameter present as expected:
β Successful run: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/785c725e-8d55-4327-9bc8-a324b5ec2ad5
TheiaProk_Illumina_PE:
Inputs present as expected (pity WDL is depressing accessing these optional inputs directly through the Kraken2 task.. workflows in workflows π€·π»)
β Successful run: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/3968fa05-d2f6-4693-b7df-4833a095ff79
Last but not least, documentation!!
All is looking good! Great job! I'm approving the PR!
This PR closes #315
ποΈ This dev branch should be deleted after merging to main.
:brain: Aim, Context and Functionality
[copied from GH issue]
:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
This will affect the behavior of the workflow(s) even if users donβt change any workflow inputs relative to the last version: No
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : No
This code change affects both standalone kraken2 workflows:
:clipboard: Workflow/Task Step Changes
π Data Processing
Nothing has changed about the way the workflow runs
Docker/software or software versions changed: No
Databases or database versions changed: No
Data processing/commands changed: No
File processing changed: No
Compute resources changed: Yes, user can now alter disk_size from the default of
100
GBβ‘οΈ Inputs
None
β¬ οΈ Outputs
None
:test_tube: Testing
Test Dataset
tested on one ILMN PE bacterial sample (see link to test Terra workflow below), and requested
500
as thedisk_size
for the task, and Terra/cromwell did use a VM with a disk size of 500 GB πCommandline Testing with MiniWDL or Cromwell (optional)
Unnecessary, only test in Terra.
Terra Testing
successful workflow here: https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/dccf8cf5-ab64-4df6-bad8-9cbdd9fd718f
proof that the VM used a 500GB disk:![image](https://github.com/theiagen/public_health_bioinformatics/assets/8172086/fd859d54-420d-4c1b-9900-38f6ce8c943a)
Suggested Scenarios for Reviewer to Test
Would be good to test out with large kraken2 databases. My test uses the "kraken2 standard" database which is 38GB compressed and estimated 70-ish GB when uncompressed.
Theiagen Version Release Testing (optional)
:microscope: Final Developer Checklist
π― Reviewer Checklist
ποΈ Associated Documentation (to be completed by Theiagen developer)