Closed kapsakcj closed 2 months ago
@kapsakcj any hesitation in taking this out of draft state? Changes are looking pretty solid to me.
I can mark it ready for review, but I haven't finished testing & reviewing outputs. Only ran TheiaProk_FASTA workflow linked above, haven't tested the other workflows yet.
I would recommend testing TheiaEuk to confirm it still works as intended for eukaryotes before merging.
Testing time!
TheiaProk_Illumina_PE: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/258f14a5-4342-42f9-9b6e-95b373561780 ✅
TheiaProk_Illumina_SE: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/60000306-27a5-41ec-811e-1dedd7077ca3 ✅
TheiaEuk_Illumina_PE: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/3165f7fc-732c-4355-b299-2607f4a597f8 ⚠️
@kapsakcj BUSCO keeps failing on TheiaEuk 😢 It "fails successfully", so it's hard for me to understand why it's so unhappy. I'll dig a bit and report back!
I just retried the TheiaEuk workflow with the memory set to 16GB: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/98834755-7d1c-43a3-aab3-fc0e66c53c03
Testing TheiaEuk with 3 Candida auris genomes here, now that the default RAM is set to 24GB for TheiaEuk specifically: https://app.terra.bio/#workspaces/theiagen-validations/PHB_Validation_nextcladeV3testing/job_history/b03b3a98-99a5-443e-b15a-9fd879f56b6d
BUSCO ran successfully (without memory failure) with the new default of 24GB.
I think we are good to merge?
This PR closes #345
This dev branch should be deleted after merging to main.
:brain: Aim, Context and Functionality
Update BUSCO to the latest available version
:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
This will affect the behavior of the workflow(s) even if users don't change any workflow inputs relative to the last version: Yes. The BUSCO task auto-downloads its database at runtime, and the database is periodically updated (not sure how often, but the last update for the enterobacteriales db was 2024-01-08).
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing: Yes
Impacted workflows:
:clipboard: Workflow/Task Step Changes
Data Processing
Docker/software or software versions changed: upgraded to use a Theiagen-hosted copy of the ezlabgva (authors) docker image `us-docker.pkg.dev/general-theiagen/ezlabgva/busco:v5.7.1_cv1`
Databases or database versions changed: the database is downloaded at runtime, so it can change without warning
Data processing/commands changed: added the `--cpu` option to the main `busco` command
File processing changed: adjustments to parsing of output files; see code for details
Compute resources changed: none
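For reviewers checking the output-parsing change, here is a minimal Python sketch of pulling the percentages out of BUSCO's one-line summary. It is not the parsing code in this PR. The summary line shape (`C:..%[S:..%,D:..%],F:..%,M:..%,n:..`) is assumed from typical BUSCO v5 short summary files, and the names `SUMMARY_RE` and `parse_busco_summary` are hypothetical.

```python
import re
from typing import Optional

# Assumed shape of BUSCO's one-line summary; not taken from this PR's code.
SUMMARY_RE = re.compile(
    r"C:(?P<complete>[\d.]+)%"
    r"\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,"
    r"M:(?P<missing>[\d.]+)%,"
    r"n:(?P<total>\d+)"
)


def parse_busco_summary(text: str) -> Optional[dict]:
    """Return the summary percentages and marker count, or None if absent."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    total = int(fields.pop("total"))
    # Remaining fields are percentages written as decimal strings.
    result = {name: float(value) for name, value in fields.items()}
    result["total"] = total
    return result


# Illustrative line only; these numbers are not results from this PR.
example = "\tC:98.4%[S:97.6%,D:0.8%],F:0.0%,M:1.6%,n:124\t   \n"
print(parse_busco_summary(example))
```

Using `search` rather than `match` means surrounding whitespace or extra trailing fields on the line do not break the parse, which may matter if the summary format shifts between BUSCO releases.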
➡️ Inputs
⬅️ Outputs
Added `String busco_docker` output to WDL task
TODO:
:test_tube: Testing
Test Dataset
Will update later, but will likely test across a diverse set of bacterial species and at least one eukaryotic pathogen (Candida auris?)
Commandline Testing with MiniWDL or Cromwell (optional)
Tested the WDL task changes locally:
Will test workflows in Terra after code has been updated
Terra Testing
Suggested Scenarios for Reviewer to Test
Theiagen Version Release Testing (optional)
:microscope: Final Developer Checklist
🎯 Reviewer Checklist
Associated Documentation (to be completed by Theiagen developer)