Closed sage-wright closed 4 months ago
Code changes look good! Testing here:
BaseSpace_Fetch preemptible no longer exposed ✅
Rasusa
Disabling PlasmidFinder ✅
As per the tool documentation, genome length can be either an integer or have a metric suffix, so I think we should keep it as a String
Given the general context of PHB, we always set the genome length as int so I think it would be easier to document and transmit to our users the same standard! What do you think?
I think that warrants further discussion! @kapsakcj and @jrotieno, any opinions?
I would prefer to keep it as a String because of what Rasusa can accept (and the usage is bit more flexible this way)
Plus we have documentation surrounding the use of strings like 5m
for this particular workflow
And there are users who are now accustomed to providing these strings as input (all of the labs that have done LOD studies)
EDIT: I agree this does make TheiaProk_ONT input parameters differ from the other TheiaProk workflows.... hmmmmm 🤔
EDIT2: I think we should keep as is, where in RASUSA_PHB standalone workflow the genome_size
input is a String. And in TheiaProk_ONT_PHB, the genome_size
input is kept as an Int (which is then fed into the read_qc_ont subworkflow and passed into Rasusa task)
On one hand I am thinking String is versatile to allow both a string and integer input, while on the other I am not sure that we want to encourage people to input 5Kb instead of 5000 even if the tool allows it.
I guess this settles it! It shall remain as is! :D Approving the PR and merging!
This PR closes #280, closes #370 and closes #359.
🗑️ This dev branch should be deleted after merging to main.
:brain: Aim, Context and Functionality
This PR does three things:
:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version : Yes (call caching off on RASUSA)
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : Yes (RASUSA is random, but we only want to test to see if call caching is actually off
call_plasmidfinder
(default=true
) is added to all TheiaProk workflows:clipboard: Workflow/Task Step Changes
🔄 Data Processing
No Docker/software or software versions changed: No Databases or database versions changed: No Data processing/commands changed: No File processing changed: No Compute resources changed: - preemptible is now hard-coded to 1 in BaseSpace_Fetch - volatile:true turns off call caching in RASUSA #### ➡️ InputsTheiaProk workflows now have the
call_plasmidfinder
optional workflow input. True by default, no change will be noticed by users of TheiaProk unless they set the value of that variable to false.⬅️ Outputs
:test_tube: Testing
Test Dataset
Commandline Testing with MiniWDL or Cromwell (optional)
Terra Testing
Suggested Scenarios for Reviewer to Test
Theiagen Version Release Testing (optional)
:microscope: Final Developer Checklist
🎯 Reviewer Checklist
🗂️ Associated Documentation (to be completed by Theiagen developer)