theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0
33 stars 15 forks source link

[TheiaProk] Enable plasmidfinder to be skipped #374

Closed sage-wright closed 4 months ago

sage-wright commented 4 months ago

This PR closes #280, closes #370 and closes #359.

🗑️ This dev branch should be deleted after merging to main.

:brain: Aim, Context and Functionality

This PR does three things:

  1. Provides a user input for users to turn off plasmidfinder
  2. Hides the preemptible input parameter for BaseSpace_Fetch (no testing required except for checking the inputs of the workflow on Terra)
  3. Disables call caching on RASUSA

:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made

This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version : Yes (call caching off on RASUSA)

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : Yes (RASUSA is random, but we only want to test to see if call caching is actually off

:clipboard: Workflow/Task Step Changes

🔄 Data Processing

No Docker/software or software versions changed: No Databases or database versions changed: No Data processing/commands changed: No File processing changed: No Compute resources changed: - preemptible is now hard-coded to 1 in BaseSpace_Fetch - volatile:true turns off call caching in RASUSA #### ➡️ Inputs

TheiaProk workflows now have the call_plasmidfinder optional workflow input. True by default, no change will be noticed by users of TheiaProk unless they set the value of that variable to false.

⬅️ Outputs

:test_tube: Testing

Test Dataset

Commandline Testing with MiniWDL or Cromwell (optional)

Terra Testing

Suggested Scenarios for Reviewer to Test

Theiagen Version Release Testing (optional)

:microscope: Final Developer Checklist

🎯 Reviewer Checklist

🗂️ Associated Documentation (to be completed by Theiagen developer)

cimendes commented 4 months ago

Code changes look good! Testing here:

sage-wright commented 4 months ago

As per the tool documentation, genome length can be either an integer or have a metric suffix, so I think we should keep it as a String

cimendes commented 4 months ago

Given the general context of PHB, we always set the genome length as int so I think it would be easier to document and transmit to our users the same standard! What do you think?

sage-wright commented 4 months ago

I think that warrants further discussion! @kapsakcj and @jrotieno, any opinions?

kapsakcj commented 4 months ago

I would prefer to keep it as a String because of what Rasusa can accept (and the usage is bit more flexible this way)

Plus we have documentation surrounding the use of strings like 5m for this particular workflow

And there are users who are now accustomed to providing these strings as input (all of the labs that have done LOD studies)

EDIT: I agree this does make TheiaProk_ONT input parameters differ from the other TheiaProk workflows.... hmmmmm 🤔

EDIT2: I think we should keep as is, where in RASUSA_PHB standalone workflow the genome_size input is a String. And in TheiaProk_ONT_PHB, the genome_size input is kept as an Int (which is then fed into the read_qc_ont subworkflow and passed into Rasusa task)

jrotieno commented 4 months ago

On one hand I am thinking String is versatile to allow both a string and integer input, while on the other I am not sure that we want to encourage people to input 5Kb instead of 5000 even if the tool allows it.

cimendes commented 4 months ago

I guess this settles it! It shall remain as is! :D Approving the PR and merging!