theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

[TheiaProk] Update amrfinderplus to v3.12.8; DB: v2024-05-02.2; reduce compute resources #514

Closed: kapsakcj closed 1 week ago

kapsakcj commented 1 week ago

This PR closes #511

🗑️ This dev branch should be deleted after merging to main.

:brain: Aim, Context and Functionality

:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made

This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version: Yes

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing: No

Workflows impacted

:clipboard: Workflow/Task Step Changes

🔄 Data Processing

Docker/software or software versions changed: v3.11.20 & DB v2023-09-26.1 ➡️ v3.12.8 & DB v2024-05-02.2

Databases or database versions changed: see above

Data processing/commands changed: double quotes added to one option (does not impact how it runs at all)

File processing changed: none

Compute resources changed: reduced cpus to 2, disk_size to 50, and memory to 8. Preemptible VMs are now also allowed.

➡️ Inputs

⬅️ Outputs

N/A

:test_tube: Testing

Test Dataset

I used a diverse dataset of Illumina paired-end (ILMN PE) data from bacteria that are known to have point mutations or other organism-specific acquired resistance genes. It covers almost every species supported by the amrfinder --organism flag for organism-specific results. See the data table amrfinderplus_testing_sample for the full list of species and accessions.

I have also gathered data (reads or genome assemblies only) for 3 organisms we do not yet have support for (Enterobacter asburiae, Vibrio vulnificus, Vibrio parahaemolyticus) just to see how they behave.
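For reviewers who want to picture how organism-specific and unsupported species differ at the command line, here is a minimal Python sketch. It is not the code in the WDL task. The function name `build_amrfinder_command` and the lookup table `ORGANISM_OPTIONS` are hypothetical, and the table is a small illustrative subset; `amrfinder --list_organisms` reports the authoritative values for a given database. The sketch assumes assembly input and a thread count matching the reduced cpu setting.

```python
from typing import Dict, List, Optional

# Hypothetical, illustrative subset of species -> `--organism` values.
# Run `amrfinder --list_organisms` for the values your database supports.
ORGANISM_OPTIONS: Dict[str, str] = {
    "Escherichia coli": "Escherichia",
    "Salmonella enterica": "Salmonella",
    "Klebsiella pneumoniae": "Klebsiella_pneumoniae",
    "Staphylococcus aureus": "Staphylococcus_aureus",
}


def build_amrfinder_command(
    assembly_fasta: str,
    species: Optional[str],
    threads: int = 2,
) -> List[str]:
    """Build an amrfinder argument list for a nucleotide assembly.

    `--organism` is appended only when the species is in the lookup table,
    so unsupported species still run but without organism-specific results
    such as point mutations.
    """
    cmd = [
        "amrfinder",
        "--plus",
        "--nucleotide", assembly_fasta,
        "--threads", str(threads),
    ]
    organism = ORGANISM_OPTIONS.get(species) if species else None
    if organism is not None:
        cmd += ["--organism", organism]
    return cmd


if __name__ == "__main__":
    # Supported species: organism-specific mode is enabled.
    print(" ".join(build_amrfinder_command("sample1.fasta", "Salmonella enterica")))
    # Species missing from the table: runs without --organism.
    print(" ".join(build_amrfinder_command("sample2.fasta", "Vibrio vulnificus")))
```

Returning an argument list rather than a single string keeps the sketch safe to pass to `subprocess.run` without shell quoting concerns.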

Commandline Testing with MiniWDL or Cromwell (optional)

Tested WDL task successfully with miniwdl.

Terra Testing

Suggested Scenarios for Reviewer to Test

Test on as diverse a set of species as possible. Any of the TheiaProk workflows is suitable for testing.

Theiagen Version Release Testing (optional)

:microscope: Final Developer Checklist

🎯 Reviewer Checklist

🗂️ Associated Documentation (to be completed by Theiagen developer)

michellescribner commented 1 week ago

Tested a dataset of 88 bacterial isolates representing diverse taxa (set 3 from the GAMBIT publication): https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Scribner_Sandbox/job_history/9838ff2d-d384-412e-bb65-fa9b12af7557