Open kapsakcj opened 3 months ago
other TODOs:
Waiting on user feedback prior to making more code changes
Plan as of 2024-07-10:
Optional input Boolean `call_stxtyper`. This will allow organisms of any genus/species to be screened for stx genes, since they can occur in genera/species other than E. coli & Shigella.
Also - adjust the conditional in the merlin_magic code so that the user can "opt in" to running stxtyper, regardless of the taxon (i.e. gambit_predicted_taxon).
That way stxtyper is run automatically on all E. coli and Shigella and user has the ability to run it on other taxa.
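The opt-in logic described above can be sketched as follows. This is only an illustration in Python, not the actual merlin_magic code (which is a WDL workflow). The function name `should_run_stxtyper` and the substring matching on the taxon are assumptions made for this sketch; `call_stxtyper` and `gambit_predicted_taxon` are the names used in this PR.

```python
def should_run_stxtyper(gambit_predicted_taxon: str, call_stxtyper: bool = False) -> bool:
    """Decide whether stxtyper should run for a sample (hypothetical helper).

    Runs by default for Escherichia and Shigella, and for any other taxon
    only when the user opts in with call_stxtyper=True.
    """
    taxon = gambit_predicted_taxon.lower()
    # Assumption: the predicted taxon string contains the genus name.
    is_default_taxon = "escherichia" in taxon or "shigella" in taxon
    return is_default_taxon or call_stxtyper


# Default behaviour: only E. coli / Shigella trigger stxtyper.
run_ecoli = should_run_stxtyper("Escherichia coli")
run_abau = should_run_stxtyper("Acinetobacter baumannii")
# Opt-in: the user forces stxtyper on another taxon.
run_abau_opt_in = should_run_stxtyper("Acinetobacter baumannii", call_stxtyper=True)
```

With this shape, leaving `call_stxtyper` at its default keeps the existing behaviour for non-Escherichia/Shigella samples, so users who change nothing only see stxtyper results for the default taxa.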
Successfully ran stxtyper on 1 A. baumannii and 1 Burkholderia cepacia genome using the `call_stxtyper` optional input Boolean: https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-Aug2024/job_history/df49fd54-2614-4c0d-a21b-914ca40da962
Awaiting feedback from our PH partner, and will update remaining TheiaProk workflows & CI after making any further adjustments/changes
Please keep this PR as draft for now as stxtyper is still undergoing validation/peer review/publication
Using this draft PR for tracking stxtyper development as well as amrfinderplus (that runs stxtyper under the hood).
Our partners are actively using this branch for testing purposes.
I'll try to keep this branch up-to-date with `main` to incorporate other changes & resolve merge conflicts if they arise.
This PR closes #443
🗑️ This dev branch should NOT be deleted after merging to main.
2024-09-23 update: I expect to hear feedback from our partner soon, but updating PR message now with tests and info
:brain: Aim, Context and Functionality
This PR adds Stxtyper to the TheiaProk workflows. Stxtyper is used to detect and type shiga toxin genes in bacterial genome assemblies. It also attempts to detect novel shiga toxin subtypes in cases where the detected sequences diverge from the reference sequences.
These genes are usually found in E. coli (STEC), but can also be found in Shigella species and, more rarely, in some other genera such as Klebsiella. Stxtyper is developed by NCBI in collaboration with a number of different groups including CDC, FDA, SSI, and others. A publication to fully describe the tool and its validation is in the works, but a software release has been made so the community may test the software further and begin using the tool.
This tool queries genome assemblies for 2 genes or subunits involved in shiga toxin production, stxA and stxB. The A subunit is longer than the B subunit. Stxtyper attempts to detect these, compare them to a database of known sequences, and type them based on amino acid composition. The typing algorithm will be described in the publication when it is published.
More info & source code found here: https://github.com/ncbi/stxtyper
To learn more about shiga toxin subtypes and the description of the latest subtypes, Stx2n, Stx2j, Stx2m, and Stx2o, see this publication (shameless plug): https://www.mdpi.com/2076-2607/11/10/2561
Eventually this tool will be incorporated into AMRFinderPlus and will run behind the scenes when the user provides the `amrfinder --organism Escherichia` option, but we wanted the functionality now and the ability to run it separately from AMRFinderPlus.
:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version : Yes/No
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : Yes/No
:clipboard: Workflow/Task Step Changes
🔄 Data Processing
Docker/software or software versions changed:
Databases or database versions changed:
Data processing/commands changed:
File processing changed:
Compute resources changed:
➡️ Inputs
⬅️ Outputs
:test_tube: Testing
Test Dataset
Commandline Testing with MiniWDL or Cromwell (optional)
Terra Testing
Suggested Scenarios for Reviewer to Test
Theiagen Version Release Testing (optional)
:microscope: Final Developer Checklist
🎯 Reviewer Checklist
🗂️ Associated Documentation (to be completed by Theiagen developer)