Closed: cimendes closed this 3 months ago.
@cimendes I suggest we change `abricate_vibrio_abricate_tsv` to `abricate_vibrio_detailed_tsv` for coherence with SRST2's `srst2_vibrio_detailed_tsv`.
Ran TheiaProk_Illumina_PE, ONT and FASTA; all workflows ran as anticipated. Comments:

- The abricate database is simply set to `vibrio`. It would be nice to add a version to this in case we add to or change the database in the future.
- There is some discordance between abricate and SRST2 results from Illumina data.
- There is a discrepancy between the coverage and identity thresholds used by the two Vibrio characterization modules (see below). Have abricate thresholds of cov = 80 and id = 80 (consistent with SRST2) been tested?
  - `Int srst2_min_cov = 80`
  - `Int srst2_max_divergence = 20`
  - `Int abricate_vibrio_minid = 70`
  - `Int abricate_vibrio_mincov = 60`
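
To make the threshold discrepancy concrete, here is a minimal Python sketch (not code from this PR) of how a single hit can be kept under the old `abricate_vibrio` values (70/60) yet dropped under abricate's own defaults (80/80). It assumes a hit is reported only when `%IDENTITY >= minid` and `%COVERAGE >= mincov`, mirroring abricate's `--minid` and `--mincov` options; the `Hit` class, the `passes` helper and the toxR values are hypothetical. It also assumes that `srst2_max_divergence = 20` roughly corresponds to a minimum identity of 80%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One abricate hit; the values used below are hypothetical."""
    gene: str
    pct_identity: float  # abricate %IDENTITY column
    pct_coverage: float  # abricate %COVERAGE column


def passes(hit: Hit, minid: float, mincov: float) -> bool:
    """Assumed filter: keep a hit only if it meets both thresholds."""
    return hit.pct_identity >= minid and hit.pct_coverage >= mincov


# Previous abricate_vibrio task values vs. abricate's own defaults.
OLD_THRESHOLDS = {"minid": 70, "mincov": 60}
NEW_THRESHOLDS = {"minid": 80, "mincov": 80}

# srst2_max_divergence = 20 roughly corresponds to >= 80% identity.
SRST2_EQUIVALENT_MINID = 100 - 20

hit = Hit(gene="toxR", pct_identity=78.5, pct_coverage=95.0)
print(passes(hit, **OLD_THRESHOLDS))  # True: 78.5 >= 70 and 95.0 >= 60
print(passes(hit, **NEW_THRESHOLDS))  # False: 78.5 < 80
```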
Thank you so much for looking over this PR! Indeed, as far as I'm aware, the abricate module has not been tested with anything other than its current default values (minid of 70 and mincov of 60); tagging @jrotieno, as he did most of the testing. These values were taken from the abricate task for A. baumannii and I didn't give them much thought. abricate's own defaults are 80 for both. Would it be too laborious to set these values to the abricate defaults and then retest?
Hi @emmadoughty, the following changes have been made:

- abricate_vibrio: minid and mincov are now both set to the abricate default of 80.

Tests done here:

Illumina_PE: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/2c9e23ef-2a0b-4bdf-9eca-78d520c94722

For two samples, changing the two thresholds appears to have had an impact on biotype detection:
sample | abricate_vibrio_biotype | abricate_vibrio_biotype_old |
---|---|---|
SRR7062511 | (not detected) | tcpA_ElTor |
SRR7062612 | (not detected) | tcpA_classical |

Re-ran with the old thresholds to check whether we recover the old values (https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/2fd82255-536a-4871-93fd-e33149e985b9); indeed, we get the old results.

The following samples also had different ompW results:

sample | abricate_vibrio_ompW | abricate_vibrio_ompW_old |
---|---|---|
SRR7062587 | (not detected) | present |
SRR7637792 | (not detected) | present |
SRR7637793 | (not detected) | present |
SRR7637794 | (not detected) | present |
SRR7637796 | (not detected) | present |
SRR7637798 | (not detected) | present |
SRR7637799 | (not detected) | present |

Different results were also obtained for toxR:
sample | abricate_vibrio_toxR | abricate_vibrio_toxR_old |
---|---|---|
SRR7062587 | (not detected) | present |
SRR7637794 | (not detected) | present |
SRR7062522 | (not detected) | present |
SRR7062523 | (not detected) | present |
SRR7062525 | (not detected) | present |
SRR7062551 | (not detected) | present |
SRR7637797 | (not detected) | present |
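
One way to produce old-versus-new tables like these is to diff the two Terra data tables column by column. The Python sketch below is an assumption about how that comparison could be scripted, not the procedure actually used here: it assumes each run has been exported as a mapping of sample to `{column: value}`, and `diff_column` and `to_markdown` are hypothetical helpers. The two rows are copied from the biotype table above.

```python
def diff_column(new_run, old_run, column):
    """Yield (sample, new_value, old_value) for shared samples whose values differ."""
    for sample in sorted(new_run.keys() & old_run.keys()):
        new_value = new_run[sample].get(column, "")
        old_value = old_run[sample].get(column, "")
        if new_value != old_value:
            yield sample, new_value, old_value


def to_markdown(rows, column):
    """Render differing rows in the table layout used in this thread."""
    lines = [f"sample | {column} | {column}_old |", "---|---|---|"]
    lines += [f"{sample} | {new} | {old} |" for sample, new, old in rows]
    return "\n".join(lines)


# Values copied from the biotype table above (other samples omitted).
NEW_RUN = {
    "SRR7062511": {"abricate_vibrio_biotype": "(not detected)"},
    "SRR7062612": {"abricate_vibrio_biotype": "(not detected)"},
}
OLD_RUN = {
    "SRR7062511": {"abricate_vibrio_biotype": "tcpA_ElTor"},
    "SRR7062612": {"abricate_vibrio_biotype": "tcpA_classical"},
}

print(to_markdown(diff_column(NEW_RUN, OLD_RUN, "abricate_vibrio_biotype"),
                  "abricate_vibrio_biotype"))
```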

As expected, there were no differences between the previous and current SRST2 runs.

Differences between abricate_vibrio and SRST2:

sample | abricate_vibrio_ctxA | srst2_vibrio_ctxA |
---|---|---|
SRR7062576 | (not detected) | present |
SRR7062601 | (not detected) | present (low depth/uncertain) |

None of the samples below changed results when the abricate thresholds were changed:

sample | abricate_vibrio_ompW | srst2_vibrio_ompW |
---|---|---|
SRR7637797 | present | (not detected) |
SRR7062519 | present | (not detected) |
SRR7062552 | present | (not detected) |
SRR7062592 | present | (not detected) |
SRR7062631 | (not detected) | present |

Note that samples SRR7062522, SRR7062523, SRR7062525 and SRR7062551 below also changed results when the abricate thresholds were changed (see the toxR comparison above):
sample | abricate_vibrio_toxR | srst2_vibrio_toxR |
---|---|---|
SRR7062522 | (not detected) | present (low depth/uncertain) |
SRR7062523 | (not detected) | present |
SRR7062525 | (not detected) | present |
SRR7062551 | (not detected) | present |
SRR7062539 | (not detected) | present |
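
Whether SRST2's `present (low depth/uncertain)` calls count as detections changes how many discordant samples are reported. The Python sketch below makes that choice explicit; `normalize` and `discordant` are hypothetical helpers, the toxR calls are copied from the table above, and treating uncertain calls as absent is an assumption offered for comparison rather than something either tool does.

```python
def normalize(call: str, uncertain_as_present: bool = True) -> bool:
    """Map a table value to True (detected) or False (not detected)."""
    call = call.strip().lower()
    if not call.startswith("present"):
        return False  # e.g. "(not detected)"
    if "uncertain" in call:
        return uncertain_as_present  # SRST2 low-depth/uncertain call
    return True


def discordant(calls_a, calls_b, uncertain_as_present=True):
    """Return sorted samples whose normalized calls differ between two tools."""
    shared = calls_a.keys() & calls_b.keys()
    return sorted(
        s for s in shared
        if normalize(calls_a[s], uncertain_as_present)
        != normalize(calls_b[s], uncertain_as_present)
    )


# toxR calls copied from the abricate vs. SRST2 table above.
ABRICATE_TOXR = {s: "(not detected)" for s in (
    "SRR7062522", "SRR7062523", "SRR7062525", "SRR7062551", "SRR7062539")}
SRST2_TOXR = {
    "SRR7062522": "present (low depth/uncertain)",
    "SRR7062523": "present",
    "SRR7062525": "present",
    "SRR7062551": "present",
    "SRR7062539": "present",
}

print(len(discordant(ABRICATE_TOXR, SRST2_TOXR)))                              # 5
print(len(discordant(ABRICATE_TOXR, SRST2_TOXR, uncertain_as_present=False)))  # 4
```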

ONT (https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/5e179757-9fe5-44bc-8b1a-5b290a415fb2): no differences were observed when the thresholds were changed.
Thanks, James. This is extremely helpful!
Note that samples SRR7062511 and SRR7062612 have the same results with both abricate implementations, but different results with SRST2.
The new results are more concordant with the SRST2 results, with the exception of toxR detection.
The minimum identity and minimum coverage thresholds may need further optimization, but this will require a gold standard. Gold-standard results could be taken from the PHE paper or from partner labs.
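
Once a gold standard is available, candidate thresholds could be compared by their true positive rate, TPR = TP / (TP + FN), alongside a false-positive count, since lowering thresholds can never lower TPR on its own. The Python sketch below illustrates such a sweep; the hits and gold-standard calls are made-up toy values (not results from the PHE paper or this PR), the helper names are hypothetical, and it assumes the same `>=` filtering on `%IDENTITY` and `%COVERAGE` as abricate's `--minid`/`--mincov`.

```python
from itertools import product

# Toy abricate hits: (sample, gene, %identity, %coverage). Hypothetical values.
HITS = [
    ("sampleA", "ctxA", 99.0, 100.0),
    ("sampleA", "toxR", 76.0, 98.0),
    ("sampleB", "ompW", 85.0, 72.0),
    ("sampleB", "tcpA_ElTor", 72.0, 90.0),
]
# Toy gold standard: (sample, gene) pairs known to be present.
GOLD = {("sampleA", "ctxA"), ("sampleA", "toxR"), ("sampleB", "ompW")}


def calls(hits, minid, mincov):
    """(sample, gene) pairs that pass both thresholds."""
    return {(s, g) for s, g, ident, cov in hits if ident >= minid and cov >= mincov}


def evaluate(predicted, gold):
    """Return (TPR, number of false positives) against the gold standard."""
    tp = len(predicted & gold)
    fn = len(gold - predicted)
    fp = len(predicted - gold)
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    return tpr, fp


for minid, mincov in product((70, 80), (60, 80)):
    tpr, fp = evaluate(calls(HITS, minid, mincov), GOLD)
    print(f"minid={minid} mincov={mincov}: TPR={tpr:.2f}, false positives={fp}")
```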
This PR closes #395
🗑️ This dev branch should NOT be deleted after merging to main.
:brain: Aim, Context and Functionality
This PR adds a new Vibrio-specific abricate task for genomic characterization. It relies on a species-specific database that is packaged in the `us-docker.pkg.dev/general-theiagen/internal/abricate:1.0.1-vibrio-cholera` container. The PR for this container, including the database, is available at https://github.com/StaPH-B/docker-builds/pull/963.

In short, if `gambit` identifies the species as Vibrio or Vibrio cholerae, the workflow runs the `abricate_vibrio` task on the assembly. The results are populated to the sample-level data table with the `abricate_vibrio` prefix. They are also populated to the taxon table if that functionality is activated.

Within the `abricate_vibrio` task, the abricate output file is parsed to determine the following:

:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
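
The gating and parsing described in the Aim section could look roughly like the Python sketch below. It is not the task's actual code: the taxon strings, the GENE names (`ctxA`, `ompW`, `toxR`, `tcpA_ElTor`, `tcpA_classical`) and the `(not detected)` placeholder are assumptions taken from the result tables in this thread, and `should_run_abricate_vibrio` and `summarize_abricate` are hypothetical names. It relies only on abricate's tab-separated report having a header row with a `GENE` column; the example report is abbreviated and hypothetical.

```python
import csv
import io

# Assumed taxon strings that trigger the Vibrio module.
VIBRIO_TAXA = {"Vibrio", "Vibrio cholerae"}
# Assumed GENE names in the Vibrio abricate database.
MARKER_GENES = ("ctxA", "ompW", "toxR")
BIOTYPE_ALLELES = ("tcpA_ElTor", "tcpA_classical")
NOT_DETECTED = "(not detected)"  # placeholder used in the tables in this thread


def should_run_abricate_vibrio(predicted_taxon: str) -> bool:
    """Run the module only when gambit reports Vibrio or Vibrio cholerae."""
    return predicted_taxon.strip() in VIBRIO_TAXA


def summarize_abricate(tsv_text: str) -> dict:
    """Collapse abricate's tab-separated report into per-sample output columns."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    genes = {row["GENE"] for row in reader}
    summary = {
        f"abricate_vibrio_{gene}": "present" if gene in genes else NOT_DETECTED
        for gene in MARKER_GENES
    }
    biotypes = [allele for allele in BIOTYPE_ALLELES if allele in genes]
    summary["abricate_vibrio_biotype"] = ",".join(biotypes) or NOT_DETECTED
    return summary


# Hypothetical, abbreviated report: a real abricate report has more columns.
EXAMPLE_TSV = (
    "#FILE\tSEQUENCE\tGENE\t%COVERAGE\t%IDENTITY\n"
    "assembly.fasta\tcontig_1\tctxA\t100.00\t99.87\n"
    "assembly.fasta\tcontig_4\ttcpA_ElTor\t100.00\t99.10\n"
)

if should_run_abricate_vibrio("Vibrio cholerae"):
    print(summarize_abricate(EXAMPLE_TSV))
```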
This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version: Yes, if analysing Vibrio data
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : No
:clipboard: Workflow/Task Step Changes
🔄 Data Processing
Docker/software or software versions changed: No changes to pre-existing TheiaProk components; a new Vibrio-specific module has been added
Databases or database versions changed: No changes to pre-existing TheiaProk components; a new Vibrio-specific module has been added
Data processing/commands changed: No changes to pre-existing TheiaProk components; a new Vibrio-specific module has been added
File processing changed: No changes to pre-existing TheiaProk components; a new Vibrio-specific module has been added
Compute resources changed: No changes to pre-existing TheiaProk components; a new Vibrio-specific module has been added
➡️ Inputs
Exposed through merlin_magic:
⬅️ Outputs
:test_tube: Testing
Test Dataset
Command-line Testing with MiniWDL or Cromwell (optional)
Terra Testing
Illumina dataset: from https://journals.asm.org/doi/full/10.1128/jcm.00831-18. The study characterized Vibrio cholerae strains isolated between April 2004 and March 2018 and held at the Public Health England culture archive. The publication reports traditional biochemical species identification and serological typing results, as well as genome-derived species identification and serotyping for a subset of the isolates. The data include samples from different biotypes and serogroups, covering both V. cholerae and non-cholerae Vibrio species. True Positive Rate (TPR): 0.9-1.0.

Terra: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/df0be5c2-234c-420e-a8d0-f680ebd20779

ONT dataset: from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10996759/. All samples were confirmed by conventional PCR to be toxigenic V. cholerae of serogroup O1.

ONT: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/d151f49b-b137-4780-9c5c-d6afea8628e4
Suggested Scenarios for Reviewer to Test
Additional ONT testing with serogroup O139 and the tcpA_classical biotype would be valuable. It would also be worthwhile to take some of the assemblies from the Illumina and/or ONT datasets above and run them through TheiaProk_FASTA.
Theiagen Version Release Testing (optional)
:microscope: Final Developer Checklist
🎯 Reviewer Checklist
🗂️ Associated Documentation (to be completed by Theiagen developer)