Closed kapsakcj closed 7 months ago
TODO: need to update theiaprok ILMN SE, ONT, and FASTA workflows to get mlst scheme input from merlin_magic subworkflow.
Testing that this works for ILMN PE in Terra now
Tests launched & comments regarding them:
- `mlst_scheme`: https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/464f6fef-f9f0-4a36-929b-265cf8c2567f
  - `vcholerae_2` scheme for those designated as O1/O139 from SRST2 task and used `vcholerae` for all other V. cholerae samples
- `mlst_scheme` as input: https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/17e64f6a-97b5-441f-9a0e-57d4ac764066
  - `"''"` as input for `mlst_scheme` to trigger the `mlst` scheme auto-selection feature
- `mlst_scheme` as input (used `"ecoli_achtman_4"`): https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/942e9f52-4eaf-4446-9b0c-34f76fae8c25
  - `mlst_scheme` provided input

TODO:
Functional test results:
- `mlst_scheme` variable, left blank
- `"''"` as input for `mlst_scheme` to trigger the `mlst` scheme auto-selection feature

- `mlst_scheme` variable, left blank
- `"''"` as input for `mlst_scheme` to trigger the `mlst` scheme auto-selection feature

- `mlst_scheme` variable, left blank
- `"''"` as input for `mlst_scheme` to trigger the `mlst` scheme auto-selection feature

One last test to show that user-defined input takes priority in the `select_first` statement in the `merlin_magic` subworkflow.
I used `"senterica"` as the input `mlst_scheme` variable for a few E. coli samples, and the `mlst` task used `senterica` as the scheme.
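For readers less familiar with WDL, the priority behavior tested above can be sketched in Python. This is only an illustration of `select_first` semantics (return the first defined value), not the actual `merlin_magic` code; the names `resolve_scheme`, `user_scheme`, and `auto_scheme` are hypothetical, and the ordering of the list is an assumption based on the statement that user-defined input takes priority.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def select_first(values: Sequence[Optional[T]]) -> T:
    """Mimic WDL's select_first: return the first value that is defined (not None)."""
    for value in values:
        if value is not None:
            return value
    raise ValueError("select_first: no defined value in list")


def resolve_scheme(user_scheme: Optional[str], auto_scheme: Optional[str]) -> str:
    # Hypothetical helper: `user_scheme` stands in for the workflow-level
    # mlst_scheme input, `auto_scheme` for a scheme proposed by the subworkflow.
    # Assumption: the user-defined value is listed first, so it wins when set.
    return select_first([user_scheme, auto_scheme])


# A user-supplied scheme overrides whatever the subworkflow would have picked.
print(resolve_scheme("senterica", "ecoli_achtman_4"))
# With no user input, the subworkflow's choice is used.
print(resolve_scheme(None, "vcholerae_2"))
```

In this sketch an undefined input is modeled as `None`; how the real workflow treats an empty string is not covered here.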
I have looked deeper into the literature given Curtis' observation that no ST was assigned to the O1 and O139 samples with the `vcholerae_2` scheme.
- The `vcholerae_2` scheme was published in 2020 as the O1 and O139 scheme, but it doesn't seem to be widely used (only 5 citations to the paper), and only 400 PubMLST submissions to the scheme have ever been made, all on one date, presumably when the scheme was first published. The paper publishing the scheme stated that "there is no proper MLST database for typing V. cholerae O1 and O139 strains, except for non-O1, non-O139 strains (Octavia et al., 2013)", hence them developing their scheme.
- The `vcholerae` scheme, published in 2013, is listed as the non-O1, non-O139 scheme on PubMLST, and the title of the paper publishing this scheme mentions only these serotypes, non-O1 and non-O139. On reading the paper, the scheme was developed based on housekeeping genes identified in previous work looking at more diverse V. cholerae (including O1). Octavia et al. published the MLST scheme in a paper that was also describing non-O1 and non-O139 isolates, but I can't see any references to this MLST scheme only being for non-O1/non-O139 other than on PubMLST and in the Kanampalliwar & Singh paper. The Octavia et al. scheme has been used widely, with 85 citations on Google Scholar. These seem to mostly be people using the MLST scheme, and they use it for O1 and O139. It also has 1800 submissions to PubMLST.

Though PubMLST lists the `vcholerae_2` scheme as the O1 and O139 scheme, it may be more useful for the public health community to use the `vcholerae` scheme, as they are more likely to identify a named ST for their V. cholerae, including O1 and O139 isolates. The use of this scheme doesn't seem to be wrong; in fact, there seems to be a consensus to use it preferentially.
Thank you @emmadoughty for the detective work 🕵🏻‍♂️
I agree with your assessment that the community is using the `vcholerae` scheme for all isolates, regardless of serotype. I think we should close off this PR and not incorporate these changes into the main codebase to keep in line with the community standards and expectations.
This PR closes #58
🗑️ This dev branch should be deleted after merging to main.
:brain: Aim, Context and Functionality
Vibrio cholerae has two MLST schemes, one for O1 or O139 serogroups, and one for non-O1/non-O139 serogroups. The latter is always being used regardless of serogroup, because the auto scheme detection feature of the `mlst` software seems to always choose the non-O1 & non-O139 scheme. `mlst` also has a few schemes that are excluded by default, and the `vcholerae_2` scheme is one of them, so perhaps that is why `vcholerae` is the default.

The `mlst` command-line tool calls these schemes as such:
- `vcholerae` (non-O1/non-O139 scheme)
- `vcholerae_2` (O1/O139 scheme)

By default, the `mlst` auto-scheme detection chooses the non-O1 & non-O139 scheme, so this is a way of automatically forcing `mlst` to use `vcholerae_2` for samples detected as O1 or O139 via the SRST2 task.

More info on the 2 mlst schemes can be found here: https://pubmlst.org/organisms/vibrio-cholerae
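The selection rule described above can be sketched roughly as follows. This is a hedged illustration in Python rather than the workflow's WDL: the function names and the serogroup labels (`"O1"`, `"O139"`) are assumptions about how the SRST2 result might be represented, and the only `mlst` option used is `--scheme`. The command is built as a list and not executed.

```python
from typing import List, Optional


def scheme_for_serogroup(serogroup: Optional[str]) -> Optional[str]:
    """Return 'vcholerae_2' for O1/O139 calls, otherwise None (leave mlst to auto-detect)."""
    # Assumption: the serogroup arrives as a plain label such as "O1" or "O139".
    if serogroup is not None and serogroup.strip().upper() in {"O1", "O139"}:
        return "vcholerae_2"
    return None


def build_mlst_command(assembly: str, scheme: Optional[str] = None) -> List[str]:
    """Build an mlst command line, forcing a scheme only when one is given."""
    cmd = ["mlst"]
    if scheme:
        cmd += ["--scheme", scheme]
    cmd.append(assembly)
    return cmd


# O1 sample: the scheme is forced to vcholerae_2.
print(build_mlst_command("sample_o1.fasta", scheme_for_serogroup("O1")))
# Any other call: no scheme is forced, so mlst auto-detection applies.
print(build_mlst_command("sample_other.fasta", scheme_for_serogroup("non-O1")))
```

Returning `None` for everything else keeps the existing auto-detection behavior untouched for samples that are not O1 or O139.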
:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
This will affect the behavior of the workflow(s) even if users don't change any workflow inputs relative to the last version : No, with one exception: users analyzing V. cholerae samples that are O1 or O139 serogroups. For those samples, the mlst results will differ from results produced with a previous version of the TheiaProk workflow.
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : No
:clipboard: Workflow/Task Step Changes
🔄 Data Processing
Docker/software or software versions changed: No
Databases or database versions changed: No
Data processing/commands changed: No
File processing changed: No
Compute resources changed: No
➡️ Inputs
`ts_mlst.scheme` has been exposed as an optional input for the TheiaProk_Illumina_PE workflow
⬅️ Outputs
:test_tube: Testing
Test Dataset
Commandline Testing with MiniWDL or Cromwell (optional)
Terra Testing
Suggested Scenarios for Reviewer to Test
Theiagen Version Release Testing (optional)
:microscope: Final Developer Checklist
🎯 Reviewer Checklist
🗃️ Associated Documentation (to be completed by Theiagen developer)