theiagen / public_health_bacterial_genomics

GNU Affero General Public License v3.0

Incorporate vibrio characterisation with srst2 into TheiaProk workflows #216

Closed: cimendes closed this 1 year ago

cimendes commented 1 year ago

Motivation

An Abricate database of target genes for Vibrio characterization was constructed; the corresponding PR is open at https://github.com/StaPH-B/docker-builds/pull/618.

This Docker image includes a Vibrio cholerae-specific database of gene targets (traditionally used in PCR methods) for detecting the O1 and O139 serotypes, toxin-production markers, and biotype markers within the O1 serogroup ("El Tor" or "Classical" biotypes). These sequences were shared via personal communication with Dr. Christine Lee of the National Listeria, Yersinia, Vibrio and Enterobacterales Reference Laboratory within the Enteric Diseases Laboratory Branch at CDC.

The genes included in the database (and their purpose) are as follows:

Pending further testing, the workflow currently uses the container quay.io/kapsakcj/srst2:0.2.0-vcholerae.

A new task, task_srst2_vibrio.wdl, runs SRST2 with the custom Vibrio database and reports the resulting hits on the target gene sequences. The task is called from merlin_magic_workflow.wdl for any sample identified as belonging to the genus Vibrio. This has been implemented in both TheiaProk_Illumina_PE and TheiaProk_Illumina_SE.

The following outputs are retrieved:

  output {
    File srst2_tsv = "~{samplename}.tsv"
    String srst2_version = read_string("VERSION")
    # One file per target gene; read_string() loads that file's contents
    String srst2_vibrio_ctxA = read_string("ctxA")
    String srst2_vibrio_ompW = read_string("ompW")
    String srst2_vibrio_tcpA_ElTor = read_string("tcpA_ElTor")
    String srst2_vibrio_toxR = read_string("toxR")
    String srst2_vibrio_wbeN_O1 = read_string("wbeN_O1")
  }
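Judging from the outputs above, the task writes one small file per target gene, which WDL's read_string() then loads. Below is a minimal Python sketch of that step, of the kind that could run inside the task's command block. It assumes a tab-separated SRST2 report whose header row contains a "gene" column; the PRESENT/ABSENT labels and the helper names summarize_hits and write_gene_files are hypothetical, and the actual task may report alleles, coverage, or other values instead.

```python
import csv
import io
from pathlib import Path

# Target genes exposed as workflow outputs in this PR.
TARGET_GENES = ["ctxA", "ompW", "tcpA_ElTor", "toxR", "wbeN_O1"]


def summarize_hits(report_text: str, genes=TARGET_GENES) -> dict:
    """Return {gene: "PRESENT" or "ABSENT"} for each target gene.

    Hypothetical helper: assumes a tab-separated SRST2 report whose
    header row includes a "gene" column; the real task may differ.
    """
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    # Collect every gene name that has at least one hit row.
    found = {row["gene"] for row in reader if row.get("gene")}
    return {gene: ("PRESENT" if gene in found else "ABSENT") for gene in genes}


def write_gene_files(calls: dict, outdir: Path = Path(".")) -> None:
    """Write one file per gene (e.g. ./ctxA) for WDL read_string() to load."""
    outdir.mkdir(parents=True, exist_ok=True)
    for gene, call in calls.items():
        (outdir / gene).write_text(call)


if __name__ == "__main__":
    # Toy report with made-up values, for illustration only.
    report = (
        "Sample\tDB\tgene\tallele\tcoverage\n"
        "S1\tvibrio\tctxA\tctxA_1\t100.0\n"
        "S1\tvibrio\ttoxR\ttoxR_1\t99.5\n"
    )
    print(summarize_hits(report))
```

Writing a value for every target gene, including genes with no hit, keeps the WDL side simple: each output is a plain read_string() call that always finds its file, rather than failing on a missing one.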

Testing

The workflow has been tested on 152 V. cholerae sequencing runs on Terra using TheiaProk_Illumina_PE.

TheiaProk_Illumina_SE has been tested locally with sample SRR7062492, because importing the workflow from the correct branch was not possible in Terra (@kapsakcj, have you seen this issue before?)

kapsakcj commented 1 year ago

FYI, let's wait to fix the CI until we've made all the changes we discussed to the SRST2 task & workflows.

kapsakcj commented 1 year ago

Emma's most recent test: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Doughty_Sandbox/job_history/ff536588-c8bb-4bbf-aab5-51158024114c

kevinlibuit commented 1 year ago

File changes look good! I ran things in a sandbox with the updated defaults as well. Functionally everything is looking great. Well done, all!