theiagen / public_health_viral_genomics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of viral pathogens of concern, especially SARS-CoV-2
https://public-health-viral-genomics-theiagen.readthedocs.io/
GNU Affero General Public License v3.0

Add detection for aa substitutions associated with tamiflu-resistance #214

Closed: cimendes closed 1 year ago

cimendes commented 1 year ago

Motivation

This PR adds a new column, "tamiflu_resistance_aa_subs", containing the Nextclade-detected amino acid (aa) substitutions that have been described in the literature as conferring resistance to Tamiflu (oseltamivir).

The current list of substitutions is as follows:

"NA:V95A","NA:I97V","NA:E99A","NA:H101L","NA:G108E",
"NA:Q116L","NA:V116A","NA:E119D","NA:E119G","NA:E119I","NA:E119V","NA:R136K",
"NA:T146K","NA:T146P","NA:D151E","NA:N169S","NA:D179N","NA:D197N","NA:D198E",
"NA:D198G","NA:D198N","NA:A200T","NA:I203M","NA:I203R","NA:I203V","NA:I221T",
"NA:I222R","NA:I222V","NA:I223R","NA:I223V","NA:S227N","NA:S247N","NA:H255Y",
"NA:E258Q","NA:H274N","NA:H274Y","NA:H275Y","NA:N275S","NA:H277Y","NA:R292K",
"NA:N294S","NA:S334N","NA:R371K","NA:D432G","NA:H439P","NA:H439R"

This list is currently hard-coded in the nextclade_output_parser_one_sample task in the task_taxonID.wdl file. The current behaviour of the workflow can be summarized as follows:

  1. When the organism is set to "flu", the read data is assembled by IRMA, which returns both the HA and NA sequence fragments.
  2. abricate returns the appropriate nextclade_ref, nextclade_name and nextclade_ds_tag for both HA and NA, depending on the subtype detected.
  3. For flu, Nextclade runs twice: once for the HA segment and once for the NA segment. When running on the NA segment, the workflow compares the detected aa substitutions with the list of Tamiflu-resistance-associated substitutions and returns the intersection of the two lists.
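
The comparison in step 3 can be illustrated with a minimal Python sketch. This is not the code from the task itself: the function name find_resistance_subs is hypothetical, the set below is only a subset of the list in this PR, and the input is assumed to be a comma-separated string of substitutions in the "GENE:mutation" form (as in Nextclade's aaSubstitutions column).

```python
# Illustrative subset of the resistance-associated substitutions listed in this PR.
TAMIFLU_RESISTANCE_SUBS = {"NA:E119V", "NA:H275Y", "NA:R292K", "NA:N294S"}


def find_resistance_subs(aa_substitutions, resistance_subs=TAMIFLU_RESISTANCE_SUBS):
    """Return the detected substitutions that are also in resistance_subs.

    aa_substitutions is assumed to be a comma-separated string such as
    "NA:V13I,NA:H275Y". The result keeps the input order and is returned
    as a comma-separated string (empty if nothing matches).
    """
    detected = [s.strip() for s in aa_substitutions.split(",") if s.strip()]
    return ",".join(s for s in detected if s in resistance_subs)


# Hypothetical example input: only the two resistance-associated entries are kept.
print(find_resistance_subs("NA:V13I,NA:H275Y,NA:N294S"))  # prints NA:H275Y,NA:N294S
```

Filtering the detected list against a set, rather than intersecting two sets, preserves the order in which the substitutions were reported, which keeps the output column stable between runs.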

Testing

Locally

miniwdl run ~/Git/public_health_viral_genomics/workflows/wf_theiacov_illumina_pe.wdl samplename= BigTest read1_raw= ~/Test/tamiflu_resistance/SRR18273525_1.fastq.gz read2_raw= ~/Test/tamiflu_resistance/SRR18273525_2.fastq.gz organism="flu"

Terra

Test 1 - Random SRA accessions
Test 2 - Samples 01 to 04 of theiacov flu demo dataset