ncats / translator-workflows

Workflow 2 Module 2 - Disease Feature Associated Genes #4

Open mbrush opened 5 years ago

mbrush commented 5 years ago

Module Overview

This module aims to find genes predicted to impact a known feature or mechanism of the disease. It requires condition-specific knowledge and access to relevant datasets. The reference instantiation for this module will use Fanconi Anemia (FA)-related genes as input.

Related Workflows

  1. Workflow 2 Module 2: Here, this module is used to identify genes to be further evaluated as potential modifiers of a genetic condition in the downstream steps of the workflow.

Implementation

For the FA instantiation, we can use the knowledge that the underlying mechanism is an inability to repair ‘DNA replication blocks’, and that the following types of genetic lesions can lead to such blocks:

We want to find genes that may impact levels of agents that contribute to these types of DNA damage, as variants in these genes may modify FA through their effect on DNA damage and repair. The steps of one proposed approach are listed below.

  1. What (endogenous) agents may promote the types of DNA damage that cause ‘DNA replication blocks’ (see list above)?
  2. Find genes that function to reduce levels of these agents (by blocking their production, or facilitating their metabolism, processing, or clearance), as variants in these genes may lead to an increase in DNA replication blocks.
  3. Find genes that function to increase levels of these agents (by promoting their production, or blocking their metabolism, processing, or clearance), as variants in these genes may lead to a decrease in DNA replication blocks. These may be secondarily informative (e.g. if we know of genes/agents that negatively regulate their activity).
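The three steps above amount to a two-hop traversal: damage type → agents that cause it → genes that lower or raise those agents. A minimal sketch, assuming the knowledge is available as (subject, predicate, object) triples; the predicates `causes`, `decreases_level_of` and `increases_level_of`, and all agent/gene names, are hypothetical placeholders, not curated data or any existing Translator schema.

```python
# Hypothetical toy triples (subject, predicate, object); placeholders only.
TRIPLES = [
    ("AGENT_X", "causes", "interstrand_crosslink"),
    ("AGENT_Y", "causes", "interstrand_crosslink"),
    ("AGENT_Z", "causes", "oxidative_lesion"),
    ("GENE_A", "decreases_level_of", "AGENT_X"),  # e.g. metabolizes or clears the agent
    ("GENE_B", "increases_level_of", "AGENT_X"),  # e.g. promotes production of the agent
    ("GENE_C", "decreases_level_of", "AGENT_Y"),
    ("GENE_D", "decreases_level_of", "AGENT_Z"),
]


def agents_causing(damage_types, triples):
    """Step 1: agents linked by 'causes' to any lesion type that blocks replication."""
    return {s for s, p, o in triples if p == "causes" and o in damage_types}


def genes_by_effect(agents, triples):
    """Steps 2 and 3: split genes into those that lower vs. raise levels of the agents."""
    reducers, promoters = set(), set()
    for s, p, o in triples:
        if o not in agents:
            continue
        if p == "decreases_level_of":
            reducers.add(s)   # step 2: loss-of-function variants may increase blocks
        elif p == "increases_level_of":
            promoters.add(s)  # step 3: loss-of-function variants may decrease blocks
    return reducers, promoters


# Hypothetical stand-in for the list of replication-blocking lesion types.
blocking_lesions = {"interstrand_crosslink"}
agents = agents_causing(blocking_lesions, TRIPLES)
reducers, promoters = genes_by_effect(agents, TRIPLES)
print(sorted(agents), sorted(reducers), sorted(promoters))
```

The hard part, as discussed below, is populating the `causes` edges at all; the traversal itself is trivial once those edges exist.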

Use this ticket to propose/discuss ideas for implementing this module.

mbrush commented 5 years ago

A challenge for this module may be that the FA disease features indicated above (i.e. the types of DNA damage that contribute to the condition) are not annotated to biological entities in structured databases. AFAIK there are no ontologies covering molecular pathologies like 'DNA replication blocks' and 'interstrand cross-links' at the level of detail required here, and no databases that curate associations between genes, chemicals and these types of DNA damage from the literature or public datasets.

To answer this module, we may have to rely on NLP of free-text publications, or on creative inference strategies using the structured data we do have. We welcome any ideas about approaches, tools, or data sources that could be used here.
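As a rough illustration of the NLP route, the simplest baseline is sentence-level co-occurrence of chemical terms and DNA-damage terms. A minimal sketch, assuming hand-written term lists (hypothetical; a real pipeline would need named-entity recognition and normalized vocabularies). Note the second toy sentence: co-occurrence counts it even though it is negated, which is exactly the kind of error such a baseline makes.

```python
import re
from collections import Counter

# Hypothetical term lists; substring matching also catches plurals like "cross-links".
DAMAGE_TERMS = ["interstrand cross-link", "replication fork stall", "replication block"]
CHEMICAL_TERMS = ["agent x", "agent y"]


def cooccurrences(text, chemicals, damage_terms):
    """Count (chemical, damage term) pairs that appear in the same sentence."""
    counts = Counter()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.lower()):
        for chem in chemicals:
            if chem not in sentence:
                continue
            for dmg in damage_terms:
                if dmg in sentence:
                    counts[(chem, dmg)] += 1
    return counts


# Toy text, not taken from any real publication.
abstract = ("Agent X induces interstrand cross-links in cultured cells. "
            "Agent Y was not associated with replication fork stalls. "
            "Agent X exposure also caused a replication fork stall.")
counts = cooccurrences(abstract, CHEMICAL_TERMS, DAMAGE_TERMS)
print(counts)
```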

mbrush commented 5 years ago

@tylerperyea @MarkDWilliams, I noticed that Purple was the only team to enter a 2 or higher for the first question in this critical module ("What endogenous agents/chemicals may promote DNA replication blocks?"). As noted above, this is the really challenging step in this module. What approach or resources were you thinking of for answering this question? (Maybe we can discuss briefly on the next I&Q call?)

mbrush commented 5 years ago

@tylerperyea @MarkDWilliams, if you have a chance, can you give a general overview of how Purple proposes to answer this question (specifically, what endogenous agents/chemicals may cause the kinds of DNA damage that lead to replication fork stalls)?

MarkDWilliams commented 5 years ago

So, looking at some of the options for answering the more general form of this question (what endogenous agents/chemicals may cause the kinds of DNA damage that lead to a given cellular phenotype), CTD (the Comparative Toxicogenomics Database) has annotations for each chemical listing the pathways and phenotypes in which that chemical is involved. While this is unlikely to have complete coverage of the cellular phenotypes we are interested in, it may be a decent place to start. We've also discussed trying to infer this information from MeSH or SemMedDB, but I think the error rate of that approach would be high enough to negate the value it might add. Other than CTD, are there resources that have this sort of annotation?
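Starting from CTD could look roughly like the sketch below: read a tab-separated chemical–phenotype export and keep chemicals annotated with phenotype terms that a curator has chosen as proxies for replication-blocking damage. The column names, rows and target terms here are hypothetical stand-ins, so check the headers of the file actually downloaded from CTD. Two caveats: an annotation to a repair process does not by itself show that the chemical causes the damage, and a separate filter would still be needed to restrict results to endogenous agents.

```python
import csv
import io

# Toy stand-in for a CTD chemical-phenotype export; column names and rows are hypothetical.
TOY_TSV = """chemical_name\tphenotype_name
AGENT_X\tinterstrand cross-link repair
AGENT_Y\tcell proliferation
AGENT_Z\treplication fork processing
AGENT_X\tapoptotic process
"""

# Phenotype terms a curator might pick as proxies for replication-blocking damage.
TARGET_PHENOTYPES = {"interstrand cross-link repair", "replication fork processing"}


def chemicals_with_phenotypes(tsv_text, targets):
    """Map each chemical to the subset of target phenotypes it is annotated with."""
    hits = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        term = row["phenotype_name"].strip().lower()
        if term in targets:
            hits.setdefault(row["chemical_name"], set()).add(term)
    return hits


hits = chemicals_with_phenotypes(TOY_TSV, TARGET_PHENOTYPES)
print(hits)
```

The resulting chemicals could then feed steps 2 and 3 of the workflow as the "agents" set.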