ncats / translator-workflows

Workflow 2 Module 5 - Variant-Phenotype Associations via PheWAS Analysis #6

Open mbrush opened 6 years ago

Module Overview

This module aims to apply case-level phenotype and genotype data to link variants to phenotypes related to a condition of interest. The reference instantiation for this module will use Fanconi Anemia (FA)-related genes as input and look for variants in human populations that are associated with FA-related clinical phenotypes.

Related Workflows

  1. Workflow 2 Module 5: In this workflow, this module is used to identify additional evidence that a gene related to FA may be an FA disease modifier. Specifically, we are looking for variants in a gene that are associated with clinical phenotype(s) related to FA, as evidence that the gene may modify FA pathogenesis.

Implementation

For the FA instantiation, we take as input the hits from an initial screen for genes that are related to FA genes in some way. The approach proposed below uses PheWAS analysis to find variants that are statistically associated with clinical phenotypes in a patient population. It is not well specified, and we are not confident in its scientific validity, but it is provided as a starting point for discussion.

  1. Find patients with variants in the set of initial gene hits
  2. Filter variants using various features to identify 'interesting' candidates, e.g. those whose molecular consequences, population frequencies, and predicted functional impacts suggest potential clinical relevance as modifiers. Resources such as gnomAD, ExAC, CADD, etc. can be used here.
  3. Perform PheWAS on these variants to identify patient phenotypes statistically associated with any of these variants.
  4. Assess whether any phenotypes identified in this way are relevant / interesting from an FA perspective, and further prioritize their affected gene(s) as candidate modifiers
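
As a starting point for discussion, steps 2 and 3 above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a proposed implementation: the record fields (`id`, `allele_freq`, `cadd_phred`), the two thresholds, the function names, and the choice of a per-pair Fisher's exact test with Bonferroni correction are all hypothetical. A real PheWAS would more likely use regression with covariates such as age, sex, and ancestry.

```python
from math import comb
from typing import Dict, List, Set, Tuple

# Hypothetical thresholds for illustration only; not part of the proposal.
MAX_ALLELE_FREQ = 0.01  # keep rare variants (e.g. population allele frequency)
MIN_CADD_PHRED = 20.0   # keep variants with a high predicted functional impact


def filter_variants(variants: List[dict]) -> List[str]:
    """Step 2: return IDs of rare variants with a high predicted impact."""
    return [
        v["id"]
        for v in variants
        if v["allele_freq"] <= MAX_ALLELE_FREQ and v["cadd_phred"] >= MIN_CADD_PHRED
    ]


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables as likely or less likely than observed.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def phewas(
    carriers: Dict[str, Set[str]],    # variant ID -> patients carrying it
    phenotypes: Dict[str, Set[str]],  # phenotype -> patients showing it
    patients: Set[str],               # the full cohort
    alpha: float = 0.05,
) -> List[Tuple[str, str, float]]:
    """Step 3: test every variant/phenotype pair and keep Bonferroni-significant hits."""
    tests = []
    for variant, carrier_set in carriers.items():
        for phenotype, affected in phenotypes.items():
            a = len(carrier_set & affected)   # carrier, has phenotype
            b = len(carrier_set - affected)   # carrier, no phenotype
            c = len(affected - carrier_set)   # non-carrier, has phenotype
            d = len(patients) - a - b - c     # non-carrier, no phenotype
            tests.append((variant, phenotype, fisher_exact_two_sided(a, b, c, d)))
    threshold = alpha / len(tests) if tests else alpha
    return [t for t in tests if t[2] <= threshold]
```

The output of `phewas` would then feed step 4, where the significant phenotypes are reviewed for relevance to FA. Bonferroni correction is used here only because it is simple to show; with many variants and phenotypes, a false discovery rate procedure may be a better fit.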

Need access to a rich set of patient genotype-phenotype data, preferably in a population enriched for FA-related phenotypes.