monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

EBI Gene 2 Phenotype Ingest #209

Closed: putmantime closed this issue 2 years ago

putmantime commented 2 years ago

High-quality g2p data with potential for new qualifier modeling.

Review the dipper ingest https://github.com/monarch-initiative/dipper/blob/master/dipper/sources/EBIGene2Phen.py (see also supporting files like https://github.com/monarch-initiative/dipper/blob/master/translationtable/ebi_g2p.yaml).

Data format: https://www.ebi.ac.uk/gene2phenotype/README

Data source: https://www.ebi.ac.uk/gene2phenotype/downloads/

Cancer gene-disease pairs and attributes (CancerG2P.csv.gz)
DD gene-disease pairs and attributes (DDG2P.csv.gz)
Eye gene-disease pairs and attributes (EyeG2P.csv.gz)
Skin gene-disease pairs and attributes (SkinG2P.csv.gz)
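
As a rough illustration of how one of these gzipped CSV exports could be read, here is a minimal sketch using only the Python standard library. It is not the Koza ingest itself. The helper name `read_g2p_rows` is hypothetical, and the column names in the sample (`gene symbol`, `hgnc id`, `disease name`, `phenotypes`) as well as the `;` separator are assumptions that should be checked against the README linked above. The sample row is made up.

```python
import csv
import gzip
import io


def read_g2p_rows(stream):
    """Yield one dict per data row from a gzip-compressed G2P CSV stream.

    `stream` may be a path or a binary file object; the header row of the
    CSV supplies the dictionary keys.
    """
    with gzip.open(stream, mode="rt", newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)


# Tiny in-memory stand-in for a downloaded file such as DDG2P.csv.gz.
# Column names and values are assumptions made up for illustration.
sample_csv = (
    "gene symbol,hgnc id,disease name,phenotypes\n"
    "GENE1,1234,Example disease,HP:0000001;HP:0000002\n"
)
sample_stream = io.BytesIO(gzip.compress(sample_csv.encode("utf-8")))
rows = list(read_g2p_rows(sample_stream))
```

Passing a real file path instead of the in-memory stream should work the same way, since `gzip.open` accepts either.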

RichardBruskiewich commented 2 years ago

G2P data model

To what depth do we model this? Which models do we use, and how do they interrelate?

  1. Gene:
    1. Use HGNC id for primary id
    2. Do we care about the MIM id (as an alias) or just ignore it?
  2. GeneToDiseaseAssociation
    1. Which predicate? (gene_associated_with_condition?)
    2. Do we record disease "organ specificity list" (if so, how?) Or is this subsumed in external disease descriptions (e.g. OMIM)?
  3. GeneToPhenotypeAssociation
    1. Predicate: has_phenotype?
    2. Gene to (disease) phenotype entry has one-to-many HP terms.
      1. Do we create a specific association statement for each one or…
      2. Do we aggregate them as qualifiers?
      3. Should the HP terms be associated with the Disease rather than the Gene?
    3. Are there circumstances where a particular subset of (disease-related) phenotype terms are only associated with a specific subset of variants (of an associated gene)?
  4. VariantToDiseaseAssociation
    1. Subject variant identification?
    2. Genetic model?
      1. Do we care about it right now?
      2. Predicate?
      3. Relevant Object ids? (ontology?)
    3. Mutation consequence?
      1. Predicate? (Biolink only has has_molecular_consequence; Dipper model also has has_functional_consequence)
      2. Relevant Object ids? (ontology?)
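
To make the "one association statement per HP term" option above concrete, here is a minimal sketch in plain Python. It is only one of the options under discussion, not a decided model. The `Association` dataclass and `row_to_associations` are hypothetical stand-ins (not Biolink model classes), the `biolink:` predicate CURIEs simply prefix the candidate predicates named in the questions above, and the column names, the `OMIM:` prefix for the disease id, and the `;` separator are assumptions. The example row is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """Hypothetical minimal triple; stands in for a real model class."""
    subject: str
    predicate: str
    object: str


def row_to_associations(row):
    """Build one gene-disease statement plus one statement per HP term."""
    gene_id = f"HGNC:{row['hgnc id']}"  # HGNC id as the primary gene id
    associations = []

    # Assumption: the disease is identified by its MIM number.
    disease_mim = row.get("disease mim", "").strip()
    if disease_mim:
        associations.append(
            Association(
                gene_id,
                "biolink:gene_associated_with_condition",
                f"OMIM:{disease_mim}",
            )
        )

    # Assumption: HP terms arrive as one ';'-separated string.
    for term in row.get("phenotypes", "").split(";"):
        term = term.strip()
        if term:
            associations.append(
                Association(gene_id, "biolink:has_phenotype", term)
            )
    return associations


# Made-up example row, for illustration only.
example_row = {
    "hgnc id": "1234",
    "disease mim": "100000",
    "phenotypes": "HP:0000001;HP:0000002",
}
example_associations = row_to_associations(example_row)
```

The aggregation alternative would instead keep a single statement and carry the HP terms as a list-valued qualifier; which of the two fits better is exactly the open question in this list.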

RichardBruskiewich commented 2 years ago

From Agenda minutes of the Monarch Data call of 14 April 2022:

Chris M: Let’s make sure we have all the canonical g2p associations for human before jumping into this EBI data. Perhaps a topic for Leads meeting moni@tislab.org

RichardBruskiewich commented 2 years ago

The ingest is on hold pending further discussion of the EBI G2P resource by the Monarch leads. Work-in-progress code is available in the issue-209-ebi-gene-2-phenotype-ingest branch.